15
Contrasting Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,* ,1,2 Zhen Li, 3,4 Yves Van de Peer, 3,4,5 and Par K. Ingvarsson 2,6 1 Department of Plant Sciences, University of California–Davis, Davis, CA 2 Department of Ecology and Environmental Science, Umea ˚ University, Umea ˚, Sweden 3 Department of Plant Systems Biology, VIB, Ghent, Belgium 4 Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium 5 Genomics Research Institute, University of Pretoria, Hatfield Campus, Pretoria, South Africa 6 Department of Plant Biology, Uppsala Biocenter, Swedish University of Agricultural Sciences, Uppsala, Sweden *Corresponding author: E-mail: [email protected]. Associate editor: Stephen Wright Abstract The majority of variation in rates of molecular evolution among seed plants remains both unexplored and unexplained. Although some attention has been given to flowering plants, reports of molecular evolutionary rates for their sister plant clade (gymnosperms) are scarce, and to our knowledge differences in molecular evolution among seed plant clades have never been tested in a phylogenetic framework. Angiosperms and gymnosperms differ in a number of features, of which contrasting reproductive biology, life spans, and population sizes are the most prominent. The highly conserved mor- phology of gymnosperms evidenced by similarity of extant species to fossil records and the high levels of macrosynteny at the genomic level have led scientists to believe that gymnosperms are slow-evolving plants, although some studies have offered contradictory results. Here, we used 31,968 nucleotide sites obtained from orthologous genes across a wide taxonomic sampling that includes representatives of most conifers, cycads, ginkgo, and many angiosperms with a sequenced genome. Our results suggest that angiosperms and gymnosperms differ considerably in their rates of molec- ular evolution per unit time, with gymnosperm rates being, on average, seven times lower than angiosperm species. Longer generation times and larger genome sizes are some of the factors explaining the slow rates of molecular evolution found in gymnosperms. In contrast to their slow rates of molecular evolution, gymnosperms possess higher substitution rate ratios than angiosperm taxa. Finally, our study suggests stronger and more efficient purifying and diversifying selection in gymnosperm than in angiosperm species, probably in relation to larger effective population sizes. Key words: gymnosperms, angiosperms, substitution rates, selection, mutation, life-history traits. Introduction The study of the forces of mutation and selection and their effects at the molecular and phenotypic levels is crucial to understand how species have evolved over time (Lynch 2010). Under a strictly neutral model, the fate of substitutions at the molecular level is mainly determined by mutation and ran- dom genetic drift instead of by natural selection (Kimura 1968). Under the neutral theory, mutations are assumed to be selectively neutral, nearly neutral (S ¼ 0) or strongly dele- terious (S ¼ 1) while advantageous mutations are assumed to be too rare to have a significant effect on sequence evo- lution. Later, Ohta (1992) proposed a modified version of the neutral theory in which a substantial fraction of substitutions are caused by the random fixation of slightly deleterious mu- tations, and that a small fraction of all new mutations may have positive selection coefficients (the so called “nearly neu- tral model”). Although the strictly neutral model remains the most commonly used null model in population genetics, the controversy regarding the different models of evolution often comes down to a discussion about the importance of positive selection (Nielsen and Yang 2003). Selective pressures on amino acid mutations are often measured in comparative studies using the ratio of non- synonymous to synonymous substitution rates (also called “omega” and denoted by x or dN/dS). When an amino acid change is neutral, the rate of fixation will be the same as that of a synonymous mutation, and x ¼ 1. Amino acid altering substitution rate ratios are denoted by x < 1 or x > 1, indicating negative selection, and positive selection, respectively (Yang and Nielsen 2002). Omega thus also gives an estimate of the rate of substitution at selected sites in comparison to the neutral substitution rate. Several codon- based likelihood models have been developed to study the distribution of x among sites (site models) or among branches (branch models) using a phylogenetic framework (Yang et al. 1997; Yang and Nielsen 1998; Yang et al. 2000; Nielsen and Yang 2002). Estimating x along particular line- ages of a phylogeny allows the testing of hypotheses regarding Article ß The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] Open Access Mol. Biol. Evol. 34(6):1363–1377 doi:10.1093/molbev/msx069 Advance Access publication February 25, 2017 1363

Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

  • Upload
    lekiet

  • View
    216

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

Contrasting Rates of Molecular Evolution and Patterns ofSelection among Gymnosperms and Flowering Plants

Amanda R De La Torre12 Zhen Li34 Yves Van de Peer345 and Peuroar K Ingvarsson26

1Department of Plant Sciences University of CaliforniandashDavis Davis CA2Department of Ecology and Environmental Science Umea University Umea Sweden3Department of Plant Systems Biology VIB Ghent Belgium4Department of Plant Biotechnology and Bioinformatics Ghent University Ghent Belgium5Genomics Research Institute University of Pretoria Hatfield Campus Pretoria South Africa6Department of Plant Biology Uppsala Biocenter Swedish University of Agricultural Sciences Uppsala Sweden

Corresponding author E-mail ardelatorreucdavisedu

Associate editor Stephen Wright

Abstract

The majority of variation in rates of molecular evolution among seed plants remains both unexplored and unexplainedAlthough some attention has been given to flowering plants reports of molecular evolutionary rates for their sister plantclade (gymnosperms) are scarce and to our knowledge differences in molecular evolution among seed plant clades havenever been tested in a phylogenetic framework Angiosperms and gymnosperms differ in a number of features of whichcontrasting reproductive biology life spans and population sizes are the most prominent The highly conserved mor-phology of gymnosperms evidenced by similarity of extant species to fossil records and the high levels of macrosynteny atthe genomic level have led scientists to believe that gymnosperms are slow-evolving plants although some studies haveoffered contradictory results Here we used 31968 nucleotide sites obtained from orthologous genes across a widetaxonomic sampling that includes representatives of most conifers cycads ginkgo and many angiosperms with asequenced genome Our results suggest that angiosperms and gymnosperms differ considerably in their rates of molec-ular evolution per unit time with gymnosperm rates being on average seven times lower than angiosperm speciesLonger generation times and larger genome sizes are some of the factors explaining the slow rates of molecular evolutionfound in gymnosperms In contrast to their slow rates of molecular evolution gymnosperms possess higher substitutionrate ratios than angiosperm taxa Finally our study suggests stronger and more efficient purifying and diversifyingselection in gymnosperm than in angiosperm species probably in relation to larger effective population sizes

Key words gymnosperms angiosperms substitution rates selection mutation life-history traits

IntroductionThe study of the forces of mutation and selection and theireffects at the molecular and phenotypic levels is crucial tounderstand how species have evolved over time (Lynch 2010)Under a strictly neutral model the fate of substitutions at themolecular level is mainly determined by mutation and ran-dom genetic drift instead of by natural selection (Kimura1968) Under the neutral theory mutations are assumed tobe selectively neutral nearly neutral (Sfrac14 0) or strongly dele-terious (Sfrac14 ndash1) while advantageous mutations are assumedto be too rare to have a significant effect on sequence evo-lution Later Ohta (1992) proposed a modified version of theneutral theory in which a substantial fraction of substitutionsare caused by the random fixation of slightly deleterious mu-tations and that a small fraction of all new mutations mayhave positive selection coefficients (the so called ldquonearly neu-tral modelrdquo) Although the strictly neutral model remains themost commonly used null model in population genetics thecontroversy regarding the different models of evolution often

comes down to a discussion about the importance of positiveselection (Nielsen and Yang 2003)

Selective pressures on amino acid mutations are oftenmeasured in comparative studies using the ratio of non-synonymous to synonymous substitution rates (also calledldquoomegardquo and denoted by x or dNdS) When an aminoacid change is neutral the rate of fixation will be the sameas that of a synonymous mutation and xfrac14 1 Amino acidaltering substitution rate ratios are denoted by xlt 1 orxgt 1 indicating negative selection and positive selectionrespectively (Yang and Nielsen 2002) Omega thus also givesan estimate of the rate of substitution at selected sites incomparison to the neutral substitution rate Several codon-based likelihood models have been developed to study thedistribution of x among sites (site models) or amongbranches (branch models) using a phylogenetic framework(Yang et al 1997 Yang and Nielsen 1998 Yang et al 2000Nielsen and Yang 2002) Estimating x along particular line-ages of a phylogeny allows the testing of hypotheses regarding

Article

The Author 2017 Published by Oxford University Press on behalf of the Society for Molecular Biology and EvolutionThis is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License(httpcreativecommonsorglicensesby-nc40) which permits non-commercial re-use distribution and reproduction in anymedium provided the original work is properly cited For commercial re-use please contact journalspermissionsoupcom Open AccessMol Biol Evol 34(6)1363ndash1377 doi101093molbevmsx069 Advance Access publication February 25 2017 1363

the relative effects of selection in certain lineages of interest(Nielsen 2005) and also for the estimation of the distributionof selection coefficients among taxa Because positive selectionmost likely affects only a few sites at a few time points theevaluation of individual sites (site models) and branches(branch models) has more power to detect adaptive evolu-tion than the pairwise sequence comparison in which substi-tution rates are averaged over all amino acid sites (Yang 2002)

Substitution rates have been reported to be variableacross the tree of life however the causes underlying thisvariation remain uncertain In flowering plants this variationhas been explained by differences in life forms height gen-eration times genome size environmental variables andspecies richness (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013 Bromham et al 2015) Variablesubstitution rates have also been reported across nuclearand organelle genomes and among genes with differentfunctional categories (Bromham et al 2015) Althoughconsiderable attention has been given to flowering plants(angiosperms) very few reports on the rate of molecularevolution exist for their sister seed plant clade the gymno-sperms and to our knowledge differences in rates of molec-ular evolution among angiosperms and gymnosperms havenever been tested within a phylogenetic framework

Gymnosperms are an ancient and widespread plant cladethat represent four of the five main lineages of seed plantsand includes cycads ginkgos gnetophytes and conifers(Wang and Ran 2014) Gymnosperm lineages separatedfrom each other during the Late Carboniferous to the LateTriassic (311ndash212 mya) earlier than the occurrence of theearliest extant angiosperms around 300 mya (Magallon et al2013)DespitebeingdominantthroughmostoftheMesozoicgymnosperms were severely affected by extreme climaticshifts especially during the late Neogene which may have fa-vored the disproportionate loss of ancient lineages and theirreplacement byyounger lineages intheNorthernHemisphere(Won and Renner 2006 Crisp and Cook 2011 Leslie et al2012) Cenozoic extinctions may have contributed to thelow diversity of extant gymnosperms (Wang and Ran 2014)however they are unlikely to explain the 30-fold difference inspecies diversity between gymnosperm and angiospermspecies

Besides differences in species diversity gymnosperms andangiosperms differ in a number of features of which contrast-ing reproductive biology (mating system pollination typeand seed morphology) physiology (water-conducting sys-tems) and life spans are the most prominent (Leitch andLeitch 2012 De La Torre et al 2014a) Gymnosperms aretypically outcrossing species in which wind plays the mainrole in the pollination and dispersal of uncovered seeds Inaddition widespread gymnosperms are thought to have largeeffective population sizes and weak population structure(Neale and Kremer 2011) The recent genome sequencingof gymnosperm species has revealed that the enormousgenomes of gymnosperms (20ndash40 Gb) are mainly composedof large and variable sets of transposable elements and thatthey have similar numbers of protein-coding genes comparedto other plant species (De La Torre et al 2014a)

The highly conserved morphology of gymnosperms evi-denced by the similarity of extant species to the earliest fossilrecords and the high levels of macrosynteny among conifers(Pavy et al 2012) have led scientists to believe that gymno-sperms are slow-evolving plants (Won and Renner 2006)However this remains a controversial issue with studiesshowing opposing results (Willyard et al 2007 Palme et al2009 Buschiazzo et al 2012 Chen et al 2012) Limitations ofprevious studies include the use of a small number of genesandor species the use of highly diverged species comparedusing different gene sets and the lack of a phylogenetic frame-work that includes species from different taxonomic familiesConsidering that the rate of molecular evolution stronglydepends on the selective constraints of proteins or aminoacids and these constraints are variable among genes itseems risky to conclude significant differences in rates of mo-lecular evolution from the comparison among different genesets Moreover a recent whole-genome study in Picea specieshas found contrasting rates of sequence divergence amonggenes in relation to their functional category duplicationstatus and gene family size (De La Torre et al 2015a)Evolutionary and phylogenetic analyses among gymnospermsand angiosperms have also been limited by the absence oforthologous genes that allow for such comparisons

In this study we used a newly identified set of 42 single-copy genes obtained from whole genomic andor transcrip-tomic data from a broad taxonomic sampling that includes allconifers (with the exception of Araucariaceae) cycads ginkgoand many angiosperms with sequenced genomes (Li Z De LaTorre AR Sterck L Canovas FM Avila C Von Arnold SIngvarsson PK Van de Peer Y in review) We aimed to testfor differences in the rate of molecular evolution among gym-nosperm and angiosperm species and to understand thepossible causes driving any such variation With this wehope we can contribute to a better understanding of thecomplex evolutionary relationships among major plant cladesand to elucidate the main evolutionary processes that haveshaped the seed plants we see today

Results

Rates of Sequence DivergenceWhen evaluating pairwise estimates between species for eachtaxonomic family in the angiosperm and gymnosperm phy-logeny (fig 1) we found significant differences in the rates ofsequence divergence between the two major plant cladesThe results of the sign test indicated that dN dS and xwere significantly different between angiosperms and gymno-sperms across the 42 genes evaluated The number of synon-ymous substitutions per site (dS) was lower in gymnospermsthan in angiosperms in all of the 42 genes studied (Plt 0001)Similarly the number of non-synonymous substitutions persite (dN) was lower in gymnosperms for 86 of the studiedgenes (Plt 0001) whereas the ratio of non-synonymoussynonymous substitutions (x) was higher in gymnospermsfor 762 of the genes (Plt 0001) (see supplementary tableS3 Supplementary Material Online) Consistent results werefound when comparing branch estimations of dS

De La Torre et al doi101093molbevmsx069 MBE

1364

(tfrac14 3 dffrac14 5787 Plt 001) and x (t frac14 ndash40379 dffrac14 378Plt 0001) however dN was not significantly differentbetween angiosperms and gymnosperms when comparingbranch estimations in all terminal branches (t frac14 ndash0319dffrac14 5234 Pfrac14 0751) (see supplementary table S4Supplementary Material Online) The most contrasting differ-ences were found in the absolute rates of silent-site diver-gence (l) with angiosperms rates being on average seventimes higher than that observed in gymnosperm species(535 10 9 vs 771 10 10 synonymous substitu-tionssiteyear respectively) Families showing the highestrates were Brassicaceae and Poaceae and the ones with thelowest rates were Cuppresaceae and PodocarpaceaeWithin gymnosperms Gnetophytes had the highest ratesof sequence divergence (table 1)

Estimation of the Variation in Selective Pressures (x)among BranchesWe estimated the variation in selective pressures (x) amongstem branches in the gymnosperm and angiosperm phylog-enies using the branch models in PAML (fig 1) The results ofthe model testing indicated that the most parameter-richhypothesis (H4) fits the data best This hypothesis assumesthat all ancestral and terminal branches have different ome-gas The second best hypothesis was H3 which suggests along-term shift in selective pressure resulting in both lineageshaving different omegas (x06frac14xA 6frac14xG) Likelihood ratiotests were significant for H1ndashH3 (2Dlfrac14 3562 Plt 0001)and H2ndashH3 (2Dlfrac14 12931 Plt 0001) Hypothesis H2 rankedthird followed by H1 and hypothesis H0 where there wereno differences in selective pressures between lineages(x0frac14xGfrac14xA) had the worst fit to the data (table 2)Omega ratios for gymnosperms were significantly higher

than for angiosperms in all hypotheses tested with the excep-tion of H0 (one-ratio model)

Estimation of Substitution Parameters Using SiteModelsA discrete model (M3) that uses an unconstrained discretedistribution to model heterogeneous omega ratios amongsites (Yang et al 2000) showed the best fit to the data inall angiosperm taxonomic families with the exception ofBrassicaceae (model 7 beta) The nearly neutral model(model 1a) that assumes a proportion of conserved siteswith x0lt1 and a proportion of neutral sites with x1frac141and the selection model (M2a) that allows an additional classof sites with x2gt1 had the worst fit in all angiosperm taxaExcluding poorly fitted M1a and M2a the average x amongall models ranged from 0092 to 0093 (Brassicaceae) 0106ndash0108 (Malvaceae) 0117ndash0125 (Fabaceae) 0118ndash0156(Rosaceae) and 0131ndash0133 (Poaceae) Average x ratios sug-gest a nonsynonymous mutation has only 9ndash13 as muchchance as a synonymous mutation of being fixed suggestingmost sites are highly conserved in angiosperm taxa

In gymnosperms M8 (beta amp x) showed the best fit inCupressaceae and Cycads M10 (beta amp gammathorn1) in Pinusand M2a (selection) in Picea The beta model (M7) had theworst fit to the data in Picea and Pinus and the gamma model(M5) in Cupressaceae and Cycads Excluding poorly fitted M7for Picea and Pinus and M5 for Cuppressaceae and Cycadsthe average x among all models ranged from 066ndash067(Pinus) 037ndash041 (Picea) 0167ndash0178 (Cuppresaceae) and02ndash0216 (Cycads) Overall these estimates suggest a signif-icantly higher chance of fixation of nonsynonymous muta-tions in gymnosperms (17ndash67) than in angiosperm taxaParameters estimates for each of the six site models testedin each of the five angiosperm and four gymnosperm families

Table 1 Absolute Rates of Silent-Site Divergence (l) for Each Taxonomic Family or Subfamily Based on 42 Single-Copy Nuclear Genes in 31Gymnosperms and 34 Angiosperms Species

Plant Clade Taxa Subtaxa dS T (years) l (siteyear)

Gymnosperms Pinaceae 0102 728067Endash10Pinus 0070 84500000 416479Endash10

Cedrus_Abies 0128 129700000 49387Endash10Larix_Pseudotsuga 0148 61600000 119939Endash09

Picea 0061 38300000 802526Endash10Cupressaceae 0189 159200000 592216Endash10

Taxaceae 0225 153000000 734573Endash10Podocarpaceae 0170 146100000 581248Endash10

Cycadales 0333 248100000 670325Endash10Gnetophytes 0073 25000000 145547Endash09

Angiosperms Brassicaceae 0352 27000000 651648Endash09Malvaceae 0373 40900000 456Endash09

Euphorbiaceae 0463 51000000 454382Endash09Fabaceae 0502 53100000 472809Endash09

Cucurbitaceae 0239 20000000 59795Endash09Rosaceae 0421 52300000 402113Endash09

Solanaceae 0084 7300000 575068Endash09Poaceae 0372 60277Endash09

Poaceae I 0457 39700000 576174Endash09Poaceae II 0287 22800000 629366Endash09

NOTEmdashDivergence times (T) are based on fossil calibration data and published studies (see supplementary table S2 Supplementary Material Online)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1365

are included in supplementary table S5 SupplementaryMaterial Online

When comparing the proportion of conserved neutraland positively selected sites under the discrete model(M3) we observed that all taxa had a higher proportionof conserved followed by neutral and then by positivelyselected sites Conserved sites (xlt 1) in angiospermsranged from 55 to 77 whereas gymnosperm sitesranged from 66 to 76 differences between taxa werenot significant (Pfrac14 062) The proportion of neutral sites(xfrac14 1) ranged from 21 to 39 in angiosperms andfrom 21 to 28 in gymnosperms however the differ-ences between medians were not significant (Pfrac14 022)

(supplementary table S5 Supplementary MaterialOnline)

Inference of the Proportion of Sites under SelectionModels that allow for sites under positive selection such asM2a M3 and M8 all suggested the presence of a very smallnumber of weakly positively selected sites in angiosperm taxaModels M3 and M8 identified between 07ndash13 sites underweak diversifying selection (xfrac14 165 and xfrac14 138 respec-tively) in Brassicaceae However likelihood ratio test statisticsfor comparing M1a and M2a and M7 and M8 did not showsignificant results (table 3) In addition the NEB analyses didnot identify any sites under positive selection For Malvaceae

Table 2 Parameter Estimates under Models of Variable Omega (x) among Branches from the Gymnosperm and Angiosperm Lineages Based on aConcatenated Alignment of 31737 Nucleotide Sites in 61 Species

Hypothesis Model No of Parameters Background Foreground Parameter Estimates ln L AIC

H0 M0 1 x0frac14xG frac14 xA xfrac14 011356 ndash62177774 124355748H1 M2 2 x0frac14xG xA x0frac14xG frac14 01238 xA frac14 00989 ndash6216446499 12432933H2 M2 2 x0frac14xA xG x0frac14xA frac14 00935 xG frac14 02603 ndash6199280209 1239860042H3 M2 3 x0 xA xG x0frac14 00857 xA frac14 01024 xG frac14 02631 ndash619863362 1239732724H4 M1 120 x ranges between 005 to 076 ndash6175988386 1235437677

NOTEmdashWe tested the following hypotheses H0 Homogeneous selective pressure in both clades (x0frac14 xG frac14 xA) H1 Selective Pressure in the Angiosperm clade (x0frac14 xGxA) H2 selective pressure in the gymnosperm clade (x0frac14 xA xG) H3 long-term shift in selective pressure resulting in both clades having different omegas (x0 xG xA)and H4 all terminal branches in both clades have different omegas For hypotheses H1 to H3 xG and xA identify estimated omegas on stem branches of the gymnosperm andangiosperm lineages according to phylogenetic tree in figure 1 x0 represents the estimated omega in unselected branches (background) Parameter estimates of model M1(hypothesis H4) are detailed in supplementary table S7 Supplementary Material Online

02

Prumnopitys andina

Pinus massoniana

Larix kaempferi

Podocarpus macrophyllus

Oryza sativa ssp japonica

Sorghum bicolor

Sequoia sempervirens

Arabidopsis lyrata

Citrullus lanatus

Lotus japonicus

Beta vulgaris

Ginkgo biloba

Gnetum gnemon

Hordeum vulgare

Malus domestica

Pinus sylvestris

Fragaria vesca

Cryptomeria japonica

Brassica rapa

Ricinus communis

Physcomitrella patens

Populus trichocarpa

Citrus sinensis

Eucalyptus grandis

Brachypodium distachyon

Amborella trichopoda

Welwitschia mirabilis

Juniperus scopulorum

Abies alba

Zea mays

Picea glauca

Theobroma cacaoGossypium raimondii

Medicago truncatula

Manihot esculenta

Pinus pinaster

Cunninghamia lanceolata

Gnetum montanum

Pseudotsuga menziesii

Ephedra sinica

Vitis vinifera

Thellungiella parvula

Taxus baccata

Pinus taeda

Musa acuminata

Cephalotaxus harringtonia

Glycine max

Oryza sativa ssp indica

Zamia vazqueziiCycas rumphii

Cycas micholitzii

Setaria italica

Cedrus libani

Pinus contortaPinus banksiana

Picea abies

Carica papaya

Arabidopsis thaliana

Cucumis melo

Prunus persica

Pinus lambertiana

Capsella rubella

Picea sitchensis

Solanum tuberosumSolanum lycopersicum

Sciadopitys verticillata

100

100

100

100

100

100

100

100

9 5

9 0

100

100

100

100

100

100

100

100

100100

100

100

100

100

100

100

100

100

9 0

100

100

100

100

8 4

100

100

100

100

100

100

100

100

8 3

100

8 8

100

100

100

100

100

100

8 8

9 8

100

100

100

9 5

100

100

100

100

100

100

FIG 1 Phylogenetic tree obtained from a concatenated alignment of 42 single-copy genes and 66 species distributed between two main seed plantlineages (angiosperms in green gymnosperms in blue and outgroup in black) inferred by RAxML Stem branches for angiosperms and gymno-sperms were used for model testing using the branch models implemented in codeml Results of the model testing can be found in table 2

De La Torre et al doi101093molbevmsx069 MBE

1366

four sites under positive selection were identified by the BEBanalysis (Pgt 95) and the likelihood ratio test for compar-ison between M7ndashM8 was significant (2Dlfrac14 33063Plt 0001 dffrac14 2) However none of the other tested modelsidentified sites under positive selection Weakly to moderatediversifying selection (xfrac14 189 and xfrac14 275) was found in avery small proportion of sites (11 and 05) in Fabaceaebased on the results of M3 and M8 In addition the NEBanalysis identified two sites under positive selection usingM3 and the likelihood ratio test for comparison betweenM7ndashM8 was significant (2Dlfrac14 2274 Plt 0001 dffrac14 2) InRosaceae M3 and M8 identified 02 of sites under strongdiversifying selection In addition the BEB analysis identifiedtwo sites under positive selection using M3 and the likelihoodratio test for comparison between M7ndashM8 was significant(2Dlfrac14 642 Plt 005 dffrac14 2) Finally weakly diversifyingselection (xfrac14 153 and xfrac14 185) was found in 07 and13 sites in Poaceae Four sites were identified by the BEBanalyses in both M3 and M8 and the likelihood ratio testfor M7ndashM8 was significant (2Dlfrac14 15428 Plt 0001 dffrac14 2)(table 3 see supplementary table S5 Supplementary MaterialOnline)

In contrast to angiosperm taxa evidence for moderate tostrong diversifying selection and a higher number of sitesunder positive selection was found in gymnosperm taxa InPinus M2a M3 and M8 identified 163 2313 and 2221of sites under moderate to strong diversifying selection(xfrac14 32 xfrac14 273 and xfrac14 28 respectively) This taxo-nomic family had the highest number of sites under positiveselection (240 245 and 1070) evidenced by the results of theBEB analyses under M2a and M8 and the NEB analysis underM3 Likelihood ratio tests were significant for both the M1andashM2a (2Dlfrac14 51146 Plt 0001 dffrac14 2) and the M7ndashM8(2Dlfrac14 54937 Plt 0001 dffrac14 2) comparisons Similarly inPicea M2a identified 2148 of weakly selected sites(xfrac14 119) and 325 of sites under strong positive selection(xfrac14 329) Models M3 and M8 identified 56 of sites undermoderate positive selection (xfrac14 28) The number of sitesunder positive selection was highly variable among models 1301 and 885 based on the BEB analyses under M2a and M8and by the NEB results based on M3 The LRT statistics weresignificant for both the M1andashM2a and M7ndashM8 comparisons(table 3) A small proportion (12ndash14) of sites under

moderate positive selection (xfrac14 221 and xfrac14 232) wereevidenced by the results of M3 and M8 in CuppresaceaeThree sites were found to be under positive selection basedon the NEB analysis of M3 Finally in Cycads M3 identified46 of weakly selected sites (xfrac14 171) and M8 identified19 of sited under moderate selection (xfrac14 238) The LRTstatistics for M7ndashM8 comparisons were significant for bothCuppressaceae and Cycads (table 3 see supplementary tableS5 Supplementary Material Online)

Distribution of Selection CoefficientsParameter estimates a and b from the gamma distribution ofx across sites in each taxonomic family were used to estimatethe distribution of the selection coefficients of new mutations(Yang et al 2000 Nielsen and Yang 2003) As expected wefound that a and b varied across taxonomic families resultingin varying estimates of S and f(S) For all taxonomic familiesthe distributions of selection coefficients for new mutationshad a peak at S frac14 ndash5 suggesting that the majority of sitesare under purifying selection In all angiosperms exceptBrassicaceae selection coefficients are mostly distributedwith S frac14 [ndash10 0] whereas in gymnosperms the majorityof selection coefficients are distributed within a wider rangewith S frac14 [ndash20 10] Gymnosperm families had a greater pro-portion of sites under strong purifying selection (S lt ndash15)than angiosperms (t frac14 ndash357 Plt 005) whereas angiospermfamilies had a greater proportion of neutral and mildly dele-terious (S 0 and S gt ndash15) sites than gymnosperms(tfrac14 4119 Plt 005) (fig 2) The proportion of advantageousmutations (positive selection coefficients) also varied amongfamilies with gymnosperm taxa having on average more sitesunder positive selection than angiosperm taxa (t frac14 ndash574Plt 0001) (fig 2)

Correlations with Life-History TraitsIn order to explain the variation we observe in dN dS and xbetween the major plant clades we tested whether thesevariables showed any associations with life history traitsOur results suggest that generation time (time to reach re-productive maturity measured in years) was strongly nega-tively correlated with l (family level r frac14 ndash078 Plt 0001)(table 4) Also genome size (amount of DNA contained in ahaploid nucleus and measured in picograms (1C)) wasstrongly negatively correlated with l (family level r frac14ndash067) but positively correlated with x (species levelrfrac14 056 Plt 0001 family level rfrac14 052 Plt 005) (table 4)Finally species richness (number of extant species in eachtaxonomic family) was significantly correlated with dS(rfrac14 07 Plt 0001) and dN (rfrac14 048 Plt 005) at the familylevel (not tested at species level)

We found significant differences (P lt 005) in l amonggroups with different life forms when using six groups(angiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes) Differences in x and l were alsosignificant among groups when comparing four groups (an-giosperms herbs angiosperms treesshrubs gymnospermsand Gnetophytes) and three groups (angiosperms herbs

Table 3 Results of the Likelihood Ratio Tests for Evidence of PositiveSelection in Each of the Taxonomic Families Studied

Taxa M1andashM2a (2DL) P Value M7ndashM8 (2DL) P Value

Brassicaceae 0 NS 2577926 02766Malvaceae 0 NS 33063102 000000007Fabaceae 0 NS 22744606 000001151Rosaceae 0 NS 6429714 004016Poaceae 0 NS 15428744 00004463Pinus 5114601 0 549377428 0Picea 116785146 0 116646654 0Cupressaceae 0 NS 1511157 0000523Cycadales 0 NS 6535688 003808

NOTEmdashComparisons between site models M1a and M2a and M7 and M8 for eachtaxonomic family were evaluated

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1367

angiosperms shrubstrees and gymnosperms) (see supplementary table S6 Supplementary Material Online) Numberof synonymous (dS) and non-synonymous substitutions (dN)were not significantly different in any of the groups whenassessed by P lt 005 Boxplots showing differences amonggroups when groups are divided into three (angiospermsherbs angiosperms treesshrubs and gymnosperms) are plot-ted in figure 2 In the Gnetophyte group only Gnetum specieswere included and not Welwitschia and Ephedra (seeDiscussion part)

Discussion

Slower Rates of Molecular Evolution in GymnospermsOur study clearly suggests slower rates of molecular evolutionin gymnosperm than in angiosperm protein-coding genes

evidenced by a lower number of synonymous substitutions(dS) and lower rates of silent-site divergence (l)

Based on the evolutionary and phylogenetic analyses of31968 nucleotide sites from a wide taxonomic sampling ourresults suggest l values of 41ndash14 10 10 in gymnospermtaxa and 4ndash65 10 9 in angiosperm taxa (table 1) Takentogether this corresponds to a 7-fold average variationamong gymnosperms and angiosperms (771 10 10 vs535 10 9 synonymous substitutionssiteyear respec-tively) Our results lay within the ranges obtained by previousstudies comparing a few Pinaceae species suggesting the var-iation found in this taxonomic family is a relatively goodrepresentation of the variation found in all the gymnospermtaxa with the exception of Gnetophytes For exampleWillyard et al (2007) while comparing different speciesof Pinus found absolute rates of silent-site divergence of 7ndash131 10 10 Similarly Chen et al (2012) estimated pairwise

A

B

C

FIG 2 (A) Boxplots showing differences in number of synonymous substitutions (dS) nonsynonymous substitutions (dN) absolute rate of silent-site divergence (l) and substitution rate ratio (x) among life forms defined as angiosperms herbs (green) angiosperms shrubtrees (light green)and gymnosperms (blue) Results of the statistical tests of comparisons among groups can be found in supplementary table S6 SupplementaryMaterial Online (B) Proportion of sites under negative selection when all sites are evaluated (top left) when Sltndash15 (top right) and when Sgtndash15(bottom left) and proportion of sites under positive selection (bottom right) Green boxes represent angiosperm species and blue boxesgymnosperms (C) Distribution of the selection coefficient (f(S)) of new mutations It was calculated for all species in each taxonomic familystudied assuming a gamma distribution of x among sites Parameter estimates a and b were obtained from running the site model 5 in codeml(PAML) using a concatenated alignment of 29000ndash31000 sites per taxonomic family Selection coefficients were then obtained replacing a and bin equation 1 Green dotted vertical lines are used to show the distribution of f(S) when S 0 and xfrac14 1 (neutrality) Taxonomic familiescontaining less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML

De La Torre et al doi101093molbevmsx069 MBE

1368

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 2: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

the relative effects of selection in certain lineages of interest(Nielsen 2005) and also for the estimation of the distributionof selection coefficients among taxa Because positive selectionmost likely affects only a few sites at a few time points theevaluation of individual sites (site models) and branches(branch models) has more power to detect adaptive evolu-tion than the pairwise sequence comparison in which substi-tution rates are averaged over all amino acid sites (Yang 2002)

Substitution rates have been reported to be variableacross the tree of life however the causes underlying thisvariation remain uncertain In flowering plants this variationhas been explained by differences in life forms height gen-eration times genome size environmental variables andspecies richness (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013 Bromham et al 2015) Variablesubstitution rates have also been reported across nuclearand organelle genomes and among genes with differentfunctional categories (Bromham et al 2015) Althoughconsiderable attention has been given to flowering plants(angiosperms) very few reports on the rate of molecularevolution exist for their sister seed plant clade the gymno-sperms and to our knowledge differences in rates of molec-ular evolution among angiosperms and gymnosperms havenever been tested within a phylogenetic framework

Gymnosperms are an ancient and widespread plant cladethat represent four of the five main lineages of seed plantsand includes cycads ginkgos gnetophytes and conifers(Wang and Ran 2014) Gymnosperm lineages separatedfrom each other during the Late Carboniferous to the LateTriassic (311ndash212 mya) earlier than the occurrence of theearliest extant angiosperms around 300 mya (Magallon et al2013)DespitebeingdominantthroughmostoftheMesozoicgymnosperms were severely affected by extreme climaticshifts especially during the late Neogene which may have fa-vored the disproportionate loss of ancient lineages and theirreplacement byyounger lineages intheNorthernHemisphere(Won and Renner 2006 Crisp and Cook 2011 Leslie et al2012) Cenozoic extinctions may have contributed to thelow diversity of extant gymnosperms (Wang and Ran 2014)however they are unlikely to explain the 30-fold difference inspecies diversity between gymnosperm and angiospermspecies

Besides differences in species diversity gymnosperms andangiosperms differ in a number of features of which contrast-ing reproductive biology (mating system pollination typeand seed morphology) physiology (water-conducting sys-tems) and life spans are the most prominent (Leitch andLeitch 2012 De La Torre et al 2014a) Gymnosperms aretypically outcrossing species in which wind plays the mainrole in the pollination and dispersal of uncovered seeds Inaddition widespread gymnosperms are thought to have largeeffective population sizes and weak population structure(Neale and Kremer 2011) The recent genome sequencingof gymnosperm species has revealed that the enormousgenomes of gymnosperms (20ndash40 Gb) are mainly composedof large and variable sets of transposable elements and thatthey have similar numbers of protein-coding genes comparedto other plant species (De La Torre et al 2014a)

The highly conserved morphology of gymnosperms evi-denced by the similarity of extant species to the earliest fossilrecords and the high levels of macrosynteny among conifers(Pavy et al 2012) have led scientists to believe that gymno-sperms are slow-evolving plants (Won and Renner 2006)However this remains a controversial issue with studiesshowing opposing results (Willyard et al 2007 Palme et al2009 Buschiazzo et al 2012 Chen et al 2012) Limitations ofprevious studies include the use of a small number of genesandor species the use of highly diverged species comparedusing different gene sets and the lack of a phylogenetic frame-work that includes species from different taxonomic familiesConsidering that the rate of molecular evolution stronglydepends on the selective constraints of proteins or aminoacids and these constraints are variable among genes itseems risky to conclude significant differences in rates of mo-lecular evolution from the comparison among different genesets Moreover a recent whole-genome study in Picea specieshas found contrasting rates of sequence divergence amonggenes in relation to their functional category duplicationstatus and gene family size (De La Torre et al 2015a)Evolutionary and phylogenetic analyses among gymnospermsand angiosperms have also been limited by the absence oforthologous genes that allow for such comparisons

In this study we used a newly identified set of 42 single-copy genes obtained from whole genomic andor transcrip-tomic data from a broad taxonomic sampling that includes allconifers (with the exception of Araucariaceae) cycads ginkgoand many angiosperms with sequenced genomes (Li Z De LaTorre AR Sterck L Canovas FM Avila C Von Arnold SIngvarsson PK Van de Peer Y in review) We aimed to testfor differences in the rate of molecular evolution among gym-nosperm and angiosperm species and to understand thepossible causes driving any such variation With this wehope we can contribute to a better understanding of thecomplex evolutionary relationships among major plant cladesand to elucidate the main evolutionary processes that haveshaped the seed plants we see today

Results

Rates of Sequence DivergenceWhen evaluating pairwise estimates between species for eachtaxonomic family in the angiosperm and gymnosperm phy-logeny (fig 1) we found significant differences in the rates ofsequence divergence between the two major plant cladesThe results of the sign test indicated that dN dS and xwere significantly different between angiosperms and gymno-sperms across the 42 genes evaluated The number of synon-ymous substitutions per site (dS) was lower in gymnospermsthan in angiosperms in all of the 42 genes studied (Plt 0001)Similarly the number of non-synonymous substitutions persite (dN) was lower in gymnosperms for 86 of the studiedgenes (Plt 0001) whereas the ratio of non-synonymoussynonymous substitutions (x) was higher in gymnospermsfor 762 of the genes (Plt 0001) (see supplementary tableS3 Supplementary Material Online) Consistent results werefound when comparing branch estimations of dS

De La Torre et al doi101093molbevmsx069 MBE

1364

(tfrac14 3 dffrac14 5787 Plt 001) and x (t frac14 ndash40379 dffrac14 378Plt 0001) however dN was not significantly differentbetween angiosperms and gymnosperms when comparingbranch estimations in all terminal branches (t frac14 ndash0319dffrac14 5234 Pfrac14 0751) (see supplementary table S4Supplementary Material Online) The most contrasting differ-ences were found in the absolute rates of silent-site diver-gence (l) with angiosperms rates being on average seventimes higher than that observed in gymnosperm species(535 10 9 vs 771 10 10 synonymous substitu-tionssiteyear respectively) Families showing the highestrates were Brassicaceae and Poaceae and the ones with thelowest rates were Cuppresaceae and PodocarpaceaeWithin gymnosperms Gnetophytes had the highest ratesof sequence divergence (table 1)

Estimation of the Variation in Selective Pressures (x)among BranchesWe estimated the variation in selective pressures (x) amongstem branches in the gymnosperm and angiosperm phylog-enies using the branch models in PAML (fig 1) The results ofthe model testing indicated that the most parameter-richhypothesis (H4) fits the data best This hypothesis assumesthat all ancestral and terminal branches have different ome-gas The second best hypothesis was H3 which suggests along-term shift in selective pressure resulting in both lineageshaving different omegas (x06frac14xA 6frac14xG) Likelihood ratiotests were significant for H1ndashH3 (2Dlfrac14 3562 Plt 0001)and H2ndashH3 (2Dlfrac14 12931 Plt 0001) Hypothesis H2 rankedthird followed by H1 and hypothesis H0 where there wereno differences in selective pressures between lineages(x0frac14xGfrac14xA) had the worst fit to the data (table 2)Omega ratios for gymnosperms were significantly higher

than for angiosperms in all hypotheses tested with the excep-tion of H0 (one-ratio model)

Estimation of Substitution Parameters Using SiteModelsA discrete model (M3) that uses an unconstrained discretedistribution to model heterogeneous omega ratios amongsites (Yang et al 2000) showed the best fit to the data inall angiosperm taxonomic families with the exception ofBrassicaceae (model 7 beta) The nearly neutral model(model 1a) that assumes a proportion of conserved siteswith x0lt1 and a proportion of neutral sites with x1frac141and the selection model (M2a) that allows an additional classof sites with x2gt1 had the worst fit in all angiosperm taxaExcluding poorly fitted M1a and M2a the average x amongall models ranged from 0092 to 0093 (Brassicaceae) 0106ndash0108 (Malvaceae) 0117ndash0125 (Fabaceae) 0118ndash0156(Rosaceae) and 0131ndash0133 (Poaceae) Average x ratios sug-gest a nonsynonymous mutation has only 9ndash13 as muchchance as a synonymous mutation of being fixed suggestingmost sites are highly conserved in angiosperm taxa

In gymnosperms M8 (beta amp x) showed the best fit inCupressaceae and Cycads M10 (beta amp gammathorn1) in Pinusand M2a (selection) in Picea The beta model (M7) had theworst fit to the data in Picea and Pinus and the gamma model(M5) in Cupressaceae and Cycads Excluding poorly fitted M7for Picea and Pinus and M5 for Cuppressaceae and Cycadsthe average x among all models ranged from 066ndash067(Pinus) 037ndash041 (Picea) 0167ndash0178 (Cuppresaceae) and02ndash0216 (Cycads) Overall these estimates suggest a signif-icantly higher chance of fixation of nonsynonymous muta-tions in gymnosperms (17ndash67) than in angiosperm taxaParameters estimates for each of the six site models testedin each of the five angiosperm and four gymnosperm families

Table 1 Absolute Rates of Silent-Site Divergence (l) for Each Taxonomic Family or Subfamily Based on 42 Single-Copy Nuclear Genes in 31Gymnosperms and 34 Angiosperms Species

Plant Clade Taxa Subtaxa dS T (years) l (siteyear)

Gymnosperms Pinaceae 0102 728067Endash10Pinus 0070 84500000 416479Endash10

Cedrus_Abies 0128 129700000 49387Endash10Larix_Pseudotsuga 0148 61600000 119939Endash09

Picea 0061 38300000 802526Endash10Cupressaceae 0189 159200000 592216Endash10

Taxaceae 0225 153000000 734573Endash10Podocarpaceae 0170 146100000 581248Endash10

Cycadales 0333 248100000 670325Endash10Gnetophytes 0073 25000000 145547Endash09

Angiosperms Brassicaceae 0352 27000000 651648Endash09Malvaceae 0373 40900000 456Endash09

Euphorbiaceae 0463 51000000 454382Endash09Fabaceae 0502 53100000 472809Endash09

Cucurbitaceae 0239 20000000 59795Endash09Rosaceae 0421 52300000 402113Endash09

Solanaceae 0084 7300000 575068Endash09Poaceae 0372 60277Endash09

Poaceae I 0457 39700000 576174Endash09Poaceae II 0287 22800000 629366Endash09

NOTEmdashDivergence times (T) are based on fossil calibration data and published studies (see supplementary table S2 Supplementary Material Online)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1365

are included in supplementary table S5 SupplementaryMaterial Online

When comparing the proportion of conserved neutraland positively selected sites under the discrete model(M3) we observed that all taxa had a higher proportionof conserved followed by neutral and then by positivelyselected sites Conserved sites (xlt 1) in angiospermsranged from 55 to 77 whereas gymnosperm sitesranged from 66 to 76 differences between taxa werenot significant (Pfrac14 062) The proportion of neutral sites(xfrac14 1) ranged from 21 to 39 in angiosperms andfrom 21 to 28 in gymnosperms however the differ-ences between medians were not significant (Pfrac14 022)

(supplementary table S5 Supplementary MaterialOnline)

Inference of the Proportion of Sites under SelectionModels that allow for sites under positive selection such asM2a M3 and M8 all suggested the presence of a very smallnumber of weakly positively selected sites in angiosperm taxaModels M3 and M8 identified between 07ndash13 sites underweak diversifying selection (xfrac14 165 and xfrac14 138 respec-tively) in Brassicaceae However likelihood ratio test statisticsfor comparing M1a and M2a and M7 and M8 did not showsignificant results (table 3) In addition the NEB analyses didnot identify any sites under positive selection For Malvaceae

Table 2 Parameter Estimates under Models of Variable Omega (x) among Branches from the Gymnosperm and Angiosperm Lineages Based on aConcatenated Alignment of 31737 Nucleotide Sites in 61 Species

Hypothesis Model No of Parameters Background Foreground Parameter Estimates ln L AIC

H0 M0 1 x0frac14xG frac14 xA xfrac14 011356 ndash62177774 124355748H1 M2 2 x0frac14xG xA x0frac14xG frac14 01238 xA frac14 00989 ndash6216446499 12432933H2 M2 2 x0frac14xA xG x0frac14xA frac14 00935 xG frac14 02603 ndash6199280209 1239860042H3 M2 3 x0 xA xG x0frac14 00857 xA frac14 01024 xG frac14 02631 ndash619863362 1239732724H4 M1 120 x ranges between 005 to 076 ndash6175988386 1235437677

NOTEmdashWe tested the following hypotheses H0 Homogeneous selective pressure in both clades (x0frac14 xG frac14 xA) H1 Selective Pressure in the Angiosperm clade (x0frac14 xGxA) H2 selective pressure in the gymnosperm clade (x0frac14 xA xG) H3 long-term shift in selective pressure resulting in both clades having different omegas (x0 xG xA)and H4 all terminal branches in both clades have different omegas For hypotheses H1 to H3 xG and xA identify estimated omegas on stem branches of the gymnosperm andangiosperm lineages according to phylogenetic tree in figure 1 x0 represents the estimated omega in unselected branches (background) Parameter estimates of model M1(hypothesis H4) are detailed in supplementary table S7 Supplementary Material Online

02

Prumnopitys andina

Pinus massoniana

Larix kaempferi

Podocarpus macrophyllus

Oryza sativa ssp japonica

Sorghum bicolor

Sequoia sempervirens

Arabidopsis lyrata

Citrullus lanatus

Lotus japonicus

Beta vulgaris

Ginkgo biloba

Gnetum gnemon

Hordeum vulgare

Malus domestica

Pinus sylvestris

Fragaria vesca

Cryptomeria japonica

Brassica rapa

Ricinus communis

Physcomitrella patens

Populus trichocarpa

Citrus sinensis

Eucalyptus grandis

Brachypodium distachyon

Amborella trichopoda

Welwitschia mirabilis

Juniperus scopulorum

Abies alba

Zea mays

Picea glauca

Theobroma cacaoGossypium raimondii

Medicago truncatula

Manihot esculenta

Pinus pinaster

Cunninghamia lanceolata

Gnetum montanum

Pseudotsuga menziesii

Ephedra sinica

Vitis vinifera

Thellungiella parvula

Taxus baccata

Pinus taeda

Musa acuminata

Cephalotaxus harringtonia

Glycine max

Oryza sativa ssp indica

Zamia vazqueziiCycas rumphii

Cycas micholitzii

Setaria italica

Cedrus libani

Pinus contortaPinus banksiana

Picea abies

Carica papaya

Arabidopsis thaliana

Cucumis melo

Prunus persica

Pinus lambertiana

Capsella rubella

Picea sitchensis

Solanum tuberosumSolanum lycopersicum

Sciadopitys verticillata

100

100

100

100

100

100

100

100

9 5

9 0

100

100

100

100

100

100

100

100

100100

100

100

100

100

100

100

100

100

9 0

100

100

100

100

8 4

100

100

100

100

100

100

100

100

8 3

100

8 8

100

100

100

100

100

100

8 8

9 8

100

100

100

9 5

100

100

100

100

100

100

FIG 1 Phylogenetic tree obtained from a concatenated alignment of 42 single-copy genes and 66 species distributed between two main seed plantlineages (angiosperms in green gymnosperms in blue and outgroup in black) inferred by RAxML Stem branches for angiosperms and gymno-sperms were used for model testing using the branch models implemented in codeml Results of the model testing can be found in table 2

De La Torre et al doi101093molbevmsx069 MBE

1366

four sites under positive selection were identified by the BEBanalysis (Pgt 95) and the likelihood ratio test for compar-ison between M7ndashM8 was significant (2Dlfrac14 33063Plt 0001 dffrac14 2) However none of the other tested modelsidentified sites under positive selection Weakly to moderatediversifying selection (xfrac14 189 and xfrac14 275) was found in avery small proportion of sites (11 and 05) in Fabaceaebased on the results of M3 and M8 In addition the NEBanalysis identified two sites under positive selection usingM3 and the likelihood ratio test for comparison betweenM7ndashM8 was significant (2Dlfrac14 2274 Plt 0001 dffrac14 2) InRosaceae M3 and M8 identified 02 of sites under strongdiversifying selection In addition the BEB analysis identifiedtwo sites under positive selection using M3 and the likelihoodratio test for comparison between M7ndashM8 was significant(2Dlfrac14 642 Plt 005 dffrac14 2) Finally weakly diversifyingselection (xfrac14 153 and xfrac14 185) was found in 07 and13 sites in Poaceae Four sites were identified by the BEBanalyses in both M3 and M8 and the likelihood ratio testfor M7ndashM8 was significant (2Dlfrac14 15428 Plt 0001 dffrac14 2)(table 3 see supplementary table S5 Supplementary MaterialOnline)

In contrast to angiosperm taxa evidence for moderate tostrong diversifying selection and a higher number of sitesunder positive selection was found in gymnosperm taxa InPinus M2a M3 and M8 identified 163 2313 and 2221of sites under moderate to strong diversifying selection(xfrac14 32 xfrac14 273 and xfrac14 28 respectively) This taxo-nomic family had the highest number of sites under positiveselection (240 245 and 1070) evidenced by the results of theBEB analyses under M2a and M8 and the NEB analysis underM3 Likelihood ratio tests were significant for both the M1andashM2a (2Dlfrac14 51146 Plt 0001 dffrac14 2) and the M7ndashM8(2Dlfrac14 54937 Plt 0001 dffrac14 2) comparisons Similarly inPicea M2a identified 2148 of weakly selected sites(xfrac14 119) and 325 of sites under strong positive selection(xfrac14 329) Models M3 and M8 identified 56 of sites undermoderate positive selection (xfrac14 28) The number of sitesunder positive selection was highly variable among models 1301 and 885 based on the BEB analyses under M2a and M8and by the NEB results based on M3 The LRT statistics weresignificant for both the M1andashM2a and M7ndashM8 comparisons(table 3) A small proportion (12ndash14) of sites under

moderate positive selection (xfrac14 221 and xfrac14 232) wereevidenced by the results of M3 and M8 in CuppresaceaeThree sites were found to be under positive selection basedon the NEB analysis of M3 Finally in Cycads M3 identified46 of weakly selected sites (xfrac14 171) and M8 identified19 of sited under moderate selection (xfrac14 238) The LRTstatistics for M7ndashM8 comparisons were significant for bothCuppressaceae and Cycads (table 3 see supplementary tableS5 Supplementary Material Online)

Distribution of Selection CoefficientsParameter estimates a and b from the gamma distribution ofx across sites in each taxonomic family were used to estimatethe distribution of the selection coefficients of new mutations(Yang et al 2000 Nielsen and Yang 2003) As expected wefound that a and b varied across taxonomic families resultingin varying estimates of S and f(S) For all taxonomic familiesthe distributions of selection coefficients for new mutationshad a peak at S frac14 ndash5 suggesting that the majority of sitesare under purifying selection In all angiosperms exceptBrassicaceae selection coefficients are mostly distributedwith S frac14 [ndash10 0] whereas in gymnosperms the majorityof selection coefficients are distributed within a wider rangewith S frac14 [ndash20 10] Gymnosperm families had a greater pro-portion of sites under strong purifying selection (S lt ndash15)than angiosperms (t frac14 ndash357 Plt 005) whereas angiospermfamilies had a greater proportion of neutral and mildly dele-terious (S 0 and S gt ndash15) sites than gymnosperms(tfrac14 4119 Plt 005) (fig 2) The proportion of advantageousmutations (positive selection coefficients) also varied amongfamilies with gymnosperm taxa having on average more sitesunder positive selection than angiosperm taxa (t frac14 ndash574Plt 0001) (fig 2)

Correlations with Life-History TraitsIn order to explain the variation we observe in dN dS and xbetween the major plant clades we tested whether thesevariables showed any associations with life history traitsOur results suggest that generation time (time to reach re-productive maturity measured in years) was strongly nega-tively correlated with l (family level r frac14 ndash078 Plt 0001)(table 4) Also genome size (amount of DNA contained in ahaploid nucleus and measured in picograms (1C)) wasstrongly negatively correlated with l (family level r frac14ndash067) but positively correlated with x (species levelrfrac14 056 Plt 0001 family level rfrac14 052 Plt 005) (table 4)Finally species richness (number of extant species in eachtaxonomic family) was significantly correlated with dS(rfrac14 07 Plt 0001) and dN (rfrac14 048 Plt 005) at the familylevel (not tested at species level)

We found significant differences (P lt 005) in l amonggroups with different life forms when using six groups(angiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes) Differences in x and l were alsosignificant among groups when comparing four groups (an-giosperms herbs angiosperms treesshrubs gymnospermsand Gnetophytes) and three groups (angiosperms herbs

Table 3 Results of the Likelihood Ratio Tests for Evidence of PositiveSelection in Each of the Taxonomic Families Studied

Taxa M1andashM2a (2DL) P Value M7ndashM8 (2DL) P Value

Brassicaceae 0 NS 2577926 02766Malvaceae 0 NS 33063102 000000007Fabaceae 0 NS 22744606 000001151Rosaceae 0 NS 6429714 004016Poaceae 0 NS 15428744 00004463Pinus 5114601 0 549377428 0Picea 116785146 0 116646654 0Cupressaceae 0 NS 1511157 0000523Cycadales 0 NS 6535688 003808

NOTEmdashComparisons between site models M1a and M2a and M7 and M8 for eachtaxonomic family were evaluated

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1367

angiosperms shrubstrees and gymnosperms) (see supplementary table S6 Supplementary Material Online) Numberof synonymous (dS) and non-synonymous substitutions (dN)were not significantly different in any of the groups whenassessed by P lt 005 Boxplots showing differences amonggroups when groups are divided into three (angiospermsherbs angiosperms treesshrubs and gymnosperms) are plot-ted in figure 2 In the Gnetophyte group only Gnetum specieswere included and not Welwitschia and Ephedra (seeDiscussion part)

Discussion

Slower Rates of Molecular Evolution in GymnospermsOur study clearly suggests slower rates of molecular evolutionin gymnosperm than in angiosperm protein-coding genes

evidenced by a lower number of synonymous substitutions(dS) and lower rates of silent-site divergence (l)

Based on the evolutionary and phylogenetic analyses of31968 nucleotide sites from a wide taxonomic sampling ourresults suggest l values of 41ndash14 10 10 in gymnospermtaxa and 4ndash65 10 9 in angiosperm taxa (table 1) Takentogether this corresponds to a 7-fold average variationamong gymnosperms and angiosperms (771 10 10 vs535 10 9 synonymous substitutionssiteyear respec-tively) Our results lay within the ranges obtained by previousstudies comparing a few Pinaceae species suggesting the var-iation found in this taxonomic family is a relatively goodrepresentation of the variation found in all the gymnospermtaxa with the exception of Gnetophytes For exampleWillyard et al (2007) while comparing different speciesof Pinus found absolute rates of silent-site divergence of 7ndash131 10 10 Similarly Chen et al (2012) estimated pairwise

A

B

C

FIG 2 (A) Boxplots showing differences in number of synonymous substitutions (dS) nonsynonymous substitutions (dN) absolute rate of silent-site divergence (l) and substitution rate ratio (x) among life forms defined as angiosperms herbs (green) angiosperms shrubtrees (light green)and gymnosperms (blue) Results of the statistical tests of comparisons among groups can be found in supplementary table S6 SupplementaryMaterial Online (B) Proportion of sites under negative selection when all sites are evaluated (top left) when Sltndash15 (top right) and when Sgtndash15(bottom left) and proportion of sites under positive selection (bottom right) Green boxes represent angiosperm species and blue boxesgymnosperms (C) Distribution of the selection coefficient (f(S)) of new mutations It was calculated for all species in each taxonomic familystudied assuming a gamma distribution of x among sites Parameter estimates a and b were obtained from running the site model 5 in codeml(PAML) using a concatenated alignment of 29000ndash31000 sites per taxonomic family Selection coefficients were then obtained replacing a and bin equation 1 Green dotted vertical lines are used to show the distribution of f(S) when S 0 and xfrac14 1 (neutrality) Taxonomic familiescontaining less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML

De La Torre et al doi101093molbevmsx069 MBE

1368

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 3: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

(tfrac14 3 dffrac14 5787 Plt 001) and x (t frac14 ndash40379 dffrac14 378Plt 0001) however dN was not significantly differentbetween angiosperms and gymnosperms when comparingbranch estimations in all terminal branches (t frac14 ndash0319dffrac14 5234 Pfrac14 0751) (see supplementary table S4Supplementary Material Online) The most contrasting differ-ences were found in the absolute rates of silent-site diver-gence (l) with angiosperms rates being on average seventimes higher than that observed in gymnosperm species(535 10 9 vs 771 10 10 synonymous substitu-tionssiteyear respectively) Families showing the highestrates were Brassicaceae and Poaceae and the ones with thelowest rates were Cuppresaceae and PodocarpaceaeWithin gymnosperms Gnetophytes had the highest ratesof sequence divergence (table 1)

Estimation of the Variation in Selective Pressures (x)among BranchesWe estimated the variation in selective pressures (x) amongstem branches in the gymnosperm and angiosperm phylog-enies using the branch models in PAML (fig 1) The results ofthe model testing indicated that the most parameter-richhypothesis (H4) fits the data best This hypothesis assumesthat all ancestral and terminal branches have different ome-gas The second best hypothesis was H3 which suggests along-term shift in selective pressure resulting in both lineageshaving different omegas (x06frac14xA 6frac14xG) Likelihood ratiotests were significant for H1ndashH3 (2Dlfrac14 3562 Plt 0001)and H2ndashH3 (2Dlfrac14 12931 Plt 0001) Hypothesis H2 rankedthird followed by H1 and hypothesis H0 where there wereno differences in selective pressures between lineages(x0frac14xGfrac14xA) had the worst fit to the data (table 2)Omega ratios for gymnosperms were significantly higher

than for angiosperms in all hypotheses tested with the excep-tion of H0 (one-ratio model)

Estimation of Substitution Parameters Using SiteModelsA discrete model (M3) that uses an unconstrained discretedistribution to model heterogeneous omega ratios amongsites (Yang et al 2000) showed the best fit to the data inall angiosperm taxonomic families with the exception ofBrassicaceae (model 7 beta) The nearly neutral model(model 1a) that assumes a proportion of conserved siteswith x0lt1 and a proportion of neutral sites with x1frac141and the selection model (M2a) that allows an additional classof sites with x2gt1 had the worst fit in all angiosperm taxaExcluding poorly fitted M1a and M2a the average x amongall models ranged from 0092 to 0093 (Brassicaceae) 0106ndash0108 (Malvaceae) 0117ndash0125 (Fabaceae) 0118ndash0156(Rosaceae) and 0131ndash0133 (Poaceae) Average x ratios sug-gest a nonsynonymous mutation has only 9ndash13 as muchchance as a synonymous mutation of being fixed suggestingmost sites are highly conserved in angiosperm taxa

In gymnosperms M8 (beta amp x) showed the best fit inCupressaceae and Cycads M10 (beta amp gammathorn1) in Pinusand M2a (selection) in Picea The beta model (M7) had theworst fit to the data in Picea and Pinus and the gamma model(M5) in Cupressaceae and Cycads Excluding poorly fitted M7for Picea and Pinus and M5 for Cuppressaceae and Cycadsthe average x among all models ranged from 066ndash067(Pinus) 037ndash041 (Picea) 0167ndash0178 (Cuppresaceae) and02ndash0216 (Cycads) Overall these estimates suggest a signif-icantly higher chance of fixation of nonsynonymous muta-tions in gymnosperms (17ndash67) than in angiosperm taxaParameters estimates for each of the six site models testedin each of the five angiosperm and four gymnosperm families

Table 1 Absolute Rates of Silent-Site Divergence (l) for Each Taxonomic Family or Subfamily Based on 42 Single-Copy Nuclear Genes in 31Gymnosperms and 34 Angiosperms Species

Plant Clade Taxa Subtaxa dS T (years) l (siteyear)

Gymnosperms Pinaceae 0102 728067Endash10Pinus 0070 84500000 416479Endash10

Cedrus_Abies 0128 129700000 49387Endash10Larix_Pseudotsuga 0148 61600000 119939Endash09

Picea 0061 38300000 802526Endash10Cupressaceae 0189 159200000 592216Endash10

Taxaceae 0225 153000000 734573Endash10Podocarpaceae 0170 146100000 581248Endash10

Cycadales 0333 248100000 670325Endash10Gnetophytes 0073 25000000 145547Endash09

Angiosperms Brassicaceae 0352 27000000 651648Endash09Malvaceae 0373 40900000 456Endash09

Euphorbiaceae 0463 51000000 454382Endash09Fabaceae 0502 53100000 472809Endash09

Cucurbitaceae 0239 20000000 59795Endash09Rosaceae 0421 52300000 402113Endash09

Solanaceae 0084 7300000 575068Endash09Poaceae 0372 60277Endash09

Poaceae I 0457 39700000 576174Endash09Poaceae II 0287 22800000 629366Endash09

NOTEmdashDivergence times (T) are based on fossil calibration data and published studies (see supplementary table S2 Supplementary Material Online)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1365

are included in supplementary table S5 SupplementaryMaterial Online

When comparing the proportion of conserved neutraland positively selected sites under the discrete model(M3) we observed that all taxa had a higher proportionof conserved followed by neutral and then by positivelyselected sites Conserved sites (xlt 1) in angiospermsranged from 55 to 77 whereas gymnosperm sitesranged from 66 to 76 differences between taxa werenot significant (Pfrac14 062) The proportion of neutral sites(xfrac14 1) ranged from 21 to 39 in angiosperms andfrom 21 to 28 in gymnosperms however the differ-ences between medians were not significant (Pfrac14 022)

(supplementary table S5 Supplementary MaterialOnline)

Inference of the Proportion of Sites under SelectionModels that allow for sites under positive selection such asM2a M3 and M8 all suggested the presence of a very smallnumber of weakly positively selected sites in angiosperm taxaModels M3 and M8 identified between 07ndash13 sites underweak diversifying selection (xfrac14 165 and xfrac14 138 respec-tively) in Brassicaceae However likelihood ratio test statisticsfor comparing M1a and M2a and M7 and M8 did not showsignificant results (table 3) In addition the NEB analyses didnot identify any sites under positive selection For Malvaceae

Table 2 Parameter Estimates under Models of Variable Omega (x) among Branches from the Gymnosperm and Angiosperm Lineages Based on aConcatenated Alignment of 31737 Nucleotide Sites in 61 Species

Hypothesis Model No of Parameters Background Foreground Parameter Estimates ln L AIC

H0 M0 1 x0frac14xG frac14 xA xfrac14 011356 ndash62177774 124355748H1 M2 2 x0frac14xG xA x0frac14xG frac14 01238 xA frac14 00989 ndash6216446499 12432933H2 M2 2 x0frac14xA xG x0frac14xA frac14 00935 xG frac14 02603 ndash6199280209 1239860042H3 M2 3 x0 xA xG x0frac14 00857 xA frac14 01024 xG frac14 02631 ndash619863362 1239732724H4 M1 120 x ranges between 005 to 076 ndash6175988386 1235437677

NOTEmdashWe tested the following hypotheses H0 Homogeneous selective pressure in both clades (x0frac14 xG frac14 xA) H1 Selective Pressure in the Angiosperm clade (x0frac14 xGxA) H2 selective pressure in the gymnosperm clade (x0frac14 xA xG) H3 long-term shift in selective pressure resulting in both clades having different omegas (x0 xG xA)and H4 all terminal branches in both clades have different omegas For hypotheses H1 to H3 xG and xA identify estimated omegas on stem branches of the gymnosperm andangiosperm lineages according to phylogenetic tree in figure 1 x0 represents the estimated omega in unselected branches (background) Parameter estimates of model M1(hypothesis H4) are detailed in supplementary table S7 Supplementary Material Online

02

Prumnopitys andina

Pinus massoniana

Larix kaempferi

Podocarpus macrophyllus

Oryza sativa ssp japonica

Sorghum bicolor

Sequoia sempervirens

Arabidopsis lyrata

Citrullus lanatus

Lotus japonicus

Beta vulgaris

Ginkgo biloba

Gnetum gnemon

Hordeum vulgare

Malus domestica

Pinus sylvestris

Fragaria vesca

Cryptomeria japonica

Brassica rapa

Ricinus communis

Physcomitrella patens

Populus trichocarpa

Citrus sinensis

Eucalyptus grandis

Brachypodium distachyon

Amborella trichopoda

Welwitschia mirabilis

Juniperus scopulorum

Abies alba

Zea mays

Picea glauca

Theobroma cacaoGossypium raimondii

Medicago truncatula

Manihot esculenta

Pinus pinaster

Cunninghamia lanceolata

Gnetum montanum

Pseudotsuga menziesii

Ephedra sinica

Vitis vinifera

Thellungiella parvula

Taxus baccata

Pinus taeda

Musa acuminata

Cephalotaxus harringtonia

Glycine max

Oryza sativa ssp indica

Zamia vazqueziiCycas rumphii

Cycas micholitzii

Setaria italica

Cedrus libani

Pinus contortaPinus banksiana

Picea abies

Carica papaya

Arabidopsis thaliana

Cucumis melo

Prunus persica

Pinus lambertiana

Capsella rubella

Picea sitchensis

Solanum tuberosumSolanum lycopersicum

Sciadopitys verticillata

100

100

100

100

100

100

100

100

9 5

9 0

100

100

100

100

100

100

100

100

100100

100

100

100

100

100

100

100

100

9 0

100

100

100

100

8 4

100

100

100

100

100

100

100

100

8 3

100

8 8

100

100

100

100

100

100

8 8

9 8

100

100

100

9 5

100

100

100

100

100

100

FIG 1 Phylogenetic tree obtained from a concatenated alignment of 42 single-copy genes and 66 species distributed between two main seed plantlineages (angiosperms in green gymnosperms in blue and outgroup in black) inferred by RAxML Stem branches for angiosperms and gymno-sperms were used for model testing using the branch models implemented in codeml Results of the model testing can be found in table 2

De La Torre et al doi101093molbevmsx069 MBE

1366

four sites under positive selection were identified by the BEBanalysis (Pgt 95) and the likelihood ratio test for compar-ison between M7ndashM8 was significant (2Dlfrac14 33063Plt 0001 dffrac14 2) However none of the other tested modelsidentified sites under positive selection Weakly to moderatediversifying selection (xfrac14 189 and xfrac14 275) was found in avery small proportion of sites (11 and 05) in Fabaceaebased on the results of M3 and M8 In addition the NEBanalysis identified two sites under positive selection usingM3 and the likelihood ratio test for comparison betweenM7ndashM8 was significant (2Dlfrac14 2274 Plt 0001 dffrac14 2) InRosaceae M3 and M8 identified 02 of sites under strongdiversifying selection In addition the BEB analysis identifiedtwo sites under positive selection using M3 and the likelihoodratio test for comparison between M7ndashM8 was significant(2Dlfrac14 642 Plt 005 dffrac14 2) Finally weakly diversifyingselection (xfrac14 153 and xfrac14 185) was found in 07 and13 sites in Poaceae Four sites were identified by the BEBanalyses in both M3 and M8 and the likelihood ratio testfor M7ndashM8 was significant (2Dlfrac14 15428 Plt 0001 dffrac14 2)(table 3 see supplementary table S5 Supplementary MaterialOnline)

In contrast to angiosperm taxa evidence for moderate tostrong diversifying selection and a higher number of sitesunder positive selection was found in gymnosperm taxa InPinus M2a M3 and M8 identified 163 2313 and 2221of sites under moderate to strong diversifying selection(xfrac14 32 xfrac14 273 and xfrac14 28 respectively) This taxo-nomic family had the highest number of sites under positiveselection (240 245 and 1070) evidenced by the results of theBEB analyses under M2a and M8 and the NEB analysis underM3 Likelihood ratio tests were significant for both the M1andashM2a (2Dlfrac14 51146 Plt 0001 dffrac14 2) and the M7ndashM8(2Dlfrac14 54937 Plt 0001 dffrac14 2) comparisons Similarly inPicea M2a identified 2148 of weakly selected sites(xfrac14 119) and 325 of sites under strong positive selection(xfrac14 329) Models M3 and M8 identified 56 of sites undermoderate positive selection (xfrac14 28) The number of sitesunder positive selection was highly variable among models 1301 and 885 based on the BEB analyses under M2a and M8and by the NEB results based on M3 The LRT statistics weresignificant for both the M1andashM2a and M7ndashM8 comparisons(table 3) A small proportion (12ndash14) of sites under

moderate positive selection (xfrac14 221 and xfrac14 232) wereevidenced by the results of M3 and M8 in CuppresaceaeThree sites were found to be under positive selection basedon the NEB analysis of M3 Finally in Cycads M3 identified46 of weakly selected sites (xfrac14 171) and M8 identified19 of sited under moderate selection (xfrac14 238) The LRTstatistics for M7ndashM8 comparisons were significant for bothCuppressaceae and Cycads (table 3 see supplementary tableS5 Supplementary Material Online)

Distribution of Selection CoefficientsParameter estimates a and b from the gamma distribution ofx across sites in each taxonomic family were used to estimatethe distribution of the selection coefficients of new mutations(Yang et al 2000 Nielsen and Yang 2003) As expected wefound that a and b varied across taxonomic families resultingin varying estimates of S and f(S) For all taxonomic familiesthe distributions of selection coefficients for new mutationshad a peak at S frac14 ndash5 suggesting that the majority of sitesare under purifying selection In all angiosperms exceptBrassicaceae selection coefficients are mostly distributedwith S frac14 [ndash10 0] whereas in gymnosperms the majorityof selection coefficients are distributed within a wider rangewith S frac14 [ndash20 10] Gymnosperm families had a greater pro-portion of sites under strong purifying selection (S lt ndash15)than angiosperms (t frac14 ndash357 Plt 005) whereas angiospermfamilies had a greater proportion of neutral and mildly dele-terious (S 0 and S gt ndash15) sites than gymnosperms(tfrac14 4119 Plt 005) (fig 2) The proportion of advantageousmutations (positive selection coefficients) also varied amongfamilies with gymnosperm taxa having on average more sitesunder positive selection than angiosperm taxa (t frac14 ndash574Plt 0001) (fig 2)

Correlations with Life-History TraitsIn order to explain the variation we observe in dN dS and xbetween the major plant clades we tested whether thesevariables showed any associations with life history traitsOur results suggest that generation time (time to reach re-productive maturity measured in years) was strongly nega-tively correlated with l (family level r frac14 ndash078 Plt 0001)(table 4) Also genome size (amount of DNA contained in ahaploid nucleus and measured in picograms (1C)) wasstrongly negatively correlated with l (family level r frac14ndash067) but positively correlated with x (species levelrfrac14 056 Plt 0001 family level rfrac14 052 Plt 005) (table 4)Finally species richness (number of extant species in eachtaxonomic family) was significantly correlated with dS(rfrac14 07 Plt 0001) and dN (rfrac14 048 Plt 005) at the familylevel (not tested at species level)

We found significant differences (P lt 005) in l amonggroups with different life forms when using six groups(angiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes) Differences in x and l were alsosignificant among groups when comparing four groups (an-giosperms herbs angiosperms treesshrubs gymnospermsand Gnetophytes) and three groups (angiosperms herbs

Table 3 Results of the Likelihood Ratio Tests for Evidence of PositiveSelection in Each of the Taxonomic Families Studied

Taxa M1andashM2a (2DL) P Value M7ndashM8 (2DL) P Value

Brassicaceae 0 NS 2577926 02766Malvaceae 0 NS 33063102 000000007Fabaceae 0 NS 22744606 000001151Rosaceae 0 NS 6429714 004016Poaceae 0 NS 15428744 00004463Pinus 5114601 0 549377428 0Picea 116785146 0 116646654 0Cupressaceae 0 NS 1511157 0000523Cycadales 0 NS 6535688 003808

NOTEmdashComparisons between site models M1a and M2a and M7 and M8 for eachtaxonomic family were evaluated

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1367

angiosperms shrubstrees and gymnosperms) (see supplementary table S6 Supplementary Material Online) Numberof synonymous (dS) and non-synonymous substitutions (dN)were not significantly different in any of the groups whenassessed by P lt 005 Boxplots showing differences amonggroups when groups are divided into three (angiospermsherbs angiosperms treesshrubs and gymnosperms) are plot-ted in figure 2 In the Gnetophyte group only Gnetum specieswere included and not Welwitschia and Ephedra (seeDiscussion part)

Discussion

Slower Rates of Molecular Evolution in GymnospermsOur study clearly suggests slower rates of molecular evolutionin gymnosperm than in angiosperm protein-coding genes

evidenced by a lower number of synonymous substitutions(dS) and lower rates of silent-site divergence (l)

Based on the evolutionary and phylogenetic analyses of31968 nucleotide sites from a wide taxonomic sampling ourresults suggest l values of 41ndash14 10 10 in gymnospermtaxa and 4ndash65 10 9 in angiosperm taxa (table 1) Takentogether this corresponds to a 7-fold average variationamong gymnosperms and angiosperms (771 10 10 vs535 10 9 synonymous substitutionssiteyear respec-tively) Our results lay within the ranges obtained by previousstudies comparing a few Pinaceae species suggesting the var-iation found in this taxonomic family is a relatively goodrepresentation of the variation found in all the gymnospermtaxa with the exception of Gnetophytes For exampleWillyard et al (2007) while comparing different speciesof Pinus found absolute rates of silent-site divergence of 7ndash131 10 10 Similarly Chen et al (2012) estimated pairwise

A

B

C

FIG 2 (A) Boxplots showing differences in number of synonymous substitutions (dS) nonsynonymous substitutions (dN) absolute rate of silent-site divergence (l) and substitution rate ratio (x) among life forms defined as angiosperms herbs (green) angiosperms shrubtrees (light green)and gymnosperms (blue) Results of the statistical tests of comparisons among groups can be found in supplementary table S6 SupplementaryMaterial Online (B) Proportion of sites under negative selection when all sites are evaluated (top left) when Sltndash15 (top right) and when Sgtndash15(bottom left) and proportion of sites under positive selection (bottom right) Green boxes represent angiosperm species and blue boxesgymnosperms (C) Distribution of the selection coefficient (f(S)) of new mutations It was calculated for all species in each taxonomic familystudied assuming a gamma distribution of x among sites Parameter estimates a and b were obtained from running the site model 5 in codeml(PAML) using a concatenated alignment of 29000ndash31000 sites per taxonomic family Selection coefficients were then obtained replacing a and bin equation 1 Green dotted vertical lines are used to show the distribution of f(S) when S 0 and xfrac14 1 (neutrality) Taxonomic familiescontaining less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML

De La Torre et al doi101093molbevmsx069 MBE

1368

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 4: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

are included in supplementary table S5 SupplementaryMaterial Online

When comparing the proportion of conserved neutraland positively selected sites under the discrete model(M3) we observed that all taxa had a higher proportionof conserved followed by neutral and then by positivelyselected sites Conserved sites (xlt 1) in angiospermsranged from 55 to 77 whereas gymnosperm sitesranged from 66 to 76 differences between taxa werenot significant (Pfrac14 062) The proportion of neutral sites(xfrac14 1) ranged from 21 to 39 in angiosperms andfrom 21 to 28 in gymnosperms however the differ-ences between medians were not significant (Pfrac14 022)

(supplementary table S5 Supplementary MaterialOnline)

Inference of the Proportion of Sites under SelectionModels that allow for sites under positive selection such asM2a M3 and M8 all suggested the presence of a very smallnumber of weakly positively selected sites in angiosperm taxaModels M3 and M8 identified between 07ndash13 sites underweak diversifying selection (xfrac14 165 and xfrac14 138 respec-tively) in Brassicaceae However likelihood ratio test statisticsfor comparing M1a and M2a and M7 and M8 did not showsignificant results (table 3) In addition the NEB analyses didnot identify any sites under positive selection For Malvaceae

Table 2 Parameter Estimates under Models of Variable Omega (x) among Branches from the Gymnosperm and Angiosperm Lineages Based on aConcatenated Alignment of 31737 Nucleotide Sites in 61 Species

Hypothesis Model No of Parameters Background Foreground Parameter Estimates ln L AIC

H0 M0 1 x0frac14xG frac14 xA xfrac14 011356 ndash62177774 124355748H1 M2 2 x0frac14xG xA x0frac14xG frac14 01238 xA frac14 00989 ndash6216446499 12432933H2 M2 2 x0frac14xA xG x0frac14xA frac14 00935 xG frac14 02603 ndash6199280209 1239860042H3 M2 3 x0 xA xG x0frac14 00857 xA frac14 01024 xG frac14 02631 ndash619863362 1239732724H4 M1 120 x ranges between 005 to 076 ndash6175988386 1235437677

NOTEmdashWe tested the following hypotheses H0 Homogeneous selective pressure in both clades (x0frac14 xG frac14 xA) H1 Selective Pressure in the Angiosperm clade (x0frac14 xGxA) H2 selective pressure in the gymnosperm clade (x0frac14 xA xG) H3 long-term shift in selective pressure resulting in both clades having different omegas (x0 xG xA)and H4 all terminal branches in both clades have different omegas For hypotheses H1 to H3 xG and xA identify estimated omegas on stem branches of the gymnosperm andangiosperm lineages according to phylogenetic tree in figure 1 x0 represents the estimated omega in unselected branches (background) Parameter estimates of model M1(hypothesis H4) are detailed in supplementary table S7 Supplementary Material Online

02

Prumnopitys andina

Pinus massoniana

Larix kaempferi

Podocarpus macrophyllus

Oryza sativa ssp japonica

Sorghum bicolor

Sequoia sempervirens

Arabidopsis lyrata

Citrullus lanatus

Lotus japonicus

Beta vulgaris

Ginkgo biloba

Gnetum gnemon

Hordeum vulgare

Malus domestica

Pinus sylvestris

Fragaria vesca

Cryptomeria japonica

Brassica rapa

Ricinus communis

Physcomitrella patens

Populus trichocarpa

Citrus sinensis

Eucalyptus grandis

Brachypodium distachyon

Amborella trichopoda

Welwitschia mirabilis

Juniperus scopulorum

Abies alba

Zea mays

Picea glauca

Theobroma cacaoGossypium raimondii

Medicago truncatula

Manihot esculenta

Pinus pinaster

Cunninghamia lanceolata

Gnetum montanum

Pseudotsuga menziesii

Ephedra sinica

Vitis vinifera

Thellungiella parvula

Taxus baccata

Pinus taeda

Musa acuminata

Cephalotaxus harringtonia

Glycine max

Oryza sativa ssp indica

Zamia vazqueziiCycas rumphii

Cycas micholitzii

Setaria italica

Cedrus libani

Pinus contortaPinus banksiana

Picea abies

Carica papaya

Arabidopsis thaliana

Cucumis melo

Prunus persica

Pinus lambertiana

Capsella rubella

Picea sitchensis

Solanum tuberosumSolanum lycopersicum

Sciadopitys verticillata

100

100

100

100

100

100

100

100

9 5

9 0

100

100

100

100

100

100

100

100

100100

100

100

100

100

100

100

100

100

9 0

100

100

100

100

8 4

100

100

100

100

100

100

100

100

8 3

100

8 8

100

100

100

100

100

100

8 8

9 8

100

100

100

9 5

100

100

100

100

100

100

FIG 1 Phylogenetic tree obtained from a concatenated alignment of 42 single-copy genes and 66 species distributed between two main seed plantlineages (angiosperms in green gymnosperms in blue and outgroup in black) inferred by RAxML Stem branches for angiosperms and gymno-sperms were used for model testing using the branch models implemented in codeml Results of the model testing can be found in table 2

De La Torre et al doi101093molbevmsx069 MBE

1366

four sites under positive selection were identified by the BEBanalysis (Pgt 95) and the likelihood ratio test for compar-ison between M7ndashM8 was significant (2Dlfrac14 33063Plt 0001 dffrac14 2) However none of the other tested modelsidentified sites under positive selection Weakly to moderatediversifying selection (xfrac14 189 and xfrac14 275) was found in avery small proportion of sites (11 and 05) in Fabaceaebased on the results of M3 and M8 In addition the NEBanalysis identified two sites under positive selection usingM3 and the likelihood ratio test for comparison betweenM7ndashM8 was significant (2Dlfrac14 2274 Plt 0001 dffrac14 2) InRosaceae M3 and M8 identified 02 of sites under strongdiversifying selection In addition the BEB analysis identifiedtwo sites under positive selection using M3 and the likelihoodratio test for comparison between M7ndashM8 was significant(2Dlfrac14 642 Plt 005 dffrac14 2) Finally weakly diversifyingselection (xfrac14 153 and xfrac14 185) was found in 07 and13 sites in Poaceae Four sites were identified by the BEBanalyses in both M3 and M8 and the likelihood ratio testfor M7ndashM8 was significant (2Dlfrac14 15428 Plt 0001 dffrac14 2)(table 3 see supplementary table S5 Supplementary MaterialOnline)

In contrast to angiosperm taxa evidence for moderate tostrong diversifying selection and a higher number of sitesunder positive selection was found in gymnosperm taxa InPinus M2a M3 and M8 identified 163 2313 and 2221of sites under moderate to strong diversifying selection(xfrac14 32 xfrac14 273 and xfrac14 28 respectively) This taxo-nomic family had the highest number of sites under positiveselection (240 245 and 1070) evidenced by the results of theBEB analyses under M2a and M8 and the NEB analysis underM3 Likelihood ratio tests were significant for both the M1andashM2a (2Dlfrac14 51146 Plt 0001 dffrac14 2) and the M7ndashM8(2Dlfrac14 54937 Plt 0001 dffrac14 2) comparisons Similarly inPicea M2a identified 2148 of weakly selected sites(xfrac14 119) and 325 of sites under strong positive selection(xfrac14 329) Models M3 and M8 identified 56 of sites undermoderate positive selection (xfrac14 28) The number of sitesunder positive selection was highly variable among models 1301 and 885 based on the BEB analyses under M2a and M8and by the NEB results based on M3 The LRT statistics weresignificant for both the M1andashM2a and M7ndashM8 comparisons(table 3) A small proportion (12ndash14) of sites under

moderate positive selection (xfrac14 221 and xfrac14 232) wereevidenced by the results of M3 and M8 in CuppresaceaeThree sites were found to be under positive selection basedon the NEB analysis of M3 Finally in Cycads M3 identified46 of weakly selected sites (xfrac14 171) and M8 identified19 of sited under moderate selection (xfrac14 238) The LRTstatistics for M7ndashM8 comparisons were significant for bothCuppressaceae and Cycads (table 3 see supplementary tableS5 Supplementary Material Online)

Distribution of Selection CoefficientsParameter estimates a and b from the gamma distribution ofx across sites in each taxonomic family were used to estimatethe distribution of the selection coefficients of new mutations(Yang et al 2000 Nielsen and Yang 2003) As expected wefound that a and b varied across taxonomic families resultingin varying estimates of S and f(S) For all taxonomic familiesthe distributions of selection coefficients for new mutationshad a peak at S frac14 ndash5 suggesting that the majority of sitesare under purifying selection In all angiosperms exceptBrassicaceae selection coefficients are mostly distributedwith S frac14 [ndash10 0] whereas in gymnosperms the majorityof selection coefficients are distributed within a wider rangewith S frac14 [ndash20 10] Gymnosperm families had a greater pro-portion of sites under strong purifying selection (S lt ndash15)than angiosperms (t frac14 ndash357 Plt 005) whereas angiospermfamilies had a greater proportion of neutral and mildly dele-terious (S 0 and S gt ndash15) sites than gymnosperms(tfrac14 4119 Plt 005) (fig 2) The proportion of advantageousmutations (positive selection coefficients) also varied amongfamilies with gymnosperm taxa having on average more sitesunder positive selection than angiosperm taxa (t frac14 ndash574Plt 0001) (fig 2)

Correlations with Life-History TraitsIn order to explain the variation we observe in dN dS and xbetween the major plant clades we tested whether thesevariables showed any associations with life history traitsOur results suggest that generation time (time to reach re-productive maturity measured in years) was strongly nega-tively correlated with l (family level r frac14 ndash078 Plt 0001)(table 4) Also genome size (amount of DNA contained in ahaploid nucleus and measured in picograms (1C)) wasstrongly negatively correlated with l (family level r frac14ndash067) but positively correlated with x (species levelrfrac14 056 Plt 0001 family level rfrac14 052 Plt 005) (table 4)Finally species richness (number of extant species in eachtaxonomic family) was significantly correlated with dS(rfrac14 07 Plt 0001) and dN (rfrac14 048 Plt 005) at the familylevel (not tested at species level)

We found significant differences (P lt 005) in l amonggroups with different life forms when using six groups(angiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes) Differences in x and l were alsosignificant among groups when comparing four groups (an-giosperms herbs angiosperms treesshrubs gymnospermsand Gnetophytes) and three groups (angiosperms herbs

Table 3 Results of the Likelihood Ratio Tests for Evidence of PositiveSelection in Each of the Taxonomic Families Studied

Taxa M1andashM2a (2DL) P Value M7ndashM8 (2DL) P Value

Brassicaceae 0 NS 2577926 02766Malvaceae 0 NS 33063102 000000007Fabaceae 0 NS 22744606 000001151Rosaceae 0 NS 6429714 004016Poaceae 0 NS 15428744 00004463Pinus 5114601 0 549377428 0Picea 116785146 0 116646654 0Cupressaceae 0 NS 1511157 0000523Cycadales 0 NS 6535688 003808

NOTEmdashComparisons between site models M1a and M2a and M7 and M8 for eachtaxonomic family were evaluated

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1367

angiosperms shrubstrees and gymnosperms) (see supplementary table S6 Supplementary Material Online) Numberof synonymous (dS) and non-synonymous substitutions (dN)were not significantly different in any of the groups whenassessed by P lt 005 Boxplots showing differences amonggroups when groups are divided into three (angiospermsherbs angiosperms treesshrubs and gymnosperms) are plot-ted in figure 2 In the Gnetophyte group only Gnetum specieswere included and not Welwitschia and Ephedra (seeDiscussion part)

Discussion

Slower Rates of Molecular Evolution in GymnospermsOur study clearly suggests slower rates of molecular evolutionin gymnosperm than in angiosperm protein-coding genes

evidenced by a lower number of synonymous substitutions(dS) and lower rates of silent-site divergence (l)

Based on the evolutionary and phylogenetic analyses of31968 nucleotide sites from a wide taxonomic sampling ourresults suggest l values of 41ndash14 10 10 in gymnospermtaxa and 4ndash65 10 9 in angiosperm taxa (table 1) Takentogether this corresponds to a 7-fold average variationamong gymnosperms and angiosperms (771 10 10 vs535 10 9 synonymous substitutionssiteyear respec-tively) Our results lay within the ranges obtained by previousstudies comparing a few Pinaceae species suggesting the var-iation found in this taxonomic family is a relatively goodrepresentation of the variation found in all the gymnospermtaxa with the exception of Gnetophytes For exampleWillyard et al (2007) while comparing different speciesof Pinus found absolute rates of silent-site divergence of 7ndash131 10 10 Similarly Chen et al (2012) estimated pairwise

A

B

C

FIG 2 (A) Boxplots showing differences in number of synonymous substitutions (dS) nonsynonymous substitutions (dN) absolute rate of silent-site divergence (l) and substitution rate ratio (x) among life forms defined as angiosperms herbs (green) angiosperms shrubtrees (light green)and gymnosperms (blue) Results of the statistical tests of comparisons among groups can be found in supplementary table S6 SupplementaryMaterial Online (B) Proportion of sites under negative selection when all sites are evaluated (top left) when Sltndash15 (top right) and when Sgtndash15(bottom left) and proportion of sites under positive selection (bottom right) Green boxes represent angiosperm species and blue boxesgymnosperms (C) Distribution of the selection coefficient (f(S)) of new mutations It was calculated for all species in each taxonomic familystudied assuming a gamma distribution of x among sites Parameter estimates a and b were obtained from running the site model 5 in codeml(PAML) using a concatenated alignment of 29000ndash31000 sites per taxonomic family Selection coefficients were then obtained replacing a and bin equation 1 Green dotted vertical lines are used to show the distribution of f(S) when S 0 and xfrac14 1 (neutrality) Taxonomic familiescontaining less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML

De La Torre et al doi101093molbevmsx069 MBE

1368

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 5: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

four sites under positive selection were identified by the BEBanalysis (Pgt 95) and the likelihood ratio test for compar-ison between M7ndashM8 was significant (2Dlfrac14 33063Plt 0001 dffrac14 2) However none of the other tested modelsidentified sites under positive selection Weakly to moderatediversifying selection (xfrac14 189 and xfrac14 275) was found in avery small proportion of sites (11 and 05) in Fabaceaebased on the results of M3 and M8 In addition the NEBanalysis identified two sites under positive selection usingM3 and the likelihood ratio test for comparison betweenM7ndashM8 was significant (2Dlfrac14 2274 Plt 0001 dffrac14 2) InRosaceae M3 and M8 identified 02 of sites under strongdiversifying selection In addition the BEB analysis identifiedtwo sites under positive selection using M3 and the likelihoodratio test for comparison between M7ndashM8 was significant(2Dlfrac14 642 Plt 005 dffrac14 2) Finally weakly diversifyingselection (xfrac14 153 and xfrac14 185) was found in 07 and13 sites in Poaceae Four sites were identified by the BEBanalyses in both M3 and M8 and the likelihood ratio testfor M7ndashM8 was significant (2Dlfrac14 15428 Plt 0001 dffrac14 2)(table 3 see supplementary table S5 Supplementary MaterialOnline)

In contrast to angiosperm taxa evidence for moderate tostrong diversifying selection and a higher number of sitesunder positive selection was found in gymnosperm taxa InPinus M2a M3 and M8 identified 163 2313 and 2221of sites under moderate to strong diversifying selection(xfrac14 32 xfrac14 273 and xfrac14 28 respectively) This taxo-nomic family had the highest number of sites under positiveselection (240 245 and 1070) evidenced by the results of theBEB analyses under M2a and M8 and the NEB analysis underM3 Likelihood ratio tests were significant for both the M1andashM2a (2Dlfrac14 51146 Plt 0001 dffrac14 2) and the M7ndashM8(2Dlfrac14 54937 Plt 0001 dffrac14 2) comparisons Similarly inPicea M2a identified 2148 of weakly selected sites(xfrac14 119) and 325 of sites under strong positive selection(xfrac14 329) Models M3 and M8 identified 56 of sites undermoderate positive selection (xfrac14 28) The number of sitesunder positive selection was highly variable among models 1301 and 885 based on the BEB analyses under M2a and M8and by the NEB results based on M3 The LRT statistics weresignificant for both the M1andashM2a and M7ndashM8 comparisons(table 3) A small proportion (12ndash14) of sites under

moderate positive selection (xfrac14 221 and xfrac14 232) wereevidenced by the results of M3 and M8 in CuppresaceaeThree sites were found to be under positive selection basedon the NEB analysis of M3 Finally in Cycads M3 identified46 of weakly selected sites (xfrac14 171) and M8 identified19 of sited under moderate selection (xfrac14 238) The LRTstatistics for M7ndashM8 comparisons were significant for bothCuppressaceae and Cycads (table 3 see supplementary tableS5 Supplementary Material Online)

Distribution of Selection CoefficientsParameter estimates a and b from the gamma distribution ofx across sites in each taxonomic family were used to estimatethe distribution of the selection coefficients of new mutations(Yang et al 2000 Nielsen and Yang 2003) As expected wefound that a and b varied across taxonomic families resultingin varying estimates of S and f(S) For all taxonomic familiesthe distributions of selection coefficients for new mutationshad a peak at S frac14 ndash5 suggesting that the majority of sitesare under purifying selection In all angiosperms exceptBrassicaceae selection coefficients are mostly distributedwith S frac14 [ndash10 0] whereas in gymnosperms the majorityof selection coefficients are distributed within a wider rangewith S frac14 [ndash20 10] Gymnosperm families had a greater pro-portion of sites under strong purifying selection (S lt ndash15)than angiosperms (t frac14 ndash357 Plt 005) whereas angiospermfamilies had a greater proportion of neutral and mildly dele-terious (S 0 and S gt ndash15) sites than gymnosperms(tfrac14 4119 Plt 005) (fig 2) The proportion of advantageousmutations (positive selection coefficients) also varied amongfamilies with gymnosperm taxa having on average more sitesunder positive selection than angiosperm taxa (t frac14 ndash574Plt 0001) (fig 2)

Correlations with Life-History TraitsIn order to explain the variation we observe in dN dS and xbetween the major plant clades we tested whether thesevariables showed any associations with life history traitsOur results suggest that generation time (time to reach re-productive maturity measured in years) was strongly nega-tively correlated with l (family level r frac14 ndash078 Plt 0001)(table 4) Also genome size (amount of DNA contained in ahaploid nucleus and measured in picograms (1C)) wasstrongly negatively correlated with l (family level r frac14ndash067) but positively correlated with x (species levelrfrac14 056 Plt 0001 family level rfrac14 052 Plt 005) (table 4)Finally species richness (number of extant species in eachtaxonomic family) was significantly correlated with dS(rfrac14 07 Plt 0001) and dN (rfrac14 048 Plt 005) at the familylevel (not tested at species level)

We found significant differences (P lt 005) in l amonggroups with different life forms when using six groups(angiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes) Differences in x and l were alsosignificant among groups when comparing four groups (an-giosperms herbs angiosperms treesshrubs gymnospermsand Gnetophytes) and three groups (angiosperms herbs

Table 3 Results of the Likelihood Ratio Tests for Evidence of PositiveSelection in Each of the Taxonomic Families Studied

Taxa M1andashM2a (2DL) P Value M7ndashM8 (2DL) P Value

Brassicaceae 0 NS 2577926 02766Malvaceae 0 NS 33063102 000000007Fabaceae 0 NS 22744606 000001151Rosaceae 0 NS 6429714 004016Poaceae 0 NS 15428744 00004463Pinus 5114601 0 549377428 0Picea 116785146 0 116646654 0Cupressaceae 0 NS 1511157 0000523Cycadales 0 NS 6535688 003808

NOTEmdashComparisons between site models M1a and M2a and M7 and M8 for eachtaxonomic family were evaluated

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1367

angiosperms shrubstrees and gymnosperms) (see supplementary table S6 Supplementary Material Online) Numberof synonymous (dS) and non-synonymous substitutions (dN)were not significantly different in any of the groups whenassessed by P lt 005 Boxplots showing differences amonggroups when groups are divided into three (angiospermsherbs angiosperms treesshrubs and gymnosperms) are plot-ted in figure 2 In the Gnetophyte group only Gnetum specieswere included and not Welwitschia and Ephedra (seeDiscussion part)

Discussion

Slower Rates of Molecular Evolution in GymnospermsOur study clearly suggests slower rates of molecular evolutionin gymnosperm than in angiosperm protein-coding genes

evidenced by a lower number of synonymous substitutions(dS) and lower rates of silent-site divergence (l)

Based on the evolutionary and phylogenetic analyses of31968 nucleotide sites from a wide taxonomic sampling ourresults suggest l values of 41ndash14 10 10 in gymnospermtaxa and 4ndash65 10 9 in angiosperm taxa (table 1) Takentogether this corresponds to a 7-fold average variationamong gymnosperms and angiosperms (771 10 10 vs535 10 9 synonymous substitutionssiteyear respec-tively) Our results lay within the ranges obtained by previousstudies comparing a few Pinaceae species suggesting the var-iation found in this taxonomic family is a relatively goodrepresentation of the variation found in all the gymnospermtaxa with the exception of Gnetophytes For exampleWillyard et al (2007) while comparing different speciesof Pinus found absolute rates of silent-site divergence of 7ndash131 10 10 Similarly Chen et al (2012) estimated pairwise

A

B

C

FIG 2 (A) Boxplots showing differences in number of synonymous substitutions (dS) nonsynonymous substitutions (dN) absolute rate of silent-site divergence (l) and substitution rate ratio (x) among life forms defined as angiosperms herbs (green) angiosperms shrubtrees (light green)and gymnosperms (blue) Results of the statistical tests of comparisons among groups can be found in supplementary table S6 SupplementaryMaterial Online (B) Proportion of sites under negative selection when all sites are evaluated (top left) when Sltndash15 (top right) and when Sgtndash15(bottom left) and proportion of sites under positive selection (bottom right) Green boxes represent angiosperm species and blue boxesgymnosperms (C) Distribution of the selection coefficient (f(S)) of new mutations It was calculated for all species in each taxonomic familystudied assuming a gamma distribution of x among sites Parameter estimates a and b were obtained from running the site model 5 in codeml(PAML) using a concatenated alignment of 29000ndash31000 sites per taxonomic family Selection coefficients were then obtained replacing a and bin equation 1 Green dotted vertical lines are used to show the distribution of f(S) when S 0 and xfrac14 1 (neutrality) Taxonomic familiescontaining less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML

De La Torre et al doi101093molbevmsx069 MBE

1368

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 6: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

angiosperms shrubstrees and gymnosperms) (see supplementary table S6 Supplementary Material Online) Numberof synonymous (dS) and non-synonymous substitutions (dN)were not significantly different in any of the groups whenassessed by P lt 005 Boxplots showing differences amonggroups when groups are divided into three (angiospermsherbs angiosperms treesshrubs and gymnosperms) are plot-ted in figure 2 In the Gnetophyte group only Gnetum specieswere included and not Welwitschia and Ephedra (seeDiscussion part)

Discussion

Slower Rates of Molecular Evolution in GymnospermsOur study clearly suggests slower rates of molecular evolutionin gymnosperm than in angiosperm protein-coding genes

evidenced by a lower number of synonymous substitutions(dS) and lower rates of silent-site divergence (l)

Based on the evolutionary and phylogenetic analyses of31968 nucleotide sites from a wide taxonomic sampling ourresults suggest l values of 41ndash14 10 10 in gymnospermtaxa and 4ndash65 10 9 in angiosperm taxa (table 1) Takentogether this corresponds to a 7-fold average variationamong gymnosperms and angiosperms (771 10 10 vs535 10 9 synonymous substitutionssiteyear respec-tively) Our results lay within the ranges obtained by previousstudies comparing a few Pinaceae species suggesting the var-iation found in this taxonomic family is a relatively goodrepresentation of the variation found in all the gymnospermtaxa with the exception of Gnetophytes For exampleWillyard et al (2007) while comparing different speciesof Pinus found absolute rates of silent-site divergence of 7ndash131 10 10 Similarly Chen et al (2012) estimated pairwise

A

B

C

FIG 2 (A) Boxplots showing differences in number of synonymous substitutions (dS) nonsynonymous substitutions (dN) absolute rate of silent-site divergence (l) and substitution rate ratio (x) among life forms defined as angiosperms herbs (green) angiosperms shrubtrees (light green)and gymnosperms (blue) Results of the statistical tests of comparisons among groups can be found in supplementary table S6 SupplementaryMaterial Online (B) Proportion of sites under negative selection when all sites are evaluated (top left) when Sltndash15 (top right) and when Sgtndash15(bottom left) and proportion of sites under positive selection (bottom right) Green boxes represent angiosperm species and blue boxesgymnosperms (C) Distribution of the selection coefficient (f(S)) of new mutations It was calculated for all species in each taxonomic familystudied assuming a gamma distribution of x among sites Parameter estimates a and b were obtained from running the site model 5 in codeml(PAML) using a concatenated alignment of 29000ndash31000 sites per taxonomic family Selection coefficients were then obtained replacing a and bin equation 1 Green dotted vertical lines are used to show the distribution of f(S) when S 0 and xfrac14 1 (neutrality) Taxonomic familiescontaining less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML

De La Torre et al doi101093molbevmsx069 MBE

1368

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 7: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

sequence divergence between three Picea and one Taxus spe-cies and obtained l ranging from 55ndash124 10 10 FinallyBuschiazzo et al (2012) obtained a 25-fold difference whencomparing substitution rates between orthologs of Piceasitchensis and Pinus taeda with Arabidopsis thaliana and a4-fold difference when comparing the same Picea-Pinusorthologs with Populus trichocarpa

The pattern of slower rates of molecular evolution is con-sistent when grouping species according to their life formswith gymnosperms showing lower dS and l than angio-sperms herbs and angiosperms shrubs and trees This levelof evolutionary conservation (which probably help explainthe high levels of macrosynteny previously observed insome Pinaceae species) is surprising considering the ancientnature of the plant clade which appeared on Earth muchearlier than flowering plants

Higher Substitution Rate Ratio (x) in Gymnospermsthan in AngiospermsEqually surprising are the high substitution rate ratios (x)found in gymnosperms despite their slow rates of evolutionParameter estimates under models of variable x betweenstem branches of the angiosperm and gymnosperms lineagesbased on a concatenated alignment of 31737 nucleotide sitessupported a model of different and higher x in gymnospermsthan in angiosperms (table 2) In addition substitutionparameters based on seven different site models foundangiosperms rate ratios varied from 009ndash013 whereas gym-nosperms ratios varied from 017ndash067 Overall these esti-mates suggest a significantly higher chance of fixation ofnonsynonymous mutations than synonymous mutationsin gymnosperms than in angiosperm taxa (supplementarytable S5 Supplementary Material Online)

High x ratios could result from a high dN over a low dShowever this does not seem to be the case for gymnospermsInstead higher x ratios result from a low to moderate dNover a very low dS in gymnosperm genes Our results are incontrast with previous smaller scale studies either showingsignificantly lower dS and dN as a cause of a 4-fold higher x in

gymnosperms than in angiosperms (Buschiazzo et al 2012)or the ones that did not find significant differences in x ratiosamong gymnosperms and angiosperms (Chen et al 2012)These studies were limited by a small number of species thatwould not allow for evolutionary analyses within a phyloge-netic context When using a small number of species pairwiseestimates of dN dS and x will strongly depend on theselection of species In addition the evaluation of individualsites (site models) and the branch models have more powerto test variable x ratios than the pairwise sequence compar-isons in which x rates are averaged over all amino acidsites (Yang 2002) In the case of the Buschiazzorsquos study highlydiverged sets of species were used for comparisons(P sitchensisndashPtaeda and AthalianandashPtrichocarpa) High se-quence divergence between species is often associated withdifficulties in the alignment different codon usage biases andnucleotide compositions In addition saturation of substitu-tions may be particularly problematic when using pairwisemethods in comparison with branch methods

Positive Selection and Adaptive Evolution inGymnospermsUnder the Nearly Neutral Theory the strength and efficacy ofselection depends on the long-term effective population size(Ne) Theoretical approaches predict that as population sizesincrease the power of natural selection increases faster thanthe influx of new mutations (Akashi et al 2012) Naturalselection thus becomes more effective in removing deleteri-ous mutations and in fixing advantageous mutations whichresults in a lower substitution rate of deleterious mutationsand also a higher substitution rate of advantageous mutationsin large populations (Lanfear et al 2014)

Current estimates suggest that species with large popula-tion sizes such as Drosophila mice bacteria and rabbits showsigns of adaptive evolution (Bierne and Eyre-Walker 2004Carneiro et al 2012 Phifer-Rixey et al 2012) In contrast littleevidence of adaptive amino acid substitutions has been foundin flowering plants (mostly with modest Nelt 100000)(Gossmann et al 2010 Hough et al 2013) Few exceptions

Table 4 Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN) Nonsynonymoussynonymous Rate Ratio(x) and the Absolute Rate of Silent-Site Divergence (l) with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC)

Species Correlations Taxonomic Family Correlations

Parameters Generation Time (years) Genome Size (1C) Generation Time (years) Genome Size (1C) Species Richness

dS r ndash0172 ndash0135 ndash0153 ndash0311 0703P-value 0223 0341 0558 0225 0002df 50 50 15 15 15

dN r 0005 0041 0189 ndash0026 0485P-value 0973 0771 0467 0922 0048df 50 50 15 15 15

M r ndash0183 ndash0037 ndash0781 ndash0670 0057P-value 0194 0797 0000 0003 0829df 50 50 15 15 15

X r 0212 0567 0482 0527 ndash0224P-value 0132 0000 0050 0030 0387df 50 50 15 15 15

NOTEmdashSubstitution rates and life-history traits were estimated by species pairs (species correlations table) and averaged within taxonomic family (taxonomic familycorrelations table) We reject a null hypothesis of correlation equal to zero when Plt 005 (shaded areas)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1369

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 8: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

have been found in angiosperms with Negt100000 such asCapsella grandiflora (Slotte et al 2010 Williamson et al 2014)Helianthus spp (Strasburg et al 2011) and Populus spp(Ingvarsson 2010 Wang et al 2016a 2016b) In gymnospermsprevious reports of Ne range from 120000ndash560000 for Piceaand Pinus species (Brown et al 2004 Bouille and Bosuquet2005 Syring et al 2007)

In our study we found a higher proportion of sites withpositive selection coefficients and evidence of stronger diver-sifying selection in gymnosperms than in angiosperms(Pfrac14 0033) Models that allow for sites under positive selec-tion such as M2a M3 and M8 all suggested the presence of avery small number (02ndash13) of weakly positively selectedsites (xfrac14 153ndash189) in angiosperm taxa in contrast to ahigher number (2313) of sites with moderate to strongdiversifying selection (xfrac14 221ndash329) in gymnosperm taxa Inaddition results of the BEB analysis using site model 2a iden-tified 241 (0759) sites under selection in gymnosperms andnone in angiosperms The same analysis using site model M8identified 546 (172) sites under selection in gymnospermsand 10 (003) in angiosperms Finally the NEB analysis usingsite model M3 identified 1960 (6175) sites in gymnospermsand only six (0018) in angiosperms Likelihood ratio tests forthe presence of diversifying selection were significant for Piceaand Pinus (M1andashM2a comparison) and for all taxa with theexception of Brassicaceae under M7 and M8 comparisonsUnder M3 Picea and Pinus showed the highest proportionsof sites under selection whereas Rosaceae showed the lowestproportion Our results are consistent with previous esti-mates of low levels of adaptive evolution in eleven angio-sperm species (Gossmann et al 2010) and with estimatesin Pinus contorta and Pinus taeda in which the proportionsof sites fixed by positive selection were 13ndash52 and 22ndash37respectively (Eckert et al 2013 Hodgins et al 2016)

There is a rich literature reporting the influence of positiveselection in Pinaceae species which in absence of reports inother gymnosperm families has frequently been used as anexample of all gymnosperm taxa The recent sequencingof some Pinaceae genomes have paved the way to test theinfluence of positive selection at the genome level (De LaTorre et al 2015a Hodgins et al 2016) which have confirmedwhat was previously found at the genetic (small number ofgenes) and population levels (Eckert et al 2010 Buschiazzoet al 2012 Pavy et al 2012 De La Torre et al 2015b)Furthermore studies on quantitative trait variation suggestnatural selection is highly efficient in producing a relativelyfast evolutionary response in contrast to the slow evolution-ary rate (Savolainen and Pyheuroajeuroarvi 2007) Examples of thisinclude the evolution of cold adapted genotypes in severallong-lived tree species in the northern hemisphere after theLast Glaciation (Mimura and Aitken 2007 Wachowiak et al2009 Holliday et al 2010 Kujala and Savolainen 2012 De LaTorre et al 2014b) In addition because of their long gener-ation times and low mutation rates gymnosperm genomesmay retain the consequences of demographic events for along time suggesting that even species that have contractedtheir ranges and currently have small distributions may re-semble those with long-term large effective population sizes

Although our results suggest an important role of positiveselection in the evolution of gymnosperm taxa we do notbelieve that Pinaceae is a good representative of other gym-nosperm taxa Pinus and Picea have indeed significantly highernumber of sites under diversifying selection than other gym-nosperm taxa Recent evolutionary radiations may have re-sulted in lower dS and higher x in comparisons within thesetaxonomic families (Palme et al 2009)

We expect our estimation of the proportion of sites underselection to be conservative because of the use of single copygenes Single copy genes are usually ubiquitously expressedgenes that encode for basic cellular functions that are pre-served across taxa For this reason they are likely to be understronger purifying selection due to functional and structuralconstraints than paralogous genes in gene families in bothangiosperms (De Smet et al 2013 Li et al 2016) andgymnosperms (De La Torre et al 2015a) These genes alsoseem to experience less frequent positive selection than paral-ogous genes which tend to be more narrowly expressed(Larracuente et al 2007) For this reason we expect a lowerproportion of conserved sites and a higher proportion ofselected sites in paralogous genes (which account for themajority of genes in plant genomes) In fact it has been sug-gested that many of the genes under diversifying selection inconifers belong to large multi-copy gene families such asLeucine Rich Repeats Cytochrome P450 among others(Pavy et al 2012 Neale et al 2014 De La Torre et al2015a) In addition some sites under positive selection maygo undetected because structural constraints may inducepurifying selection to push x to values lower than 1 in highlyconserved genes (Echave et al 2016) However the introduc-tion of paralogous genes in a phylogeny has confoundingeffects for both the phylogenetic reconstruction and alsofor the estimation of sequence divergence among taxa there-fore we believe than the use of single-copy genes is probablythe best method for comparisons among diverse taxa In ourefforts to identify a highly confident set of very well alignedorthologs across a broad taxonomic sampling the number ofgenes used in this study was significantly reduced raising con-cerns about the representativeness of these genes in agenome-wide context However comparisons with previousstudies suggest our estimates of rates of divergence are quitecomparable with estimates of rates with much larger data setsin specific taxa (Gossmann et al 2010 Eckert et al 2013Hodgins et al 2016)

Stronger and More Efficient Purifying Selection inGymnospermsWhen estimating the distribution of selection coefficientswe found that the majority of sites in both angiospermsand gymnosperms were highly conserved suggesting animportant role of purifying selection in the evolution of plantgenomes Previous studies in angiosperm species have sug-gested that the largest proportion of mutations is stronglydeleterious and that the number of slightly and mildly dele-terious mutations seems to be relatively conserved amongspecies (Gossmann et al 2010) In contrast with these studiesour study has found differences in the proportion of sites

De La Torre et al doi101093molbevmsx069 MBE

1370

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 9: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

under mildly and strong purifying selection Gymnospermfamilies had a greater proportion of sites under strong puri-fying selection (S lt ndash15) than angiosperms (t frac14 ndash357Plt 005) whereas angiosperm families had a greater propor-tion of neutral and mildly deleterious (S 0 and S gtndash15)sites than gymnosperms (tfrac14 4119 Plt 005) (fig 2)

This could be explained by the expectation that specieswith larger Ne generally experience stronger purifying selec-tion (Slotte et al 2010 Gossmann et al 2010 Hough et al2013 Williamson et al 2014) Slightly deleterious mutationswill be less likely to segregate at higher frequencies and fixin large populations (Akashi et al 2012) Such mutations arealso selected against by the partial low degree of selfing andhigh early inbreeding depression in conifers (Williams andSavolainen 1996 Remington and OrsquoMalley 2000) Larger pop-ulation sizes may also explain the lower number of neutralsites in gymnosperms than in angiosperms found in thisstudy as it has been shown that the number of effectivelyneutral mutations is negatively correlated with Ne in manyspecies (Piganeau and Eyre-Walker 2009 Gossmann et al2010)

Life Form Does Not Explain Differences in MolecularEvolution among Woody Angiosperms and WoodyGymnosperms or within GymnospermsIn flowering plants it has been recently suggested that lifeform (also measured as plant height) is correlated with ratesof molecular evolution with shrubs and trees showing lowerrates of molecular evolution than herbaceous plants (Gautet al 1992 Smith and Donoghue 2008 Lanfear et al 2013)Similarly in our study we found significantly lower l onaverage in angiosperm shrubstrees compared to angiospermherbs (see supplementary table S6 Supplementary MaterialOnline and fig 2) Both dS and dN showed large variationamong angiosperm herbs and differences in dS and dNamong angiosperm shrubstrees and herbs were not signifi-cant (see supplementary table S6 Supplementary MaterialOnline) When incorporating gymnosperms into the analysislife form on its own does not explain the differences in rates ofmolecular evolution among seed plants as we observe signif-icant differences in dN dS x and l also among woodyangiosperms and woody gymnosperms (fig 2)

Our analyses showed that Gnetophytes the only gymno-sperm division that contains non-woody genera (Welwitschiaand Ephedra) shows higher dN and l than what we observein other gymnosperm families (table 1) Therefore differencesin life form may drive differences in the rate of molecularevolution found in Gnetophytes However we still observethese differences when only including the woody genusGnetum (and excluding Ephedra and Welwitschia) suggestingother factors may influence the differences in rate of sequenceevolution (table 1) In fact Gnetophytes are different from therest of gymnosperms in a number of features that include butare not restricted to their morphology ecology and thepresence of angiosperm-like characteristics such as specialwater-conducting wood vessels and reproductive structuresorganized in compound strobili (Doyle and Donoghue 1986

Friedman 1998) For these reasons the position ofGnetophytes within gymnosperms is contentious (Doyle1998 Braukmann et al 2009 Cibrian-Jaramillo et al 2010Ran et al 2010 Xi et al 2013) In any case the lack of closeextant relatives to Gnetophytes suggests that this problemmight never really be solved satisfactorily regardless of theamount of sequence data we have access to Our resultssuggest that estimations of rates of sequence divergenceamong a few species from very diverged Gnetophyte taxa(such as Gnetum Welwitschia and Ephedra) may lead toinaccurate estimates of dN and dS For this reason we de-cided to exclude the more distant Welwitschia and Ephedrafrom all correlations with life history traits

Generation Times and Their Effect on LowSubstitution RatesThe differences in rates of molecular evolution between an-giosperms with different life forms have been presumed toreflect differences in generation times between herbaceousand woody plants (Gaut et al 1992 Smith and Donoghue2008 Lanfear et al 2013) The generation time hypothesissuggests that species with shorter generation times (eg her-baceous plants) accumulate more replication errors per unittime because they copy their genomes more often whichresults in higher mutation rates (Li et al 1996) Althoughgeneration time may not be a good indicator of the overallrate of genome replication because the number of mitotic celldivisions can vary substantially between generations andamong plant species it remains strongly associated withthe long-term rates of meiosis in plants (Petit and Hampe2006 Lanfear et al 2013) Therefore if a significant proportionof heritable mutations occur during meiosis plants with lon-ger generation times (eg gymnosperms) would have lowermutation and substitution rates per unit of time (Lanfearet al 2013) Alternatively it is been proposed that differencesin the rates of mitosis (mitotic cell divisions that occur in theapical meristem before gametogenesis) can account for theobserved differences in rates of molecular evolution amongplants of different height (Lanfear et al 2013) Lower absolutegrowth rates in long-lived woody perennials would translatein fewer cell divisions and less opportunities for DNA repli-cations errors than short-lived plants (Bromham et al 2015)This would result in a lower mutation rate per unit of time inlong-lived species Our results are consistent with this expec-tation as we found a strong negative correlation betweengeneration times and l (r frac14 ndash078 Plt 0001) (table 4)Long generation times large effective population sizes andlow recombination rates may also help explain the low syn-onymous polymorphism in gymnosperms (Savolainen andPyheuroajeuroarvi 2007 Jaramillo-Correa et al 2010)

In plants in the absence of a segregated germline gametesarise from the apical meristem late in development followingperiods of vegetative growth Because of this somatic muta-tions acquired during vegetative growth can be transmittedto the next generation (Watson et al 2016) Longer growthperiods are thought to result in a larger number of cell divi-sions increasing the opportunities for mutations to occur pergeneration (Schultz and Scofield 2009) Our results suggest

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1371

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 10: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

than on average for the species under study long-lived gym-nosperms accumulate four times more mutations per gener-ation than short-lived angiosperms (assuming an averagegeneration time of 20 years in gymnosperms and an averagel equal to 157 10 8 in gymnosperms and 392 10 9

in angiosperms) This is coincident with previous reports ofincreased number of somatic mutations in gymnospermsspecies (Cloutier et al 2003 OrsquoConnell and Ritland 2004)and other woody perennials (Ally et al 2010 Bobiwashet al 2013) However because the per-generation increasein mutation rate in woody perennials may be less thanpredicted from their differences in generation time (Petitand Hampe 2006) we would expect our reported valuesto be an over-estimation of the actual difference betweenplant seed clades Differences in generation times wouldpredict a mutation rategt100 times larger in mangrovetrees than in annuals when estimates were only 25 timeslarger (Klekowski and Godfrey 1989) Alternatively theremay be selection to reduce DNA replication-dependenterrors through minimizing the number of cell divisionsrequired during development as recently suggested inArabidopsis (Watson et al 2016)

Larger Genomes Are Correlated with Larger Omegasand Slower Rates of Sequence DivergenceIn our study we also explored the relationship between ge-nome size and rates of molecular evolution Gymnospermgenomes are characterized by their enormous genome sizes(20ndash40 Gb) unique genome silencing mechanisms and lowunequal recombination (Leitch and Leitch 2012 Nystedt et al2013) While angiosperm genomes are highly dynamic andhave efficient mechanisms to counteract the increase in DNAamount stemming from WGDs or transposable elements (egreplication or recombination-based errors generating indelsunequal recombination between sister chromosomes Groveland Wendel 2010) gymnosperm genomes seem to be lessdynamic and may have evolved their own epigenetic mech-anisms to silence retrotransposons (Leitch and Leitch 2012)Our results indicate that genome size is positively correlatedwith x (rfrac14 056 Plt 0001 at the species level and rfrac14 052Plt 005 at the family level) In addition genome size is neg-atively correlated with l (r frac14 ndash067 Plt 001) suggestingspecies with large genomes such as gymnosperms have slowerrates of molecular evolution than species with small genomesconfirming previous studies on angiosperms species(Bromham et al 2015) A possible explanation for this isthat if large genome sizes correlate with larger cells and areduction in growth then plants with larger genomes mighthave fewer replications and therefore less accumulated mu-tations per unit of time (Bromham et al 2015)

Our study aims to contribute to the unexplored field ofmolecular evolution in seed plants by investigating thedifferences in rates of molecular evolution among gym-nosperms and angiosperms using a phylogenetic frame-work The recent genomic and transcriptomic resourcesin gymnosperm species opened a window to understandthe evolution of this ancient and important plant cladeHowever gymnosperm resources are still limited in

comparison to their sister clade of flowering plants Wehope that with the development of new genomic re-sources studies in molecular evolution will include abroader taxonomic sampling that provides us with amore complete understanding of the evolution of seedplants

Materials and Methods

Generation of Phylogenetic MarkersTo develop our set of phylogenetic markers we performeddeep sequencing and assembly of whole transcriptomesfrom two conifer species (Pinus pinaster and Pinus sylvestris)(Li Z et al (in review) transcriptome data can be found athttpbioinformaticspsbugentbesupplementary_datazheliphylo) We integrated this data with whole-transcriptomedata from another 29 gymnosperms 34 angiosperms and 1outgroup (Physcomitrella patens) obtained from public data-bases PlantGDB oneKP TreeGenes and PLAZA v30 (Proostet al 2015) We used OrthoMCL (Li et al 2003) to buildorthologous gene families across the species To reduce thenumber of single-copy genes we selected only the gene fam-ilies that were conserved and had low copy number Later weused hidden Markov probabilistic models implemented inHMMER to build an HMM profile for each gene family basedon multiple sequence alignment (Eddy 2009) These HMMprofiles were then used to assign additional proteins to theexisting gene families Then we used HMMSEARCH to findthe best protein hit of an HMM profile in each species andselected the markers with only reciprocal best hits for phylo-genomic analysis In order to increase the spectrum of thephylogenetic markers we only selected single-copy markersthat were present in a majority of species

Multiple sequence alignments were carried out for eachgene family based on amino acid sequences using Musclev3831 (Edgar et al 2004) Trimal v14 (Capella-Gutierrezet al 2009) was used to back translate the amino acid align-ments into coding sequence alignments and to remove lowquality alignment regions and spurious sequences We de-fined a spurious sequence as a sequence having less than70 of the total alignment positions that were present in75 of 66 species Genes that lost their sequences in morethan 10 of the species after removing spurious sequenceswere not used for further analyses This way we obtained 42single-copy markers across 66 seed plant species that wereused for evolutionary and phylogenetic analyses Annotationsof all genes are reported in supplementary table S1Supplementary Material Online

Rates of Sequence DivergenceMultiple sequence alignments for each of the 42 single-copygenes were divided in two groups containing either angio-sperms (34 species) or gymnosperms (31 species) to allowfurther comparisons between major plant taxa Alignmentgaps and low quality regions were manually removed usingJalview version 281 (Waterhouse et al 2009) Also speciescontaining more than 30 gaps or ambiguous sites were notkept for further analyses

De La Torre et al doi101093molbevmsx069 MBE

1372

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 11: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

Pairwise estimates (runmodefrac14ndash2 seqtypefrac14 1 modelfrac14 0NSsitesfrac14 0) of the number of synonymous substitutionsper site (dS) nonsynonymous substitutions per site (dN)and nonsynonymoussynonymous rate ratio (also calledldquoomegardquo and denoted by x or dNdS) were calculatedbetween species in terminal branches for each of the tax-onomic families of the angiosperm and gymnosperm phy-logeny using the maximum likelihood method of Goldmanand Yang (1994) in the codeml program from the PAMLpackage (version 48 Yang 2007) The analysis was repeatedtwice for each of the 42 genes For each sequence pair onlythe results with the higher lnL (log likelihood) were re-tained We discarded genes with dS values lower than001 as these values may result in inaccurate estimates ofx and also genes with dS or dNgt 3 which suggest satu-ration of substitutions Abnormally high omega rates(xgt 10) were also discarded (Villanueva-Ca~nas et al2013) To evaluate the differences in synonymous (dS)and non-synonymous substitutions (dN) and their ratio(x) between angiosperms and gymnosperms we used asign test in the R package PASWR (Ugarte et al 2008) totest the null hypothesis that the median for the differ-ences between the pairs equals zero We tested the alter-native hypotheses of a higher x in gymnosperms than inangiosperms and a lower dN and dS in gymnospermsthan in angiosperm species for each of the genesSignificance values were calculated for all genes for eachof the variables measured using a one-sided exact bino-mial test in R In addition to the pairwise estimates wealso calculated dS dN and x for each terminal branch ofthe phylogeny using the free-ratios branch model (Model1) implemented in codeml Differences in dS dN and xbetween plant lineages were tested using a Welch two-sample t-test in R

Absolute rates of silent-site divergence were estimated foreach taxonomic family assuming the ldquomolecular clockrdquo hy-pothesis (Kimura 1968 Kimura and Ohta 1971) in which thechange at the molecular level occurs constantly through timeacross evolutionary lineages Synonymous substitutions (dS)for each gene were averaged for all pairs of species in each ofthe six taxonomic families within gymnosperms (PinaceaeCuppresaceae Taxaceae Podocarpaceae Cycadaceae andGnetophytes) and in eight taxonomic families in angio-sperms (Brassicaceae Malvaceae Euphorbiaceae FabaceaeCucurbitaceae Rosaceae Solanaceae and Poaceae) Poaceaewas further divided in two groups (each containing a differentancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1 Supplementary Material Online) Absolute ratesof silent-site divergence were calculated for each of the genesusing the formula lfrac14 dS2T where l is the synonymousdivergence rate per site per year dS is the mean of synony-mous substitutions per site and T is the time of divergencebetween two species in years (Gaut et al 2011) Overall ratesof silent-site divergence for each family were calculated usingthe same formula where dS is the mean of synonymoussubstitutions per synonymous site across all 42 genes andT is the time of divergence in years Estimations of divergencetimes (T) were based on fossil records and previously

published divergence times (see supplementary table S2Supplementary Material Online) Whenever possible we se-lected the median time of divergence between the species ineach taxonomic group making sure the estimates are consis-tent with previous estimates for each taxa

Estimation of the Variation in Selective Pressures (x)among BranchesTo understand the variation in selective pressures among thegymnosperm and angiosperm plant lineages we formulatedthe following hypotheses H0 homogeneous selective pres-sure in both lineages (x0frac14xGfrac14xA) H1 selective pressurein the angiosperm clade (x0frac14xG xA) H2 selective pres-sure in the gymnosperm clade (x0frac14xA xG) H3 long-termshift in selective pressure resulting in both clades having dif-ferent omegas (x0 xG xA) and H4 all terminal branches inboth clades have different omegas All estimated omegas forhypotheses H1 to H3 were calculated on the stem branches ofthe angiosperm (xA) and the gymnosperm (xG) clades ac-cording to the phylogenetic tree in figure 1 The estimatedomega in unselected branches (background) is representedby x0

We tested the different hypotheses using branch models(Yang 1998 Yang and Nielsen 1998) in PAML 48 (Yang 1997Yang 2007) that allow x to vary among branches in the treeThe one-ratio model (runmodefrac14 0 seqtypefrac14 1 modelfrac14 0)which assumes one x ratio for all branches was first used toestimate the branch lengths from a concatenated sequencealignment of all 42 single-copy genes (31968 nucleotide sites)and a phylogenetic tree (see below for details on tree build-ing) This model was also used to test the null hypothesis(H0) The resulting branch lengths were then used as initialvalues to run the program three times with model 2(modelfrac14 2 NSSitesfrac14 0) to test hypotheses H1 H2 andH3 This model allows different branch groups to have differ-ent omegas according to the different branch labels assignedOmega in selected branches (foreground) was then com-pared with omega in unselected branches (background)Finally we tested hypothesis 4 by fitting model 1 (modelfrac14 1NSsitesfrac14 0) also called the free-ratios model A likelihoodratio test was used to assess deviations from the null model(both angiosperms and gymnosperms had the same x) forthe gene set Corrections for multiple testing were done usingthe Benjamini and Hochberg method (1995) with a false dis-covery rate threshold of 005

Phylogenetic trees used in all branch models were con-structed applying a posteriori partitioning inferred fromBayesian searches of the substitution rate matrix using a mix-ture model approach to the concatenated alignment by car-rying out BayesPhylogenies (Pagel and Meade 2004) with 10million generations of Markov chain Monte Carlo analysisWe finally obtained 12 partitions by the Perl script providedin Xi et al (2012) The phylogeny of seed plants was inferredby RAxML (82) with the concatenated alignment and thepartitions and edited for publication with FigTree v142(httptreebioedacuk)

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1373

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 12: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

Estimation of Substitution Parameters and Inferenceof the Proportion of Sites under Selection for EachTaxonomic FamilyWe tested different site models in codeml to estimate sub-stitution parameters infer omega distributions and assess theproportion of conserved (xlt 1) neutral (xfrac14 1) and se-lected (xgt 1) nucleotide sites in each taxonomic familyWe used the following parameters runmodefrac14 0 (usertree) modelfrac14 0 NSsitesfrac14 012357810 kappa (transi-tiontransversion rate ratio) to be estimated from the dataModels M1a (nearly neutral) M2a (positive selection) M3(discrete) M5 (gamma) M7 (beta) M8 (beta amp x) andM10 (beta amp gammathorn 1) detailed in Yang et al (2000)were tested for each taxonomic family independently (allterminal branches in each clade) using a concatenated align-ment of 42 genes By testing all species that share a mostrecent common ancestor we make sure that species had asimilar amount of time to accrue genetic differences and thattherefore any difference in the amount of genetic changeaccumulated through time represents a difference in the evo-lutionary rate For this analysis three species were removeddue to problems in the concatenated alignment (Beta vulga-ris Cryptomerica japonica and Welwitschia mirabilis) Inputphylogenetic trees were constructed for each taxonomic fam-ily using the GTRthornGAMMA model for maximum likelihoodof phylogenetic inference in RAxML (Stamatakis 2014) Weused the Akaike Information Criteria (AIC) to rank modelsand estimate the most probable distribution of omega foreach taxonomic family Runs were repeated to ensure theconvergence of results

In addition we also estimated the number of sites underpositive selection using two different analyses In the first onewe compared models M1andashM2a and M7ndashM8 using likeli-hood ratio tests In the second one we used the BayesEmpirical Bayes (BEB) (Yang et al 2005) implemented insite models M2a and M8 to calculate the posterior probabil-ities for site classes and to identify sites under positive selec-tion when the likelihood ratio test was significant We alsoused the results of the Naıve Empirical Bayes (NEB) but onlywhen BEB results were not available (site model M3)

Distribution of Selection CoefficientsWhile omega (x) as formulated in the codon models is aproperty of a particular site or collection of sites in the DNAsequence the selective coefficient (S) is a property of a par-ticular allele or mutation (Nielsen and Yang 2003) Thereforeit is possible to infer the distribution of S from the distributionof x if some assumptions are made in the mutation modelWe assume no interference in the fixation process of multiplemutations at different sites and further that there are nomore than two alleles segregating at the same nucleotidesite We also assume that all non-synonymous mutations atthe same amino acid site have equal selection coefficients S Ifthese assumptions are not met we may under-estimate theselection coefficients We use the inverse mapping from x toS to obtain the distribution of the selective coefficient ofnew mutations from the distribution of x Model 5

(Yang et al 2000) assumes that omega (x) is gamma distrib-uted among codon sites with parameters a and b and there-fore the probability density function for S is obtained by

fethSTHORN frac14 ethbesS=hethSTHORNTHORNaebesS=hethSTHORNethhethSTHORN STHORN=ShethSTHORNCethaTHORN1 lt S lt 1

(1)

where h(S)frac14 (es ndash 1) (Nielsen and Yang 2003) We assumed aconstant population size among lineages within each taxo-nomic family

Correlations with Life-History TraitsIn order to explain the variation in substitution rates and xbetween major plant clades we evaluated the associations ofthese with several biological and life history traits Generationtime (time to reach maturity) genome size (amount of DNAcontained in a haploid nucleus measured in picograms) andspecies richness (number of extant species in each taxonomicgroup or family) were obtained from public databasesBecause phylogenetic non-independence of data points canlead to an inflation of the significance of observed relation-ships between substitution rates and traits (Lanfear et al2010) we used the phylogenetically independent contrasts(PIC) method (Felsenstein 1985) implemented in the R pack-age APE to estimate the correlations between variables(Paradis et al 2004) This analysis was done twice either usingall species in the phylogeny or using only taxonomic familiesFor the latter analysis a phylogenetic tree with averagebranch lengths for each family and family-average estimatesof all variables were used as input files in APE Phylogenetictrees were imported in Newick format and then saved withFigTree v142 (Rambaut 2006 httptreebioedacuk) beforePIC analysis

We also tested for significant differences in dN dS x andl among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test) The anal-ysis was repeated three times taking in consideration differentgroups In the first one there were six groups as followsangiosperms dicots herbs angiosperms dicots shrubs angio-sperms dicots trees angiosperms monocots herbs gymno-sperms and Gnetophytes In the second one there were fourgroups angiosperms herbs angiosperms treesshrubs gym-nosperms and Gnetophytes And in the last test three groupswere considered (angiosperms herbs angiosperms shrubstrees and gymnosperms) while Gnetophytes were excluded

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

AcknowledgmentsWe thank Joseph Bielawski for valuable suggestions on dataanalyses and Outi Savolainen and Jeff Ross-Ibarra for com-ments on previous versions of the manuscript This work wassupported by the European 7th Framework Programme un-der the ProCoGen (Promoting Conifer Genomic Resources)

De La Torre et al doi101093molbevmsx069 MBE

1374

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 13: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

project YVdP would like to acknowledge theMultidisciplinary Research Partnership ldquoBioinformatics fromnucleotides to networksrdquo Project (no 01MR0310W) of GhentUniversity and the European Union Seventh FrameworkProgramme (FP72007-2013) under European ResearchCouncil Advanced Grant Agreement 322739ndashDOUBLEUP

ReferencesAkashi H Osada N Ohta T 2012 Weak selection and protein evolution

Genetics 19215ndash31Ally D Ritland K Otto SP 2010 Aging in a long-lived clonal tree Plos Biol

8(8)e1000454Benjamini Y Hochberg Y 1995 Controlling the false discovery rate-a

practical and powerful approach to multiple testing J R Stat SocSeries B 57(1)289ndash300

Bierne N Eyre-Walker A 2004 The genomic rate of adaptive amino acidsubstitutions in Drosophila Mol Biol Evol 211350ndash1360

Bobiwash K Schultz ST Schoen DJ 2013 Somatic deleterious mutationrate in a woody plant estimation from phenotypic data Heredity111(4)338ndash344

Bouille M Bousquet J 2005 Trans-species shared polymorphisms atorthologous nuclear gene loci among distant species in the coniferPicea (Pinaceae) implications for the long-term maintenance ofgenetic diversity in trees Am J Bot 9263ndash73

Braukmann TW Kuzmina M Stefanovic S 2009 Loss of all plastid ndhgenes in Gnetales and conifers extent and evolutionary significancefor the seed plant phylogeny Curr Gene 55323ndash337

Bromham L Hua X Lanfear R Cowman PF 2015 Exploring the relation-ships between mutation rates life history genome size environ-ment and species richness in flowering plants Am Nat185(4)507ndash524

Brown GR Gill GP Kuntz RJ Langley CH Neale DB 2004 Nucleotidediversity and linkage disequilibrium in loblolly pine PNAS101(42)15255ndash15260

Buschiazzo E Ritland C Bohlmann J Ritland K 2012 Slow but not lowgenomic comparisons reveal slower evolutionary rate and higherdNdS in conifers compared to angiosperms BMC Evol Biol 128

Capella-Gutierrez S Silla-Martinez JM Gabaldon T 2009 TrimAl a toolfor automated alignment trimming in large-scale phylogenetic anal-yses Bioinformatics 251972ndash1973

Carneiro M Albert FW Melo-Ferreira J Glatier N Gayral P Blanco-Aguiar JA Villafuerte R Nachman MW Ferrand N 2012 Evidencefor widespread positive and purifying selection across the Europeanrabbit (Oryctolagus cuniculus) genome Mol Biol Evol29(7)1837ndash1849

Chen J Uebbing S Gyllenstrand N Lagercrantz U Lascoux M Kallman T2012 Sequencing of the needle transcriptome from Norway spruce(Picea abies Karst L) reveals lower substitution rates but similarselective constraints in gymnosperms and angiosperms BMCGenom 13589

Cibrian-Jaramillo A De La Torre-Barcena JE Lee EK Katari MS Little DPStevenson DW Martiensen R Coruzzi GM DeSalle R 2010 Usingphylogenomic patterns and gene ontology to identify proteins ofimportance in plant evolution Gen Biol Evol 2225ndash239

Cloutier D Rioux D Beaulieu J Schoen DJ 2003 Somatic stability ofmicrosatellite loci in Eastern white pine Pinus strobus L Heredity90(3)247ndash252

Crisp MD Cook LG 2011 Cenozoic extinctions account for the lowdiversity of extant gymnosperms compared with angiosperms NewPhytol 192997ndash1009

De La Torre AR Birol I Bousquet J Ingvarsson PK Jansson S Jones SJMKeeling CI MacKay J Nilsson O Ritland K et al 2014a Insights intoConifer Giga-genomes Plant Physiol 1661ndash9

De La Torre AR Roberts DR Aitken SN 2014b Genome-wide admixtureand ecological niche modeling reveal the maintenance of speciesboundaries despite long history of interspecific gene flow Mol Ecol23(8)2046ndash2059

De La Torre AR Ling YC Van de Peer Y Ingvarsson PK 2015a Genome-wide analysis reveals diverged patterns of codon bias gene expres-sion and rates of sequence evolution in Picea gene families Gen BiolEvol 7(4)1002ndash1015

De La Torre AR Ingvarsson PK Aitken SN 2015b Genetic architectureand genomic patterns of gene flow between hybridizing species ofPicea Heredity 001ndash12 doi 101038hdy201519

De Smet R Adams KL Vandepoele K Van Montagu MCE Maere S Vande Peer Y 2013 Convergent gene loss following gene and genomeduplications creates single-copy families in flowering plants PNAS110(8)2898ndash2903

Doyle JA 1998 Phylogeny of vascular plants Annu Rev Ecol Syst20567ndash599

Doyle JA Donoghue MJ 1986 Seed plant phylogeny and the origin ofangiosperms An experimental cladistic approach M J Bot Rev52321

Echave J Spielman SJ Wike CO 2016 Causes of evolutionary rate var-iation among protein sites Nat Rev Genet 17109ndash121

Eckert AJ Bower AD Gonzalez-Martinez SC Wegrzyn JL Coop G NealeDB 2010 Back to nature ecological genomics of loblolly pine (Pinustaeda Pinaceae) Mol Ecol 19(17)3789ndash3805

Eckert AJ Wegrzyn JL Liechty JD Lee JM Cumbie P Davis JM Goldfarb BLoopstra CA Palle SR Quesada T et al 2013 The evolutionarygenetics of the genes underlying phenotypic associations forLoblolly pine (Pinus taeda Pinaceae) Genetics 1951353ndash1372

Eddy SR 2009 A new generation of homology search tools based onprobabilistic inference Genome Inform 23(1)205ndash211

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nuc Acids Res 32(5)1792ndash1797

Felsenstein J 1985 Phylogenies and the comparative method Am Nat1251ndash15

Gaut B Muse SV Clark WD Clegg MT 1992 Relative rates of nucleotidesubstitution at the rbcL locus of monocotyledonous plants J MolEvol 35292ndash303

Gaut B Yang L Takuno S Eguiarte LE 2011 The patterns and causes ofvariation in plant nucleotide substitution rates Annu Rev Ecol EvolSyst 42245ndash266

Goldman N Yang Z 1994 A Codon-based model of nucleotidesubstitution for protein-coding DNA sequences Mol Biol Evol11(5)725ndash736

Gossmann TI Song BH Windsor AJ Mitchell-Olds T Dixon CJ KapralovMV Filatov DA Eyre-Walker A 2010 Genome wide analyses reveallittle evidence for adaptive evolution in many plant species Mol BiolEvol 27(8)1822ndash1832

Grover C Wendel JF 2010 Recent insights into mechanisms of genomesize change in plants J Botany 2010article ID 382732 1ndash8

Hodgins KA Yeaman S Nurkowski K Rieseberg LH Aitken SN 2016Expression divergence is correlated with sequence evolution but notpositive selection in conifers Mol Biol Evol 33(6)1502ndash1516

Holliday JA Yuen M Ritland K Aitken SN 2010 Postglacial history of awidespread conifer produces inverse clines in selective neutralitytests Mol Ecol 19(18)3857ndash3864

Hough J Williamson RJ Wright SI 2013 Patterns of selection in plantgenomes Annu Rev Ecol Evol Syst 4431ndash49

Ingvarsson PK 2010 Natural selection on synonymous and nonsynon-ymous mutations shapes patterns of polymorphism in Populustremula Mol Biol Evol 27650ndash660

Jaramillo-Correa JP Verdu M Gonzalez-Martinez SC 2010 The contri-bution of recombination to heterozygosity differs among plant evo-lutionary lineages and life-forms BMC Evol Biol 1022

Kimura M 1968 Evolutionary rate at the molecular level Nature217624ndash626

Kimura M Ohta T 1971 Protein polymorphism as a phase of molecularevolution Nature 229467ndash469

Klekowski EJ Godfrey PJ 1989 Aging and mutation in plants Nature340(6232)389ndash391

Kujala ST Savolainen O 2012 Sequence variation patterns along a lat-itudinal cline in Scots pine (Pinus sylvestris) signs of clinal adapta-tion Tree Genet Genomes 81451ndash1467

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1375

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 14: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

Lanfear R Welch JJ Bromham L 2010 Watching the clock studyingvariation in rates of molecular evolution between species Tree25495ndash503

Lanfear R Ho SYW Davies TJ Moles AT Aarssen L Swenson NGWarman L Zanne AE Allen AP 2013 Taller plants have lower ratesof molecular evolution Nature Commun 41879

Lanfear R Kokko H Eyre-Walker A 2014 Population size and the rate ofevolution Tree 29(1)33ndash41

Larracuente AM Sackton TB Greenberg AJ Wong A Singh ND SturgillD Zhang Y Oliver B Clark AG 2007 Evolution of protein-codinggenes in Drosophila Trends Genet 24(3)114ndash123

Leitch AR Leitch IJ 2012 Ecological and genetic factors linked to con-trasting genome dynamics in seed plants New Phyto 194629ndash646

Leslie AB Beaulieu JM Rai HS Crane PR Donoghue MJ Mathews S 2012Hemisphere-scale differences in conifer evolutionary dynamicsPNAS 109(40)16217ndash16221

Li WH Ellsworth DL Krushkal J Chang BHJ Hewett-Emmett D 1996Rates of nucleotide substitution in primates and rodents and thegeneration-time effect hypothesis Mol Phylo Evol 5(1)182ndash187

Li L Stoeckert CJ Roos DS 2003 OrthoMCL identification of orthologgroups for eukaryotic genomes Genome Res 132178ndash2189

Li Z Defoort J Tasdighian S Maere S Van de Peer Y De Smet R 2016Gene duplicability of core genes is highly consistent across all an-giosperms Plant Cell 28(2)326ndash344

Lynch M 2010 Evolution of the mutation rate Trends Genet26(8)345ndash352

Magallon S Hilu KW Quandt D 2013 Land plant evolutionary timelinegene effects are secondary to fossil constraints in relaxed clock esti-mation of age and substitution rates Am J Bot 100556ndash573

Mimura M Aitken SN 2007 Adaptive gradients and isolation-by-distance with postglacial migration in Picea sitchensis Heredity99224ndash232

Neale DB Kremer A 2011 Forest tree genomics growing resources andapplications Nature Rev Genet 12111

Neale DB Wegrzyn JL Stevens KA Zimin AV Puiu D Crepeau MWCardeno C Koriabine M Holtz-Morris AE Liechty JD et al 2014Decoding the massive genome of loblolly pine using haploid DNAand novel assembly strategies Genome Biol 15R59

Nielsen R Yang Z 2003 Estimating the distribution of selection coeffi-cients from phylogenetic data with applications to mitochondrialand viral DNA Mol Biol Evol 20(8)1231ndash1239

Nielsen R 2005 Molecular signatures of natural selection Annu Rev Gen39197ndash218

Nystedt B et al 2013 The Norway spruce genome sequence and conifergenome evolution Nature 497579ndash584

OrsquoConnell LM Ritland K 2004 Somatic mutations at microsatellite lociin western Redcedar (Thuja plicata Cupressaceae) J Hered95(2)172ndash176

Ohta T 1992 The nearly neutral theory of molecular evolution AnnuRev Ecol Syst 23263ndash286

Pagel M Meade A 2004 A phylogenetic mixture model for detectingpattern-heterogeneity in gene sequence or character-state data SystBiol 53571ndash581

Palme AE Pyheuroajeuroarvi T Wachowiak W Savolainen O 2009 Selection onnuclear genes in a Pinus phylogeny Mol Biol Evol 26(4)893ndash905

Paradis E Claude J Strimmer K 2004 APE analyses of phylogenetics andevolution in R language Bioinformatics 20(2)289ndash290

Pavy N Pelgas B Laroche J Rigault P Isabel N Bousquet J 2012 A sprucegene map infers ancient plant genome reshuffling and subsequentslow evolution in the gymnosperm lineage leading to extant conifersBMC Biol 1084

Petit RJ Hampe A 2006 Some evolutionary consequences of being atree Annu Rev Ecol Evol Syst 37187ndash214

Phifer-Rixey M Bonhomme F Boursot P Churchill GA Pialek J TuckerPK Nachman MW 2012 Adaptive evolution and effective popula-tion size in wild house mice Mol Biol Evol 29(10)2949ndash2955

Piganeau G Eyre-Walker A 2009 Evidence for variation in the effectivepopulation size of animal mitochondrial DNA PLoS One 4e4396

Proost S Van Bel M Vaneechoutte D Van de Peer Y Inze D Mueller-Roeber B Vanderpoele K 2015 PLAZA30 an access point for plantcomparative genomics Nucl Acid Res 43(D1)D974ndashD981

Ran JH Gao H Wang XQ 2010 Fast evolution of the retroprocessedmitochondrial rps3 gene in Conifer II and further evidence for thephylogeny of gymnosperms Mol Phylogenet Evol 54136ndash149

Remington DL OrsquoMalley DM 2000 Whole-genome characterization ofembryonic stage inbreeding depression in a selfed Loblolly Pine fam-ily Genetics 155(1)337ndash348

Savolainen O Pyheuroajeuroarvi T 2007 Genomic diversity in forest trees CurrOpin Plant Biol 10162ndash167

Schultz ST Scofield DG 2009 Mutation accumulation in real branchesfitness assays for genomic deleterious mutation rate and effect inlarge-statured plants Am Nat 174163ndash175

Slotte T Foxe JP Hazzouri KM Wright SI 2010 Genome-wide evidencefor efficient positive and purifying selection in Capsella grandiflora aplant species with large effective population size Mol Biol Evol27(8)1813ndash1821

Smith SA Donoghue MJ 2008 Rates of molecular evolution are linked tolife history traits in flowering plants Science 322(5898)86ndash89

Stamatakis A 2014 RAxML Version 8 a tool for phylogenetic analysisand post-analysis of large phylogenies Bioinformatics30(9)1312ndash1313

Strasburg JL Kane NC Raduski AR Bonin A Michelmore R Rieseberg LH2011 Effective population size is positively correlated with levels ofadaptive divergence among annual sunflowers Mol Biol Evol28(5)1569ndash1580

Syring J Farrell K Businsky R Cronn R Liston A 2007 Widespreadgenealogical nonmonophyly in species of Pinus subgenus StrobusSyst Biol 56163ndash181

Ugarte MD Militino AF Arnholt AT 2008 Probability and statistics withR Chaoman amp Hall CRC

Villanueva-Ca~nas JL Laurie S Alba MM 2013 Improving genome-widescans of positive selection using protein isoforms of similar lengthGenome Biol Evol 52(2)457ndash467

Wachowiak W Balk PA Savolainen O 2009 Search for nucleotide di-versity patterns of local adaptation in dehydrins and other cold-related candidate genes in Scots pine (Pinus sylvestris L) TreeGenet Genomes 5117ndash132

Wang J Street NR Scofield DG Ingvarsson PK 2016a Natural selectionand recombination rate variation shape nucleotide polymorphismacross the genomes of three related Populus species Genetics202(3)1185ndash1200

Wang J Street NR Scofield DG Ingvarsson PK 2016b Variation in linkedselection and recombination drive genomic divergence during allo-patric speciation of European and American aspens Mol Biol Evol33(7)1754ndash1767

Wang XQ Ran JH 2014 Evolution and biogeography of gymnospermsMol Phylo Evol 7524ndash40

Waterhouse AM Procter JB Martin DMA Clamp M Barton GJ 2009Jalview version 2 a multiple sequence alignment and analysis work-bench Bioinformatics 25(9)1189ndash1191

Watson JM Platzer A Kazda A Akimcheva S Valuchova S Nizhynska VNordborg M Riha K 2016 Germline replications and somatic mu-tation accumulation are independent of vegetative life span inArabidopsis PNAS 113(43)12226ndash12231

Williams CG Savolainen O 1996 Inbreeding depression in conifers im-plications for breeding strategy Forest Sci 42(1)102ndash117

Williamson RJ Josephs EB Platts AE Hazzouri KM Haudry A BlanchetteM Wright SI 2014 Evidence for widespread positive and negativeselection in coding and conserved noncoding regions of Capsella gran-difloraPLoSGenet10(9)e1004622DOI101371journalpgen1004622

Willyard A Syring J Gernandt DS Liston A Cronn R 2007 Fossil cali-bration of molecular divergence infers a moderate mutation rateand recent radiations in Pinus Mol Biol Evol 24(1)90ndash101

Won H Renner S 2006 Dating dispersal and radiation in the gymno-sperm Gnetum (Gnetales)-Clock calibration when outgroup rela-tionships are uncertain Syst Biol 55610ndash622

De La Torre et al doi101093molbevmsx069 MBE

1376

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4
Page 15: Contrasting Rates of Molecular Evolution and … Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants Amanda R. De La Torre,*,1,2 Zhen Li,3,4

Xi Z Rest JS Davis CC 2013 Phylogenomics and coalescentanalyses resolve extant seed plant relationships PLoS ONE8(11)e80870

Yang Z 1997 PAML a program package for phylogenetic analysis bymaximum likelihood Comput Appl BioSci 13555ndash556

Yang Z 1998 Likelihood ratio tests for detecting positive selectionand application to primate lysozyme evolution Mol Biol Evol15568ndash573

Yang Z Nielsen R 1998 Synonymous and nonsynonymous rate varia-tion in nuclear genes of mammals J Mol Evol 46409ndash418

Yang Z Nielsen R Goldman N Krabbe-Pedersen AM 2000 Codon-substitution models for heterogeneous selection pressure at aminoacid sites Genetics 155431ndash449

Yang Z 2002 Inference of selection from multiple species alignmentsCurr Op Gen Dev 12688ndash694

Yang Z Nielsen R 2002 Codon-substitution models for detecting mo-lecular adaptation at individual sites along specific lineages Mol BiolEvol 19908ndash917

Yang Z Wong WSW Nielsen R 2005 Bayes empirical Bayes inference ofamino acid sites under positive selection Mol Biol Evol221107ndash1118

Yang Z 2007 PAML 4 a program package for phylogenetic analysis bymaximum likelihood Mol Biol Evol 241586ndash1591

Evolution of Gymnosperms and Flowering Plants doi101093molbevmsx069 MBE

1377

  • msx069-TF1
  • msx069-TF2
  • msx069-TF3
  • msx069-TF4