

Molecular Phylogenetics and Evolution 43 (2007) 124–139
www.elsevier.com/locate/ympev

Structural partitioning, paired-sites models and evolution of the ITS transcript in Syzygium and Myrtaceae

E. Biffin a,b,*, M.G. Harrington c, M.D. Crisp b, L.A. Craven a, P.A. Gadek c

a Australian National Herbarium, CPBR, CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601, Australia
b School of Botany and Zoology, The Australian National University, Canberra, ACT 0200, Australia
c School of Tropical Biology, James Cook University, Cairns, Qld 4870, Australia

Received 17 March 2006; revised 31 July 2006; accepted 14 August 2006
Available online 22 August 2006

Abstract

The internal transcribed spacers (ITS) of nuclear ribosomal DNA are widely used for phylogenetic inference. Several characteristics, including the influence of RNA secondary structure on the mutational dynamics of ITS, may impact on the accuracy of phylogenies estimated from these regions. Here, we develop RNA secondary structure predictions for representatives of the angiosperm family Myrtaceae. On this basis, we assess the utility of structural (stem vs. loop) partitioning, and RNA-specific (paired-sites) models for a 76 taxon Syzygium alignment, and for a broader, family-wide Myrtaceae ITS data set. We use a permutation approach to demonstrate that structural partitioning significantly improves the likelihood of the data. Similarly, models that account for the non-independence of stem-pairs in RNA structure have a higher likelihood than those that do not. The best-fit RNA models for ITS are those that exclude simultaneous double substitutions in stem-pairs, which suggests an absence of strong selection against non-canonical (G·U/U·G) base-pairs at a high proportion of stem-paired sites. We apply the RNA-specific models to the phylogeny of Syzygium and Myrtaceae and contrast these with hypotheses derived using standard 4-state models. There is little practical difference amongst relationships inferred for Syzygium although for Myrtaceae, there are several differences. The RNA-specific approach finds topologies that are less resolved but are more consistent with conventional views of myrtaceous relationships, compared with the 4-state models.
© 2006 Elsevier Inc. All rights reserved.

Keywords: ITS1; ITS2; Secondary structure; Stems; Loops; Mutational dynamics; RNA-specific models; Model selection; Maximum likelihood; Bayesian analysis

1. Introduction

Sequences of the internal transcribed spacer (ITS) regions of nuclear ribosomal DNA (rDNA) are a widely used molecular tool for inferring evolutionary relationships amongst eukaryotes (e.g., Hershkovitz and Lewis, 1996; Hershkovitz and Zimmer, 1996; Hershkovitz et al., 1998; Alvarez and Wendel, 2003; Schultz et al., 2005). Several factors, such as high copy number, universality of primer sequences, and the relatively small size of the spacers make data from these regions relatively easy to obtain. In addition, the expectation of high inter-specific and low intra-genomic variability, and bi-parental mode of inheritance has driven the popularity of ITS sequencing (Hershkovitz et al., 1998; Alvarez and Wendel, 2003). While sequencing of ITS has undoubtedly made substantial contributions to phylogenetics, several factors including variable (incomplete) rates of concerted evolution, the presence of divergent pseudogene copies, and highly complex patterns of sequence evolution may confound the reconstruction of historical relationships inferred from these regions (see Alvarez and Wendel, 2003, and references therein).

* Corresponding author. E-mail address: [email protected] (E. Biffin).

1055-7903/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.ympev.2006.08.013

A specific concern is the influence of RNA secondary structure on the mutational dynamics of ITS, which has important implications for phylogenetic inference (Alvarez and Wendel, 2003). rDNAs encode RNA genes, which are single stranded but develop secondary structure (helical regions, or stems, formed by intra-molecular base pairing) as part of the formation and functioning of ribosomes (Noller, 1984). Many RNA molecules are subject to evolutionary constraint related to the maintenance of specific secondary structures that provide functionality. However, it is frequently observed that homologous stable stem structures are maintained despite extensive nucleotide divergence, because stem pairing regions of conserved RNAs evolve via selectively neutral mutations in the form of compensatory (or hemi-compensatory) base pair change (CBC). Mutations at different (often distant) sites in a molecule can be correlated because a change in one nucleotide in a stem pair must be compensated by a change in the opposing member, in order to preserve an energetically stable secondary structure (Higgs, 2000). Most phylogeny reconstruction methods assume independence amongst sites and may therefore be unsuitable for RNAs with conserved secondary structure (Wheeler and Honeycutt, 1988; Dixon and Hillis, 1993; Tillier and Collins, 1998; Higgs, 2000; Savill et al., 2001; Telford et al., 2005). More generally, there is an expectation that the patterns of evolution may vary substantially between stem-paired and single stranded (loop) regions. For example, helical regions of RNA molecules tend to be G–C rich, suggesting selection to maintain thermodynamically stable stem structures (Hershkovitz et al., 1998; Higgs, 2000; Savill et al., 2001). In single stranded regions, there may be a pronounced bias towards adenine nucleotides, which are associated with several well-characterised RNA structural motifs, some of which are implicated in higher-level (tertiary) structural interactions (Gutell et al., 2000).
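The distinction between compensatory and hemi-compensatory change described above can be made concrete with a small sketch. The function name and the returned labels are illustrative assumptions, not terminology or code from the study; the only domain facts used are the six pairing combinations (Watson–Crick plus G·U wobble).

```python
# Pairs able to form a bond in a stem: Watson-Crick plus G.U wobble.
PAIRING = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_change(pair_a, pair_b):
    """Compare one homologous stem pair in two sequences.

    Each argument is a two-letter string (5' base, 3' base).
    Returns:
      'identical'         the two pairs are the same string
      'CBC'               both bases differ and both pairs can bond
      'hemi-CBC'          one base differs and both pairs can bond
      'non-compensatory'  the pairs differ and at least one cannot bond
    """
    if pair_a == pair_b:
        return "identical"
    if pair_a not in PAIRING or pair_b not in PAIRING:
        return "non-compensatory"
    n_diff = sum(x != y for x, y in zip(pair_a, pair_b))
    return "CBC" if n_diff == 2 else "hemi-CBC"


print(classify_change("GC", "AU"))  # both members changed, pairing kept
print(classify_change("GC", "GU"))  # one member changed, wobble pair kept
print(classify_change("GC", "GA"))  # pairing lost
```

A full CBC such as G·C to A·U requires two nucleotide changes at sites that may be far apart in the primary sequence, which is why such sites cannot be treated as independent.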

The secondary structure of the ITS regions has been estimated for a number of phylogenetic studies, although the focus, primarily, has been the potential of structural information to facilitate homology-based sequence alignment amongst divergent sequences (e.g., Gottschling et al., 2001; Goertzen et al., 2003) or the identification of putative pseudogene copies of the ITS transcript (e.g., Buckler and Holtsford, 1996; Bailey et al., 2003). However, there are no ITS-based studies to date which have attempted to directly incorporate secondary structure information into models of RNA sequence evolution, perhaps in part reflecting the widespread assumption that the ITS are under low functional constraint, and therefore approximate a neutral evolutionary model. Furthermore, and in contrast to rDNA coding regions (such as the 5.8S rDNA gene, see Hershkovitz and Zimmer, 1996), the ITS lack a broad conservation of sequence (e.g., Baldwin, 1992; Hershkovitz and Zimmer, 1996; Hershkovitz et al., 1998) and this is believed to limit the accuracy of conventional approaches to RNA structure prediction (Alvarez and Wendel, 2003). Nevertheless, there is strong evidence for a generally conserved functional role for ITS that is mediated at the sequence and structural level (e.g., Joseph et al., 1999; Côté and Peculis, 2001; Lalev and Nazar, 1999, 2001). The ITS are sequentially cleaved from the large precursor (pre-RNA) molecule (80–90S nucleolar particle) and digested. However, there are close interdependencies in the cleavage pathway, reflecting the need for higher order structure in the pre-RNA, including the ITS, that may be necessary to organise the cleavage sites in close spatial proximity (Lalev et al., 2000). Key structural elements, including cleavage sites and binding sites for nucleolar proteins (including those associated with the spliceosome-like protein complex referred to as the ribosome assembly chaperone, see Lalev et al., 2000), may be essentially conserved across eukaryotes (e.g., van Nues et al., 1994; Mai and Coleman, 1997; Joseph et al., 1999; Lalev and Nazar, 1999; Coleman, 2003; Schultz et al., 2005). Therefore, the concerns relating RNA secondary structural constraints to phylogenetic analysis could reasonably apply to sequences of the ITS regions.

The study of Harrington and Gadek (2004) used ITS sequences to infer evolutionary relationships within the angiosperm genus Syzygium and its allies (Myrtaceae) although the hypothesis they present is based upon relatively simple evolutionary models, including maximum parsimony, and a Bayesian analysis employing a model that allows for differential rates of transitions and transversions (HKY85). Phylogenetic studies have demonstrated that structural partitioning and the use of complex evolutionary models may better account for the mutational processes occurring in RNA sequences (Wilgenbusch and de Queiroz, 2000; Savill et al., 2001; Jow et al., 2002; Hudelot et al., 2003; Kjer, 2004; Telford et al., 2005). In particular, maximum likelihood (ML) approaches to phylogeny reconstruction have facilitated the development of models of RNA sequence evolution which treat stem nucleotides as paired sites, and thus account for the possible non-independence of sites within stem-pairing regions (e.g., Tillier and Collins, 1998; Schöniger and von Haeseler, 1999; Higgs, 2000; Savill et al., 2001; Jow et al., 2002). Three classes of models (RNA16, RNA7, and RNA6, in the terminology of Savill et al., 2001) provide rates for the commonly observed base-pairs in secondary structure (i.e., Watson–Crick, G·C/C·G/A·U/U·A, and ‘wobble’, G·U/U·G, pairs) but differ in the treatment of mismatch pairs. RNA16 includes a rate class for each of the possible mismatch pairings (i.e., 16 × 16 rate matrix), RNA7 includes a single mismatch class (i.e., 7 × 7 rate matrix) while for RNA6, mismatches are completely excluded (i.e., 6 × 6 rate matrix) from the analysis. Restrictions of these generalised models include those that exclude the possibility of double substitutions (i.e., all double transitions pass through a GU intermediate, and all double transversions pass through a mismatch pair) or enforce base-pair reversal symmetry (e.g., the rate for AU is equal to the rate for UA). Recently, Savill et al. (2001) compared the variants for each class of RNA model for a small-subunit (SSU) rRNA alignment and concluded that the most generalised model from each class best reflects the complexity of RNA evolution.
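The state spaces behind the three model classes can be enumerated directly, which shows where the 16, 7 and 6 in the matrix dimensions come from. This is a minimal sketch: the function names and the `MM` label for the lumped mismatch state are assumptions made for illustration, not identifiers from Phase or from Savill et al. (2001).

```python
from itertools import product

BASES = "ACGU"
WATSON_CRICK = {"AU", "UA", "GC", "CG"}
WOBBLE = {"GU", "UG"}


def pair_class(pair):
    """Classify a two-letter stem pair as 'watson-crick', 'wobble' or 'mismatch'."""
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


def state_space(model_class):
    """Return the list of states for the RNA16, RNA7 or RNA6 model class."""
    all_pairs = ["".join(p) for p in product(BASES, repeat=2)]
    canonical = [p for p in all_pairs if pair_class(p) != "mismatch"]
    if model_class == "RNA16":
        return all_pairs            # every ordered pair is its own state
    if model_class == "RNA7":
        return canonical + ["MM"]   # the ten mismatch pairs share one state
    if model_class == "RNA6":
        return canonical            # mismatch pairs are excluded entirely
    raise ValueError(f"unknown model class: {model_class}")


for name in ("RNA16", "RNA7", "RNA6"):
    states = state_space(name)
    print(name, len(states), states)
```

Because RNA6 has no mismatch state, any alignment column pair containing a mismatch has to be dropped or recoded before such a model is applied, which is the sense in which mismatches are excluded from the analysis.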


In the present study, we derive predictions of ITS rRNA secondary structure for Syzygium and the Myrtaceae. Subsequently, we explore the utility of structural partitioning and RNA-specific evolutionary models, which treat stem nucleotides as paired sites, for the phylogenetic analysis of the ITS sequence data.

2. Methods

2.1. Sequence data

We assembled a ‘Syzygium’ data set, including 76 ITS sequences, these being representative of key syzygioid lineages identified by Harrington and Gadek (2004), and by Biffin et al. (2006) in their phylogenetic analysis of chloroplast DNA sequence data (cpDNA). A broader ‘Myrtaceae’ alignment was developed to include a representation of the major lineages in the family, to assist with secondary structure prediction for the Syzygium group, but also to explore patterns of evolution amongst more divergent sequences. Relative to the recent tribal classification of Myrtaceae (Wilson et al., 2005, based upon cpDNA matK sequences), the 45 taxon ‘Myrtaceae’ sample includes representatives of the sub-family Myrtoideae tribes Syzygieae (Acmena, Acmenosperma, Piliocalyx, and Syzygium); Tristanieae (Tristania, Thaleropia, and Xanthomyrtus); Metrosidereae (Metrosideros); Myrteae (Myrtus, Eugenia, Luma, Calyptranthes, Decaspermum, Psidium, Rhodamnia, Rhodomyrtus, and Myrciaria); Backhousieae (Backhousia and Choricarpia); Kanieae (Tristaniopsis); Eucalypteae (Eucalyptus, Corymbia, Eucalyptopsis, Angophora, Arillastrum, and Allosyncarpia); Chamelaucieae (Chamelaucium); Leptospermeae (Leptospermum, Kunzea, Asteromyrtus, and Pericalymma); Melaleuceae (Melaleuca, Calothamnus, and Callistemon); Osbornieae (Osbornia); Lophostemoneae (Lophostemon); and Xanthostemoneae (Xanthostemon). Psiloxylon is included as a representative of Myrtaceae sub-family Psiloxyloideae (Table 1).

For the novel sequences, amplification and sequencing primers used were ITS5M (Liston et al., 1996), ITS25R (Nickrent et al., 1994) and ITS2/3/4/5 (White et al., 1990). DNA extraction used the hot CTAB protocol of Doyle and Doyle (1990). PCR amplifications were performed using a Hybaid PCR Express thermocycler, under standard reaction conditions with an annealing temperature of 55 °C. The amplified double-stranded template was purified using a Qiaquick PCR cleanup kit (Qiagen). Sequences were obtained using fluorescent dye-labelled terminators (BigDye v.2.0, 2.1, 3.1; Perkin-Elmer) on an ABI Prism 377 DNA sequencer. In all cases forward and reverse strands were sequenced, so as to check for possible sequence misreads. Electropherograms were processed using Sequencher™ (Gene Codes Corporation).

The data sets developed here are derived entirely from direct sequencing of PCR products. The electropherograms were screened for potential paralogues, as indicated by multiple peaks of equal strength (occasional polymorphic sites were found). We used secondary structural criteria to screen for potential pseudogenes. Specifically, the conservation of well-characterised ‘core’ structural motifs is theoretically consistent with the maintenance of function (e.g., Buckler and Holtsford, 1996; Bailey et al., 2003). In the absence of pseudogenes, paralogy is unlikely to impact upon the Myrtaceae-wide analyses because the sampling is above species level and paralogues are unlikely to be sufficiently differentiated to support incorrect topologies (e.g., Hershkovitz et al., 1998). For the Syzygium alignment, we have compared topologies from ITS with those from unlinked data (chloroplast DNA, see Biffin et al., 2006), which suggests a level of conflict that is consistent with ‘noise’ and sampling error, i.e., there is no evidence of strong, conflicting resolution of taxa amongst the independent sources of data (E. Biffin, unpublished results).

2.2. Sequence alignment and secondary structure prediction

We performed a multiple sequence alignment ‘by-eye’ for Syzygium and outgroups, and separately for the broader comparison of Myrtaceae. The alignment was subsequently adjusted with reference to secondary structural information. Sequence alignment was not problematic for Syzygium and, in most instances, was readily achievable for the included sample of myrtaceous sequences. However, a few regions associated with length-mutation could not be meaningfully aligned across all taxa and were therefore excluded from the multiple sequence alignment. These include the hairpin loop of ITS1 stem I, the central portion (proximal and distal strands) of ITS1 stem II, and the hairpin loop of ITS2 stem I (see Results). The alignment has been supplied to the journal as Supplementary data.

The Pfold algorithm (Knudsen and Hein, 2003; http://www.daimi.au.dk/~compbio/pfold) was used to define a set of reasonable starting constraints for input to minimum free energy (MFE) folding. Pfold uses a ‘stochastic context free grammar’ approach to produce a ‘prior probability distribution of RNA structures’ for an input RNA alignment (Knudsen and Hein, 2003). In a practical sense, Pfold returns an alignment which indicates bases with a high probability either of pairing or of occurring in an unpaired state (here, the significance level was set at 0.95). In a recent comparison of RNA folding algorithms, Pfold was found to be generally accurate (as determined by the ability to predict experimentally verified secondary structures from several RNA alignments), particularly with respect to relatively short, well-aligned sequences (Gardner and Giegerich, 2004), as is the case for the ITS alignments considered here.

Table 1. Taxon, voucher details and GenBank accession numbers for Myrtaceae ITS sequences

Species | GenBank Accession No. | Voucher
Acmena acuminatissima (Blume) Merr. & Perry | EF026611 | Gadek s.n.-JCT
Acmena divaricata Merr. & Perry | AY187160 |
Acmena graveolens (F.M. Bail.) L.S. Smith | AY187163 |
Acmena ingens (F. Muell ex C. Moore) Guymer & B. Hyland | EF026611 | Beasley and Ollerenshaw 1018-CANB
Acmena mackinnoniana B. Hyland | AY187165 |
Acmena smithii (Poir.) Merr. & Perry | AY187168 |
Acmenosperma claviflorum (Roxb.) E. Kausel | AY187169 |
Allosyncarpia ternata S.T. Blake | AF058453 |
Anetholea anisata (Vickery) P.G. Wilson | AY187225 |
Angophora costata (Gaertn.) Britten | AF058455 |
Arillastrum gummiferum Pancher ex. Bail. | AF058454 |
Asteromyrtus arnhemica (Byrnes) Craven | EF026603 |
Asteromyrtus symphyocarpa (F.Muell.) Craven | EF041509 | C.Chong s.n.-JCT
Backhousia myrtifolia Hook. | EF026609 | CBG8501263-CANB
Blepharocalyx salicifolia O.Berg | AM234084 |
Callistemon viminalis (Sol. ex Gaertn.) G.Don | EF041510 | C.Chong s.n.-JCT
Calothamnus quadrifidus R.Br. | EF041511 | C.Chong s.n.-JCT
Calyptranthes concinna DC. | AM234103 |
Chamelaucium uncinatum Schauer | EF026605 |
Choricarpia subargentea (C.T. White) L. Johnson | EF026610 | Telford and Butler 9041-CANB
Cleistocalyx seemanii (A.C. Sm.) Merr. & Perry | EF026613 | Biffin and Craven 65-CANB
Cleistocalyx sp. EBC58 | EF026614 | Biffin and Craven 58-CANB
Corymbia maculata Hook. | AF058461 |
Decaspermum humile (G. Don) A.J. Scott | AM234128 |
Eucalyptopsis papuana C.T.White | AF190354 |
Eucalyptus gunnii Hook.f. | AF058469 |
Eucalyptus urophylla S.T.Blake | AF390492 |
Eugenia reinwardtiana (Blume) D.C. | AY487201 |
Eugenia uniflora L. | AY487284 |
Kunzea sinclairii (Kirk) W.Harris | AY772399 |
Leptospermum scoparium J.R.Forst & G.Forst | AY772398 |
Lophostemon confertus (R.Br.) Peter G.Wilson & J.T.Waterh. | AF390444 |
Luma apiculata (DC.) Burret | AM234101 |
Melaleuca citrolens Barlow | EF041512 | C.Chong s.n.-JCT
Melaleuca deanei F.Muell. | EF041513 | C.Chong s.n.-JCT
Metrosideros diffusa (G. Forst.) Sm. | AF211500 |
Metrosideros nervulosa C. Moore & F. Muell. | EF026607 | Biffin 34-CANB
Myrciaria cauliflora O.Berg | AM234093 |
Myrtus communis L. | AM234101 |
Osbornia octadonta F. Muell. | EF041844 | Lyne 36-CANB
Pericalymma ellipticum (Endl.) Schauer | EF026604 |
Piliocalyx bullatus Brong. & Gris | EF026617 | Biffin and Craven 121-CANB
Piliocalyx concinnus A.C. Sm. | EF026615 | Biffin and Craven 61-CANB
Piliocalyx francii Guillaumin | EF026616 | Biffin and Craven 114-CANB
Piliocalyx robustus Brongn. & Gris | EF026618 | Biffin and Craven 133-CANB
Pimenta racemosa (Mill.) J.W. Moore | |
Psidium cattelianum Mart. ex DC. | AM234080 | De Silva & Farias 4535-K
Psiloxylon mauritanium Baill. | EF026606 |
Rhodamnia argentea Benth. | AY487302 |
Rhodomyrtus psidioides (G.Don) Benth. | AM234134 |
Syzygium acre (Pancher ex Guillaumin) J.W. Dawson | EF026619 | Biffin and Craven 107-CANB
Syzygium amplifolium Perry | EF026620 | Biffin and Craven 1-CANB
Syzygium angophoroides (F. Muell.) B. Hyland | AY187172 |
Syzygium apodophyllum (F. Muell.) B. Hyland | AY187173 |
Syzygium aqueum (Burm. f.) Alston | AY187174 |
Syzygium arboreum (Baker f.) J.W. Dawson | EF026621 | Biffin and Craven 111-CANB
Syzygium aromaticum (L.) Merr. & Perry | EF026622 | Brown and Craven 130-CANB
Syzygium australe (Wendl. ex Link) B. Hyland | AY187177 |
Syzygium austrocaledonicum (Seem.) Guillaumin | EF026623 | Percy s.n.-CANB
Syzygium bamagense B.Hyland | AY187178 |
Syzygium branderhorstii Lauterb. | AY187181 |
Syzygium bungadinnia (F.M. Bail.) B. Hyland | AY187182 |
Syzygium buxifolium Hook. & Arn. | EF026624 | Brown and Craven 134-CANB
Syzygium canicortex B. Hyland | AY187183 |
Syzygium cordatum Hochst. ex C. Krauss | EF026625 | van der Merwe 500-PRU
Syzygium cormiflorum (F. Muell.) B. Hyland | AY187184 |
Syzygium corynanthum (F. Muell.) B. Hyland | EF026626 | Biffin 39-CANB
Syzygium crebrinerve (C.T. White) L. Johnson | EF026627 | Biffin 40-CANB
Syzygium erythrocalyx (C.T. White) B. Hyland | AY187187 |
Syzygium fibrosum (F.M. Bail.) Hartley & Perry | AY187189 |
Syzygium francisii (F.M. Bail.) L. Johnson | AY187182 |
Syzygium fullagarii (F. Muell.) Craven | AY187193 |
Syzygium glenum Craven | AY187162 |
Syzygium guineense Guill. & Perr. | EF026628 | van der Merwe 501-PRU
Syzygium gustavioides (F.M. Bail.) B. Hyland | AY187194 |
Syzygium jambos (L.) Alston | EF026629 | Biffin 42-CANB
Syzygium lateriflorum Brong. & Gris | EF026630 | Biffin and Craven 110-CANB
Syzygium laxeracemosum (Guillaumin) J.W. Dawson | EF026631 | Biffin and Craven 148-CANB
Syzygium luehmannii (F.Muell.) L. Johnson | AY187197 |
Syzygium macilwraithianum B. Hyland | AY187198 |
Syzygium maire (A. Cunn.) Sykes & P.J. Garnock-Jones | EF026632 | Gardner 8470-CANB
Syzygium malaccense (L.) Merr. & Perry | AY187199 |
Syzygium monimioides Craven | AY187166 |
Syzygium moorei (F. Muell.) L. Johnson | EF026632 | Biffin 50-CANB
Syzygium muellerii Miq. | EF026634 | Brown and Craven 136-CANB
Syzygium multipetalum Pancher ex Brongn. & Gris | EF026635 | Biffin and Craven 75-CANB
Syzygium nervosum D.C. | EF026636 | Slee et al 2386-CANB
Syzygium ngyonense (Schltr.) Guillaumin | EF026637 | Percy s.n.-CANB
Syzygium oleosum (F. Muell.) B. Hyland | AY187203 |
Syzygium paniculatum Gaertn. | AY187204 |
Syzygium pondoense Engl. | EF026638 | van der Merwe 502-PRU
Syzygium pseudofastigiatum B. Hyland | AY187206 |
Syzygium puberulum Hartley & Perry | AY187207 |
Syzygium purpureum (Perr.) A.C. Sm. | EF026639 | Biffin and Craven 19-CANB
Syzygium pycnanthum Merr. & Perry | EF026640 | Brown and Craven 139-CANB
Syzygium racemosum D.C. | EF026641 | Brown and Craven 138-CANB
Syzygium sayeri (F. Muell.) B. Hyland | AY187209 |
Syzygium seemannianum Merr. & Perry | EF026642 | Biffin and Craven 32-CANB
Syzygium sexangulatum (Miq.) Amshoff | EF026643 | Brown and Craven 141-CANB
Syzygium sp. ‘Sulawesi 1’ | EF026644 | Brown and Craven 8-CANB
Syzygium sp. ‘Sulawesi 2’ | EF026645 | Brown and Craven 90-CANB
Syzygium sp. ‘Sulawesi 3’ | EF026646 | Brown and Craven 92-CANB
Syzygium sp. ‘Sumatra 1’ | EF026647 | Brown and Craven 140-CANB
Syzygium tenuiflorum Brong. & Gris | EF026648 | Biffin and Craven 121-CANB
Syzygium tetrapterum (Miq.) Chantaranothai & J. Parn. | EF026649 | Brown and Craven 135-CANB
Syzygium tierneyanum (F. Muell.) Hartley & Perry | AY187213 |
Syzygium wesa B. Hyland | AY187216 |
Syzygium wilsonii (F. Muell.) B. Hyland subsp. wilsonii | AY187217 |
Syzygium zeylanicum D.C. | EF026650 | SBG 5-CANB
Thaleropia queenslandica P.G. Wilson | AY264945 | C.Chong s.n.-JCT
Tristania neriifolia (Sims) R. Br. | EF026608 | Telford 10900-CANB
Tristaniopsis laurina (Sm.) Peter G.Wilson & J.T.Waterh. | EF041514 | C.Chong s.n.-JCT
Waterhousea floribunda (F. Muell.) B. Hyland | AY187221 |
Waterhousea hedraiophylla (F. Muell.) B. Hyland | AY187222 |
Waterhousea mulgraveana B. Hyland | AY187223 |
Waterhousea unipunctata B. Hyland | AY187224 |
Xanthomyrtus motivaga A.J. Scott | AM234147 |
Xanthostemon chrysanthus (F.Muell.) Benth. | EF041515 | C.Chong s.n.-JCT

Location of vouchers: CANB Australian National Herbarium; JCT James Cook University Herbarium; PRU Pretoria University Herbarium.

The RNAstructure software (version 4.2, Mathews et al., 2004) was used for MFE structure prediction, using default parameters, with and without input constraints, as determined by the Pfold approach (above). MFE predictions were performed for the majority of included sequences. Highly similar sequences were not subject to separate MFE predictions. In all instances, there were multiple sub-optimal structures, although the constrained set was generally nested within the set of structures returned from the unconstrained predictions. Sub-optimal structures were searched for commonly occurring helices, which were considered well determined when found at high frequency for predictions of each individual sequence (as a guideline, 80% of structures within 2% of the thermodynamic stability of the MFE prediction; Zuker and Jacobson, 1995), but also found in the majority of sequences for which MFE predictions were performed.
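The guideline for a ‘well determined’ helix (present in 80% of sub-optimal structures within 2% of the MFE, and also in the majority of sequences) can be sketched as two small filters. The input format is an assumption made for illustration: each structure is a `(free_energy, set_of_helix_labels)` tuple with negative energies, and helix labels are assumed to be comparable across sequences. This is not the output format of RNAstructure.

```python
def well_determined_helices(structures, energy_window=0.02, min_freq=0.80):
    """Helices frequent among near-optimal structures of ONE sequence.

    structures: list of (free_energy, set_of_helix_labels); energies are
    negative (kcal/mol), so the MFE structure has the smallest value.
    """
    mfe = min(energy for energy, _ in structures)
    cutoff = mfe * (1.0 - energy_window)  # e.g. MFE -100 gives cutoff -98
    near_optimal = [helices for energy, helices in structures if energy <= cutoff]
    counts = {}
    for helices in near_optimal:
        for helix in helices:
            counts[helix] = counts.get(helix, 0) + 1
    return {h for h, c in counts.items() if c / len(near_optimal) >= min_freq}


def majority_helices(per_sequence):
    """Helices that are well determined in more than half of the sequences."""
    n = len(per_sequence)
    labels = set().union(*per_sequence)
    return {h for h in labels if sum(h in s for s in per_sequence) > n / 2}


# Hypothetical sub-optimal structures for one sequence (illustration only).
example = [
    (-100.0, {"I", "II", "III"}),
    (-99.5, {"I", "II", "III"}),
    (-99.0, {"I", "II"}),
    (-98.5, {"I", "II", "IV"}),
    (-98.2, {"I", "III"}),
    (-90.0, {"V"}),  # outside the 2% window, ignored
]
print(sorted(well_determined_helices(example)))
```

Keeping the per-sequence and across-sequence filters separate mirrors the two conditions in the text; both thresholds are exposed as parameters because the 80% and 2% values are described as a guideline rather than a fixed rule.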


The well-determined structures were included in the multiple sequence alignment, and we used covariation analysis (e.g., Juan and Wilson, 1999; Gutell et al., 2002; Coleman, 2003) to strengthen MFE predictions. Well-determined helices were considered ‘proven’ when supported by one or more full CBC and the alignment was thus partitioned into ‘stems’ and ‘loops’. While noting that the latter could be reasonably partitioned into various classes of non-pairing bases (e.g., hairpin loops vs. internal bulges) this was not attempted, given the relatively small sequence length, and the potential for high variance in model parameter estimates associated with several small data partitions. Mismatch base-pairs occurring at high frequencies (in ≥50% of sequences, for the alignment of Myrtaceae) were considered effectively non-pairing, and were therefore included in the ‘loop’ partition for subsequent analyses. This approach is consistent with the derivation of a 50% majority-rule consensus structure (e.g., Gardner and Giegerich, 2004), which, although somewhat arbitrary, provides an estimate of the relatively conserved elements of the RNA structure.
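The stem/loop partitioning rule in the paragraph above can be sketched as follows. The data layout is an assumption for illustration: the alignment is a dictionary of equal-length RNA strings, helix pairs are given as 0-based column indices, and sequences with a gap at either column are skipped when the mismatch frequency is computed. The 50% threshold and its inclusive comparison are read from the text; how gaps were handled in the study is not stated.

```python
PAIRING = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus wobble


def partition_columns(alignment, helix_pairs, max_mismatch=0.5):
    """Split alignment columns into paired stem columns and loop columns.

    A proposed pair (i, j) stays in the stem partition only if fewer than
    max_mismatch of the ungapped sequences show a non-pairing combination.
    Everything else, including demoted pairs, goes to the loop partition.
    """
    length = len(next(iter(alignment.values())))
    stem_pairs = []
    for i, j in helix_pairs:
        observed = [seq[i] + seq[j] for seq in alignment.values()
                    if "-" not in (seq[i], seq[j])]
        mismatches = sum(pair not in PAIRING for pair in observed)
        if observed and mismatches / len(observed) < max_mismatch:
            stem_pairs.append((i, j))
    stem_cols = {col for pair in stem_pairs for col in pair}
    loop_cols = [col for col in range(length) if col not in stem_cols]
    return stem_pairs, loop_cols


# Hypothetical toy alignment: one hairpin with three proposed pairs.
toy = {
    "s1": "GCAAAAGC",
    "s2": "GCGAACGC",
    "s3": "GUAAAAAC",
    "s4": "GCCAAAGC",
}
stems, loops = partition_columns(toy, [(0, 7), (1, 6), (2, 5)])
print(stems)
print(loops)
```

In the toy alignment the innermost pair (columns 2 and 5) is a mismatch in three of four sequences, so both of its columns are moved to the loop partition while the two outer pairs are retained as stem.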

2.3. Phylogenetic analysis

2.3.1. Partitioning strategy

A key question to be addressed in the present study is whether secondary structural information can contribute, in a practical sense, to the analysis of ITS sequence data. The likelihood function was used to explore this issue, and we adopted the methods recently implemented by Telford et al. (2005) to test the utility of structural partitioning for an SSU-rRNA alignment.

The approach was to compare likelihood scores for un-partitioned, structurally partitioned, and randomly repartitioned data, the null hypothesis being that structural partitioning does not significantly improve the likelihood of the data. The best-fit model (from the set of models included in Mr.AIC 1.4; Nylander, 2004) for the un-partitioned Syzygium ITS data was selected using the second-order Akaike Information Criterion (AICc; Sugiura, 1978). An ML estimate of phylogeny was derived in PhyML (Guindon and Gascuel, 2003) using a profile of parsimony trees (estimated in the Phylip-3.5 (Felsenstein, 1993) software module DNAPars) as starting trees. The PhyML topology was used as an input into the Optimizer module included in the Phase software package (Phase version 2.0b, http://www.bioinf.man.ac.uk/resources/phase, hereafter referred to as Phase). Given a sequence alignment, an evolutionary model, and a starting tree, Optimizer returns an estimate of the ML score and model parameter values.
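For reference, the second-order criterion used for model selection has the standard form AICc = -2 lnL + 2k + 2k(k + 1)/(n - k - 1), where k is the number of free parameters and n the sample size. The sketch below applies it to a few candidate models. The log-likelihoods, parameter counts and alignment length are hypothetical values chosen for illustration, and using the number of alignment sites as n (and counting only substitution-model parameters in k) are assumptions, since the text does not state which conventions were used.

```python
def aicc(log_likelihood, n_params, n_sites):
    """Second-order Akaike Information Criterion (lower is better)."""
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc is undefined unless n_sites > n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = 2.0 * n_params * (n_params + 1) / (n_sites - n_params - 1)
    return aic + correction


# Hypothetical (lnL, number of free parameters) pairs, not values from the study.
candidates = {
    "HKY85+G": (-5230.4, 5),
    "GTR+G": (-5221.9, 9),
    "GTR+I+G": (-5221.5, 10),
}
n_sites = 600  # assumed alignment length

best = min(candidates, key=lambda name: aicc(*candidates[name], n_sites))
print(best)
```

The correction term grows as k approaches n, so AICc penalises parameter-rich models more heavily than AIC when the alignment is short, which is relevant for spacer regions of only a few hundred sites.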

The ML score for the un-partitioned alignment was compared (using the AICc and the hierarchical likelihood ratio test, hLRT) to the likelihood estimated on the same topology, using the best-fit model as selected above, with model parameters estimated separately for each data partition under the structural partitioning scheme. To determine whether the ML scores for the latter significantly exceeded those which could be expected by chance, the maximum likelihood was estimated for 100 randomly repartitioned data sets (i.e., nucleotide positions were randomly repartitioned, without disrupting the alignment), with two partitions of equal size relative to the structurally partitioned data.
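The random repartitioning step can be illustrated with a minimal Python sketch. It assumes the alignment is held as a dictionary of equal-length sequences; the function names `random_repartition` and `extract_columns` and the toy data are hypothetical and are not part of Phase or PhyML.

```python
import random


def random_repartition(n_sites, n_stem, rng):
    """Split column indices 0..n_sites-1 into two random partitions whose
    sizes match the structural 'stem' and 'loop' partitions."""
    columns = list(range(n_sites))
    rng.shuffle(columns)
    pseudo_stems = sorted(columns[:n_stem])
    pseudo_loops = sorted(columns[n_stem:])
    return pseudo_stems, pseudo_loops


def extract_columns(alignment, columns):
    """Return the sub-alignment made of the given columns, order preserved."""
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical data): 10 sites, of which 4 are 'stem' sites.
rng = random.Random(1)
toy = {"taxonA": "ACGUACGUAC", "taxonB": "ACGUUCGAAC"}
stems, loops = random_repartition(10, 4, rng)
pseudo_stem_alignment = extract_columns(toy, stems)
pseudo_loop_alignment = extract_columns(toy, loops)
```

Under this sketch, each of the 100 replicates would yield a pair of pseudo-partitions whose likelihood is then optimised in the same way as for the true stem and loop partitions.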

2.3.2. Model selection: RNA-specific models

We explored the utility of several RNA-specific (paired-sites) models, as implemented in Phase. Specifically, the focus was the RNA7 and RNA16 classes of models, which have the advantage of including all of the data (RNA6 models exclude mismatch pairs). The RNA16 models provide separate frequencies for each of the possible mismatch pairs, while the RNA7 models treat mismatch pairs as a single state.

The specific models considered are RNA7A, RNA7C, and RNA7D (RNA7 class), and RNA16A, RNA16I, and RNA16K (RNA16 class). Of these, RNA7A and RNA16A are the most general models in their class, and we included the other variants to determine whether simpler models could fit the data equally well. Only RNA7A, RNA7D and RNA16A permit simultaneous double substitutions (e.g., G·C ↔ A·U), while for RNA7C, RNA16I and RNA16K double substitutions must first pass through a mismatch state (i.e., rates are modeled for single nucleotides within stem-pairs). Of the RNA16 models, RNA16A has a separate rate parameter for each of double transitions, double transversions, single site changes, substitutions to and from a mismatch state, and changes between mismatch pairs. The RNA16I model has a GTR-like rate matrix (including a rate for each possible single site nucleotide change) and RNA16K has an HKY85-like rate matrix (grouping single site rates for each of transitions and transversions). RNA7A includes rates to and from each of the commonly occurring (i.e., Watson–Crick, and wobble) stem-pairs, and to and from a mismatch state. For RNA7C, the rate of double substitutions is set to zero, and for RNA7D double transitions, double transversions, single site changes, and substitutions to and from the mismatch state are ‘lumped’. The reader is referred to Higgs (2000), Savill et al. (2001), and the Phase v.2.0b manual for further details of RNA-specific models. We estimated gamma distributed rate variation (Γ) and the proportion of invariant sites (I) for all of the models considered.

For a Myrtaceae ‘stems only’ alignment, the likelihood score for each of these models was estimated on a test tree constrained to the ML topology of Sytsma et al. (2004, S93) which is, arguably, the current best estimate of higher level relationships for the Myrtaceae. For the Syzygium alignment (stems only), RNA model likelihoods were estimated on the PhyML tree (above). For each analysis, the best-fit model, from the set of included models, was determined using the AICc, noting that valid comparisons can be made only amongst the RNA7 models, and between RNA16I and 16K. By contrast, log-likelihoods cannot be compared between RNA7 and RNA16 models, or between RNA16A and 16I or 16K, because the parameters are different, and the likelihoods are thus derived in a different fashion (Savill et al., 2001). For the nested models, the hLRT was also performed.
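As an illustration of the two criteria, the following Python sketch computes the AICc and the hLRT statistic from log-likelihoods of the kind reported in Table 3. It assumes the usual definitions (AICc = 2K − 2 ln L + 2K(K + 1)/(n − K − 1); LRT statistic = twice the difference in log-likelihoods, referred to a chi-square distribution). The function names are hypothetical, and the closed-form chi-square tail is our own helper, not a routine from Phase.

```python
import math


def aicc(ln_l, k, n):
    """Second-order AIC for log-likelihood ln_l, k parameters, n sites."""
    aic = 2 * k - 2 * ln_l
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df,
    using the closed-form series for even and odd df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * i + 1)
    return tail


def hlrt(ln_l_simple, ln_l_complex, df):
    """Likelihood ratio statistic and its chi-square P value."""
    statistic = 2 * (ln_l_complex - ln_l_simple)
    return statistic, chi2_sf(statistic, df)


# Syzygium values as printed in Table 3 (n = 232 alignment positions).
aicc_7a = aicc(-1213.28, 28, 232)                 # RNA7A
stat_7c_7d, p_7c_7d = hlrt(-1224.99, -1213.30, 6)  # RNA7C vs. RNA7D
```

With these inputs the sketch gives an AICc of about 2490.56 for RNA7A and a 7C/7D statistic of 23.38 with P < 0.001, in agreement with the tabulated values.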

While it is invalid to compare likelihoods between RNA-specific and standard 4-state models, Telford et al. (2005) describe a permutation approach which can be used to determine whether a correlation between paired sites significantly improves the likelihood of the data. The test is achieved by randomly re-ordering the characters (columns) in the structural alignment, while maintaining the number and relative position of stem-pairs, thereby removing any correlation between the nucleotides that form a stem pair. The estimated likelihoods of the permuted data are then compared with that for the intact structural alignment, with likelihood scores, in both cases, estimated for the same RNA-specific model. In order to perform this test, likelihood scores were estimated for 100 permuted matrices and compared with the likelihood for the original alignment, using the best-fit RNA16 model. Likelihood estimates were obtained using the Phase software for a stems-only alignment of the Syzygium data set. Note that of the various classes of RNA models, only RNA16, in providing separate rates for each mismatch category, is appropriate for this permutation test.
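A minimal Python sketch of the column permutation is given below. It assumes a stems-only alignment stored as a dictionary of equal-length sequences and a fixed list of paired column indices; the names `permute_columns` and `canonical_fraction` and the toy data are hypothetical. The structure mask (`pairs`) is left untouched, so after shuffling each stem pair joins two randomly chosen columns.

```python
import random

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus wobble


def permute_columns(alignment, rng):
    """Shuffle alignment columns; the same order is applied to every taxon."""
    n_sites = len(next(iter(alignment.values())))
    order = list(range(n_sites))
    rng.shuffle(order)
    return {name: "".join(seq[i] for i in order)
            for name, seq in alignment.items()}


def canonical_fraction(alignment, pairs):
    """Fraction of (taxon, stem pair) combinations that form a canonical pair."""
    hits = total = 0
    for seq in alignment.values():
        for i, j in pairs:
            total += 1
            hits += (seq[i] + seq[j]) in CANONICAL
    return hits / total


# Toy stems-only alignment (hypothetical data) with three nested stem pairs.
toy = {"taxonA": "GCAUGC", "taxonB": "GUAUAC"}
pairs = [(0, 5), (1, 4), (2, 3)]
permuted = permute_columns(toy, random.Random(7))
intact_fraction = canonical_fraction(toy, pairs)
permuted_fraction = canonical_fraction(permuted, pairs)
```

In the intact toy alignment every pair is canonical; after permutation the fraction will generally be lower, which is the loss of covariation that the likelihood comparison is designed to detect.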

2.3.3. ITS phylogeny

Bayesian phylogenies for the structurally partitioned Syzygium and ‘Myrtaceae’ ITS data sets were constructed under the best-fit RNA7 (stems) and 4-state (loops) substitution model using the MCMCPhase module in the Phase software package. Parameter values were estimated directly from the data. The analysis was run over 1,200,000 generations, sampling every 150 generations, with the first 200,000 generations discarded as burn-in, which was sufficient to allow log-likelihoods to plateau. We used three independent runs in order to check for convergence in topology and parameter estimates. A 50% majority-rule consensus topology was constructed from the 20,001 (6667 samples from each of three runs) sampled topologies using Paup*4.08b (Swofford, 1998). By way of comparison with the RNA-specific models, each data set was analysed under the best-fit 4-state model using the Bayesian inference approach just described.
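The sample counts quoted above can be checked with a short calculation. The sketch assumes that samples are written at every multiple of the sampling interval and that those at or after the end of burn-in are retained; the function name is hypothetical.

```python
def retained_samples(total_generations, sample_every, burn_in):
    """Number of samples at multiples of sample_every that fall in the
    interval [burn_in, total_generations]."""
    # First sampled generation at or after the end of burn-in.
    first = -(-burn_in // sample_every) * sample_every
    if first > total_generations:
        return 0
    return (total_generations - first) // sample_every + 1


per_run = retained_samples(1_200_000, 150, 200_000)
pooled = 3 * per_run  # three independent runs
```

Under these assumptions `per_run` is 6667 and `pooled` is 20,001, matching the number of topologies used for the consensus.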

3. Results and discussion

3.1. Secondary structure of the ITS transcripts

Representative secondary structure models for ITS1 and ITS2 are shown in Fig. 1a and b, respectively. These figures also highlight putative core structural motifs including the ITS1 motif GGCRY-(4–7n)-GYGYCAAGGAA (stem V, Fig. 1a), which was noted by Liu and Schardl (1994) and is thought to be conserved across plants. The general structure of ITS1, a circular molecule with several stems, is consistent with the findings of other studies of angiosperms (e.g., Gottschling et al., 2001; Mayol and Rosselló, 2001; Albach and Chase, 2003; Goertzen et al., 2003). Key structural motifs in the ITS2 (Fig. 1b) include the four-stemmed structure itself, with stem III the longest; the U–U mismatch loop-hole in stem II; and the UGGU motif slightly upstream of the stem III hairpin loop, all of which are considered ‘hallmarks’ of ITS2 structure for eukaryotes (Schultz et al., 2005). The conservation of these structures in all sequences included in the present study supports the view that these are functional copies of the ITS transcripts.

There are other indications that the secondary structure predictions are biologically reasonable. First, the majority of individual structures were found with high frequency amongst sub-optimal predictions (data not shown), and putatively homologous structures were found consistently across the broad taxonomic comparison. As an indication, the 50% majority-rule Myrtaceae consensus structure (Fig. 1) includes, for ITS1, 83% of the total pairings predicted for Acmena acuminatissima (Fig. 1a); and 84% of the total pairings predicted for A. acuminatissima ITS2 (Fig. 1b). Poorly determined structures (i.e., those found in <50% of MFE predictions for taxa included in the Myrtaceae alignment) include the ITS1 stem II, and the ITS2 stem IV. While putatively homologous structures were predicted in all instances, the exact positioning of these two stems is not apparently conserved. Second, several of the stems are supported by at least one CBC. In ITS1 stem VII and ITS2 stem II, for example, virtually all of the inferred nucleotide changes are double transitions between Watson–Crick pairs, while in ITS1 stem IV, the central portion of the stem includes fully compensated indels in several sequences (e.g., Angophora, Eucalyptus, and Lophostemon). In these regions, at least, the occurrence of multiple putatively compensated changes supports stem-pairing relationships over alternative explanations for apparent covariation such as ‘phylogenetic coincidence’ (e.g., Hershkovitz and Zimmer, 1996). Fig. 2 plots the number of CBCs for each pair-wise comparison (Myrtaceae alignment) against maximum likelihood distance (GTR+I+Γ). There is evidence for saturation of CBCs at moderate to high divergence levels: at moderate divergence (ca. 35% pair-wise difference), the number of observed CBCs reaches a maximum of 9, while amongst the most highly divergent sequences (ca. 50% pair-wise difference), there are 2–4 observed CBCs (Fig. 2). Inferred CBCs occur at relatively few sites across the entire molecule, and these, presumably, are subject to unobserved substitutions amongst evolutionarily distant comparisons.
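Counting CBCs for a pair of aligned sequences can be sketched as follows. The sketch assumes a fixed list of paired column indices and treats Watson–Crick and G·U/U·G pairs as valid pairings; a full CBC is scored when both nucleotides of a pair differ and pairing is retained in both sequences, and a hemi-compensated change when only one differs. The function name and toy sequences are hypothetical, and this is not necessarily the exact procedure used to produce Fig. 2.

```python
WATSON_CRICK = {"AU", "UA", "GC", "CG"}
WOBBLE = {"GU", "UG"}


def count_cbcs(seq_a, seq_b, pairs, allow_wobble=True):
    """Return (full CBCs, hemi-CBCs) between two aligned sequences."""
    valid = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    full = hemi = 0
    for i, j in pairs:
        pair_a = seq_a[i] + seq_a[j]
        pair_b = seq_b[i] + seq_b[j]
        if pair_a not in valid or pair_b not in valid:
            continue  # pairing is lost in at least one sequence
        changed = (seq_a[i] != seq_b[i]) + (seq_a[j] != seq_b[j])
        if changed == 2:
            full += 1
        elif changed == 1:
            hemi += 1
    return full, hemi


# Toy stem sequences (hypothetical data) sharing three nested pairs.
pairs = [(0, 5), (1, 4), (2, 3)]
seq_1 = "GCAUGC"  # pair (1, 4) is C-G
seq_2 = "GUAUAC"  # pair (1, 4) is U-A: both sides changed
seq_3 = "GUAUGC"  # pair (1, 4) is U-G: one side changed
full_12, hemi_12 = count_cbcs(seq_1, seq_2, pairs)
full_13, hemi_13 = count_cbcs(seq_1, seq_3, pairs)
```

Here the comparison of `seq_1` with `seq_2` yields one full CBC, and that of `seq_1` with `seq_3` yields one hemi-compensated change.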

3.2. Structural partitioning

The view that the mutational dynamics of ITS may be influenced by RNA secondary structure (Alvarez and Wendel, 2003) is strongly supported by the present study. Specifically, there are marked deviations from the expectations of the theory of neutral evolution (i.e., all base-pairs should occur in equal frequencies and transversions should occur at approximately twice the rate of transitions) for the entire molecule, while ‘stem’ and ‘loop’ partitions show patterns of bias that contrast strongly (Fig. 3). The significance of these contrasts is supported by the finding that an evolutionary model that allows for separate substitution model parameters for stems and loops results in a significant improvement in likelihood (Table 2). The log likelihood for the randomly repartitioned data was not significantly higher than for the unpartitioned alignment and therefore, the improved likelihood associated with the stem and loop partitioning scheme is unlikely to occur by chance. In the context of ‘stems’ and ‘loops’, there are indications of evolutionary constraints on ITS rRNA secondary structure, including the high G–C content and high transition bias in the stems (particularly C ↔ U changes), while loop regions show a strong bias towards adenine and a more balanced ratio of transitions/transversions (Fig. 3). Similar patterns have been observed across a range of RNAs and are theoretically consistent with the operation of selection on structure to maintain function (e.g., Gutell et al., 2000; Higgs, 2000; Savill et al., 2001; Telford et al., 2005).
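The kind of contrast described above can be explored informally with raw counts. The Python sketch below tabulates base composition and classifies pairwise differences as transitions or transversions for a chosen set of columns. It is a descriptive illustration only, with hypothetical function names and toy data, and does not reproduce the ML estimates plotted in Fig. 3.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def base_frequencies(alignment, columns):
    """Observed base frequencies over the given columns (gaps ignored)."""
    counts = Counter()
    for seq in alignment.values():
        counts.update(seq[i] for i in columns if seq[i] in "ACGU")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGU"}


def ti_tv_counts(seq_a, seq_b, columns):
    """Count observed transitions and transversions between two sequences."""
    ti = tv = 0
    for i in columns:
        a, b = seq_a[i], seq_b[i]
        if a == b or a not in "ACGU" or b not in "ACGU":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1  # purine-purine or pyrimidine-pyrimidine change
        else:
            tv += 1
    return ti, tv


# Toy data (hypothetical): columns 0-3 are 'stem', columns 4-7 are 'loop'.
toy = {"t1": "GCGCAAUA", "t2": "GCGUAAGA"}
stem_cols, loop_cols = [0, 1, 2, 3], [4, 5, 6, 7]
stem_freqs = base_frequencies(toy, stem_cols)
loop_freqs = base_frequencies(toy, loop_cols)
stem_ti_tv = ti_tv_counts(toy["t1"], toy["t2"], stem_cols)
loop_ti_tv = ti_tv_counts(toy["t1"], toy["t2"], loop_cols)
```

In the toy example the stem columns are G+C rich with a single C ↔ U transition, whereas the loop columns are A rich with a single transversion.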

3.3. Paired-sites models

In addition to structural partitioning, the results of this study support the use of paired-sites models over simpler 4-state models for the phylogenetic analysis of ITS. For the unpermuted Syzygium stems-only alignment, the log likelihood (−ln L = 1309.77) of the best-fit RNA16 model (Table 3) is, on average, 112.75 log likelihood units higher than that for data sets where the stem-pairs are randomly re-ordered (−ln L = 1422.08 ± 39.3). In other words, there is evidence for significant covariation (i.e., non-independence) between nucleotides that form stem-pairs, as has been previously demonstrated for an alignment of sequences from the SSU-rRNA region (Telford et al., 2005).

In some respects, the ITS alignments show mutational patterns that are consistent with those reported for rRNA genes: for example, G·C and C·G stem-pairs have the highest frequency, and the lowest mutability, while intermediate and mismatch pairs have low persistence (i.e., high mutability) in ITS stems (Table 5). However, the best-fitting RNA paired-sites models for both the Syzygium and Myrtaceae ITS alignments were those which treat all changes as single substitutions. The RNA7A model returned a slightly higher likelihood than the simpler RNA7C, although the improvement in likelihood was not significant using the hLRT and the latter model was favoured by the AICc (Table 3). Similarly, there was only a slight improvement in log-likelihood for the RNA16A, which permits double substitutions, over RNA16I and RNA16K, which do not (Table 3). Previously, models which allow double substitutions (e.g., RNA7A, 7D, and RNA16A) have been identified as providing the highest likelihood for the data for rRNA regions with conserved secondary structure (Savill et al., 2001; Telford et al., 2005).

Fig. 1. (a) RNA secondary structure of the ITS1 transcript for Acmena acuminatissima. The dashed line highlights the conserved structural motif of Liu and Schardl (1994). The boxed section of stem IV indicates a region of length mutation, which, in several sequences, includes fully compensated indels. (b) RNA secondary structure of the ITS2 transcript for Acmena acuminatissima. Conserved structural motifs discussed by Schultz et al. (2005) are highlighted. Stem nucleotides in bold typeface are included in the majority-rule consensus structure for Myrtaceae.

Theoretically, double substitutions can arise within a population where the intermediate state (G·U/U·G, or mismatch pair) is deleterious, and single substitutions are maintained at low frequency by selection. A second substitution may occur by chance within the low frequency allele, and being selectively neutral relative to the dominant Watson–Crick pair, may rise to high frequency via random drift. From the perspective of single-sequence, species-level comparisons, this is a simultaneous double substitution, because the intermediate is generally not observed in the population consensus sequence. Alternatively, a two-step process, via fixation of a less stable intermediate (e.g., G·C ↔ G·U ↔ A·U), could account for compensatory base change. For the two-step model, the intermediate is assumed to be slightly or not deleterious, and can therefore occur at high frequency, as it is more-or-less selectively neutral relative to a Watson–Crick pair (Higgs, 2000; Savill et al., 2001). As indicated by the model selection process (Table 3), a two-step-like model, which is approximated by RNA7C, 16I and 16K, appears to adequately describe the inferred mutational dynamics of Syzygium and Myrtaceae ITS stems.

Fig. 2. The number of fully compensated base pair changes, for all pairwise sequence comparisons in the Myrtaceae alignment, plotted against maximum-likelihood distance (GTR+I+Γ).

Fig. 3. (a) Base frequencies, estimated using GTR+I+Γ, for the complete alignment of Syzygium ITS compared with stem and loop partitions; (b) exchangeability, for each class of nucleotide change (GTR+I+Γ, with G ↔ U changes set to 1, as the reference rate) for the complete alignment of Syzygium ITS compared with stem and loop partitions; (c) Ti:Tv ratio (HKY85+I+Γ) for the complete alignment of Syzygium ITS compared with stem and loop partitions.

Table 5 summarises key elements of the RNA7A rate matrix for the Myrtaceae, Syzygium and a small-subunit (SSU) rRNA alignment (see Higgs, 2000). These are the average rate of double transitions (rd), the average rate of double transversions (rv), the average ‘forward’ rate for changes from Watson–Crick pairs to G·U intermediates (rf) and the ‘reverse’ rate for G·U to Watson–Crick pairs (rr). The SSU shows a high rate of simultaneous double substitutions (rd and rv), which are similar to (rv), or exceed (rd) the rate of single site transitions (rf and rr) by a factor of 3. Furthermore, the ratio of rr to rf is approximately 13 for the SSU alignment, which shows that non-canonical pairings are infrequently formed (low rf) and have a low persistence (high rr) in the stems. In other words, there is evidence for strong selection against G·U/U·G stem-pairs, and not surprisingly, the mutational dynamics of the SSU is reasonably approximated by models that permit simultaneous double substitutions (Higgs, 2000; Savill et al., 2001; Telford et al., 2005). In contrast, the ITS alignments for Syzygium and Myrtaceae have a rate of single transitions which is far in excess of the double substitution rates (Table 5). Relative to the SSU, the ratio rr/rf is also more balanced (1.89 and 3.31 for Syzygium and Myrtaceae, respectively), the implication being that the ITS experience lower selective constraint for energetic stability of stem-pairs, and therefore, the intermediate state is ‘tolerated’ at a high proportion of paired sites.
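The ratio of reverse to forward rates (rr divided by rf) can be recomputed from the rounded values printed in Table 5, as in the short sketch below. Because the inputs are rounded to two decimal places, the results differ slightly from the ratios quoted in the text, which were presumably derived from unrounded estimates. The variable names are hypothetical.

```python
# Reverse (rr) and forward (rf) rates as printed in Table 5.
rates = {
    "Syzygium": {"rr": 0.90, "rf": 0.47},
    "Myrtaceae": {"rr": 1.36, "rf": 0.41},
    "rRNA1 (SSU)": {"rr": 1.83, "rf": 0.14},
}

# A high ratio implies that G.U/U.G intermediates are rarely formed and
# quickly lost; a ratio nearer 1 implies that they are tolerated.
ratios = {name: r["rr"] / r["rf"] for name, r in rates.items()}

for name, ratio in ratios.items():
    print(f"{name}: rr/rf = {ratio:.2f}")
```

The SSU ratio is several times larger than either ITS ratio, which is the contrast drawn in the text.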

Table 2
Comparison of likelihood scores for the un-partitioned Syzygium alignment, and various partitioning schemes

The likelihood score (−ln L) for the permuted (stems + loops) data is the mean (± SE) based upon 100 randomly repartitioned matrices. The likelihood ratio statistic (δ = 2 × diff. in likelihood scores) is based upon comparison of the relevant 2 × GTR+I+Γ model with the simpler GTR+I+Γ, and the significance [P (LRT)] was calculated using the χ² test with 11 degrees of freedom (n, number of model parameters).

Partition                        Model          n    −ln L             δ        P (LRT)
None                             GTR+I+Γ        11   −3445.08
Stems + loops                    2 × GTR+I+Γ    22   −3393.18          103.81   <0.001
Permuted data (stems + loops)                        −3437.04 ± 2.41   16.08    ns


We considered the possibility that at least some G·U/U·G pairs in our data are positively selected for (Gautheret et al., 1995), which could influence our model selection and specifically, the ratio rr/rf (Higgs, 2000). For the Myrtaceae alignment, we optimised the data, as described above, under RNA7A and RNA7C after removing 13 stem-pairs that were predominantly (>50% of included sequences) G·U/U·G. This analysis favours the RNA7C (−ln L = −1476.06) as there was no significant improvement in likelihood for the RNA7A model (−ln L = −1473.69) using the hLRT. With the G·U/U·G positions removed the ratio rr/rf is 5.26 (rf = 0.443, rr = 2.32), which remains appreciably lower than the equivalent ratio for the SSU data of Higgs (2000).

From a ‘fixed’ intermediate (G·U/U·G) a transition in either paired site will restore a Watson–Crick pair, and if the four types of Watson–Crick pair are approximately equivalent in terms of selective advantage we would expect that G·U/U·G → G·C/C·G changes would approximate the rate of G·U/U·G → A·U/U·A. In our ITS comparisons, there is considerable bias toward the former (4.1535, calculated from the Myrtaceae RNA7A rate matrix, Table 4) relative to the latter (1.1502), which suggests a stronger selective pressure for the maintenance of G·C/C·G. Similarly, the equivalent ‘forward’ rates (i.e., Watson–Crick pair to intermediate) are 0.564 and 1.064, respectively, i.e., non-canonical pairs are approximately twice as likely to result from a transition in an A·U/U·A pairing than a transition in a G·C/C·G. However, for a given observation, a G·U/U·G pair is more likely the result of a mutation in a G·C/C·G, simply because the frequency of G·C/C·G stem-pairs is several times higher than that of A·U/U·A (Table 5). One consequence is that double substitutions (i.e., CBCs) may occur at low frequency in ITS, and conversely, we would expect a high rate of hemi-compensated changes, particularly G·C/C·G ↔ G·U/U·G. This contrasts with the expectations of ‘typical’ RNA stems, where there is a generally high rate of double substitutions, but nevertheless exemplifies our thesis that constraints on secondary structure can generate strongly non-random mutational patterns in ITS which should be considered for phylogenetic analysis.

Table 4
Substitution rate matrix for the RNA7A+I+Γ model, for the Myrtaceae and Syzygium stem data

Myrtaceae RNA7A
      AU       GU       GC       UA       UG       CG       MM
AU    -        0.5097   0.2141   0.0249   0.0000   0.0000   0.9840
GU    0.5962   -        2.5746   0.0000   0.0000   0.0000   0.8666
GC    0.0377   0.3878   -        0.0000   0.0000   0.0000   0.1275
UA    0.0235   0.0000   0.0000   -        0.5540   0.1303   0.4204
UG    0.0000   0.0000   0.0000   0.6818   -        1.5789   0.2104
CG    0.0000   0.0000   0.0000   0.0179   0.1763   -        0.1886
MM    0.5013   0.3774   0.3685   0.2274   0.0925   0.7423   -

Syzygium RNA7A
      AU       GU       GC       UA       UG       CG       MM
AU    -        0.5104   0.0001   0.0000   0.0000   0.0000   0.6419
GU    0.4057   -        1.4890   0.0000   0.0000   0.0000   0.1672
GC    0.0000   0.4751   -        0.0000   0.0000   0.0000   0.1770
UA    0.0000   0.0000   0.0000   -        0.6402   0.0094   0.5975
UG    0.0000   0.0000   0.0000   0.5283   -        1.1917   0.1685
CG    0.0000   0.0000   0.0000   0.0016   0.2473   -        0.1403
MM    0.5783   0.1895   0.6286   0.4812   0.1645   0.6596   -

Table 5
Base-pair frequencies, mutabilities and substitution rate parameters for Myrtaceae, Syzygium and the rRNA1 (SSU) data-set of Higgs (2000), inferred under the RNA7A+I+Γ model

                        Syzygium   Myrtaceae   rRNA1
Base-pair frequencies
fAU                     0.07       0.05        0.12
fGU                     0.09       0.04        0.02
fGC                     0.27       0.29        0.35
fUA                     0.06       0.06        0.17
fUG                     0.07       0.05        0.01
fCG                     0.36       0.40        0.26
fMM                     0.07       0.10        0.01
Mutability
AU                      1.15       1.73        1.40
GU                      2.06       4.04        3.92
GC                      0.65       0.53        0.55
UA                      1.25       1.13        0.93
UG                      1.89       2.47        4.36
CG                      0.39       0.38        0.66
MM                      2.70       2.31        7.84
Substitution rates
rd                      0.003      0.09        0.34
rv                      0.00       0.006       0.11
rr                      0.90       1.36        1.83
rf                      0.47       0.41        0.14

Table 3
Likelihood scores (−ln L), AIC, AICc and likelihood ratio test (LRT) for RNA substitution models estimated on the Myrtaceae and Syzygium test trees

K, number of estimated model parameters; n, sample size (number of positions in the alignment); δ = 2 × diff. in likelihood scores. Note that the RNA16A cannot be compared to RNA16I or 16K on the basis of likelihood scores, and similarly, the RNA7 models cannot be compared with RNA16.

RNA model (+I+Γ)   K    −ln L      AIC       AICc      LRT comparison   df   δ       P (LRT)
Syzygium (n = 232)
RNA7A              28   −1213.28   2482.56   2490.56   7A/7C            11   0.051   ns
RNA7C              17   −1213.30   2460.61   2463.46   7C/7D            6    23.38   <0.001
RNA7D              11   −1224.99   2471.98   2473.17   7A/7D            17   23.43   ns
RNA16A             21   −1303.45   2648.91   2653.31
RNA16I             23   −1308.22   2662.45   2667.76   16I/16K          5    1.885   ns
RNA16K             18   −1309.17   2654.33   2657.55
Myrtaceae (n = 230)
RNA7A              28   −1872.98   3801.96   3810.04   7A/7C            11   19.34   ns
RNA7C              17   −1882.65   3799.30   3802.19   7C/7D            6    30.81   <0.001
RNA7D              11   −1898.06   3818.11   3819.31   7A/7D            17   50.15   <0.001
RNA16A             21   −2129.01   4300.03   4304.47
RNA16I             23   −2134.11   4314.22   4319.58   16I/16K               3.974   ns
RNA16K             18   −2136.10   4308.20   4311.44

Fig. 4. Bayesian maximum-likelihood topology for Syzygium estimated using RNA7C+I+Γ (stems) and GTR+I+Γ (loops). Branch lengths are proportional to the number of changes. The indicated groups within Syzygium s.l. are those found by Biffin et al. (2006) in their analysis of chloroplast data. Bold branches received a Bayesian posterior probability of ≥95%.


3.4. Phylogenetic hypotheses

The phylogeny inferred for Syzygium from a structurally partitioned model using RNA7C+I+Γ for the stem partition is shown in Fig. 4. A phylogeny was constructed separately for the un-partitioned matrix using GTR+I+Γ (tree not shown). Without exception, differences in topology and support are found amongst relationships that are weakly supported (i.e., PP ≤ 95%) regardless of analysis model, while well-supported branches are largely consistent with those inferred from cpDNA (Biffin et al., 2006). Therefore, we infer that the Syzygium ITS alignment contains robust support for some nodes but insufficient signal to resolve relationships amongst short internal branches, regardless of the evolutionary model used. A further indication is that parsimony analyses, which use a relatively simple evolutionary model, find relationships within Syzygium s.l. that are generally consistent with the BI topology in Fig. 4 (E. Biffin, unpublished results).

For the Myrtaceae-wide alignment, there are several differences in the topologies estimated under the RNA-specific (Fig. 5) and standard 4-state (Fig. 6) models. Specifically, the 50% majority-rule topology for the GTR analysis is more resolved (77% of nodes retained) than that for the structurally partitioned paired-sites model (66% of nodes retained) and, on average, the 4-state model shows higher support for retained nodes (82%, vs. 71% for the RNA-specific model). A possible explanation for these contrasts is an increased variance in parameter estimates associated with the more complex model (Buckley and Cunningham, 2002), although alternatively, an inadequately parameterised model may be prone to a high rate of Type I error, particularly within a Bayesian framework (Buckley, 2002).

Fig. 5. Phylogeny of the Myrtaceae estimated from 45 ITS sequences using the RNA7C+I+Γ (stems) and GTR+I+Γ (loops) model. Branch lengths are proportional to the number of changes. The classification of Myrtaceae follows Wilson et al. (2005), and with the exception of Psiloxylon, all included taxa are referable to their sub-family Myrtoideae. Branches receiving Bayesian posterior probability ≥95% are indicated in bold, and numbers to the left of the clade indicate the frequency of bipartitions retained in the 50% majority-rule consensus topology. Circled nodes indicate those that collapse in the majority-rule consensus tree.

The phylogeny constructed using RNA7C+GTR (Fig. 5) finds groupings that are generally consistent with the tribal classification of Wilson et al. (2005), while some of the groups resolved in the GTR topology are at odds with conventional views of myrtaceous relationships. In particular, the 4-state model resolves the Eucalypteae within a clade including Chamelaucieae and Leptospermeae (PP 97%) and as paraphyletic, with Arillastrum, Eucalyptopsis and Allosyncarpia resolved as sister to a well supported clade (PP 96%) including Eucalyptus, Corymbia, Angophora, and Chamelaucieae + Leptospermeae (Fig. 6). Previous Myrtaceae-wide analyses of cpDNA sequences have strongly supported the monophyly of Eucalypteae (Sytsma et al., 2004; Wilson et al., 2005) and this relationship is not controversial. A further difference from other studies is the separation of Xanthostemoneae and Lophostemoneae, which, in the analysis of Sytsma et al. (2004), are strongly supported as sister taxa (ML bootstrap = 92%). Our GTR topology (Fig. 6) shows Xanthostemon and Lophostemon to be separated by supported nodes, though in the RNA-specific, structurally partitioned analysis, the monophyly of Xanthostemon + Lophostemon is neither supported nor rejected because the relevant nodes are collapsed in the 50% majority-rule consensus tree (Fig. 5). In light of the above, the relationships inferred under the structurally partitioned, paired-sites model seem more credible than those that were found when structural information was not included in the analysis. Presumably, the structurally partitioned, paired-sites model more accurately describes the mutational dynamics of the included sequences, and is therefore less prone to the impact of systematic error relative to the un-partitioned GTR.

Fig. 6. Phylogeny of the Myrtaceae estimated from 45 ITS sequences using the GTR+I+Γ model, without partitioning of the data. Labels are as in Fig. 5.

There is ongoing interest in the potential of ITS to recover relationships at higher taxonomic levels (e.g., Gottschling et al., 2001; Huang and Shi, 2002; Goertzen et al., 2003) where the influence of systematic error may be more severe than for phylogenetic inference amongst closely related taxa. A specific concern is signal saturation, which suggests a strategy of breaking up long branches in order to disperse homoplasy (Simmons and Freudenstein, 2003). The expedient of including structurally partitioned, paired-sites models may also extend the phylogenetic utility of ITS. A general concern in the analysis of RNA sequences is the issue of homoplasy as a consequence of character non-independence. Homoplasy is likely because structural constraints limit the number of possible substitution types at a particular site, and therefore, unobserved substitutions are more likely to accrue at that site. The structurally partitioned, RNA-specific models include more character states when compared to standard nucleotide substitution models, and it follows that convergent changes, which can generate long-branch artifacts, are more likely to be detected using our approach. For example, we have modeled rates for several classes of C ↔ U changes (Tables 4 and 5) which provide a more precise interpretation of our data than the conclusion that transitions, and particularly C ↔ U changes, are in excess amongst the ITS sequences, which follows from the un-partitioned GTR analysis (Fig. 3).

4. Conclusions

Phylogenetic methods can be substantially misled by the application of evolutionary models that fail to account for structure-linked mutational bias (e.g., Kelchner, 2002; Alvarez and Wendel, 2003; Telford et al., 2005). We have demonstrated that the use of structural partitioning and stem pair models provides substantial improvements in terms of model-fit over standard 4-state models such as GTR, and that in at least some instances, the complex models provide more credible inferences of relationship from the phylogenetic analysis of ITS. The general conclusion is that structural partitioning and paired-sites models should be considered for the phylogenetic analysis of ITS sequence data, although a limitation is thought to be the difficulty in obtaining realistic secondary structure predictions for ITS (e.g., Alvarez and Wendel, 2003). The results of the present study show that ITS structure can be largely conserved at least to family level, and provided the data can be reasonably aligned, it should be possible to derive meaningful secondary structure predictions through a combination of MFE folding and covariation analysis (e.g., Gottschling et al., 2001; Goertzen et al., 2003). Although we have focused on angiosperm taxa, the findings of the present study may be general for Eukaryotes, given the universality of ITS and the functional equivalence implied by the broad conservation of several structural elements (Mai and Coleman, 1997; Joseph et al., 1999; Schultz et al., 2005).

Acknowledgments

The authors wish to thank Eve Lucas (Royal Botanic Gardens, Kew, UK), Caroline Chong (James Cook University, Townsville, Australia), Peter de Lange (Terrestrial Conservation Unit, Department of Conservation, NZ) and Peter Wilson (Royal Botanic Gardens, Sydney, Australia) for allowing the use of their unpublished sequences. We wish to thank Lyn Cook and three anonymous reviewers for helpful comments on the manuscript. This work was, in part, funded by an Australian Biological Resources Study Postgraduate scholarship awarded to E.B. and an Australian Postgraduate Award scholarship held by M.G.H. This support is gratefully acknowledged.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ympev.2006.08.013.

References

Albach, D.C., Chase, M.W., 2003. Incongruence in Veroniceae (Plantaginaceae): evidence from two plastid and a nuclear ribosomal DNA region. Mol. Phylogenet. Evol. 32, 183–197.

Alvarez, I., Wendel, J.F., 2003. Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol. 29, 417–434.

Bailey, C.D., Carr, T.G., Harris, S.A., Hughes, C.E., 2003. Characterization of angiosperm nrDNA polymorphism, paralogy, and pseudogenes. Mol. Phylogenet. Evol. 29, 435–455.

Baldwin, B.G., 1992. Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: an example from the Compositae. Mol. Phylogenet. Evol. 1, 3–16.

Biffin, E., Craven, L.A., Crisp, M.D., Gadek, P.A., 2006. Molecular systematics of Syzygium and allied genera (Myrtaceae): evidence from the chloroplast genome. Taxon 55.

Buckler, E.S., Holtsford, T.P., 1996. Zea ribosomal repeat evolution and substitution patterns. Mol. Biol. Evol. 13, 623–632.

Buckley, T.R., 2002. Model misspecification and probabilistic tests of topology: evidence from empirical data sets. Syst. Biol. 51, 509–523.

Buckley, T.R., Cunningham, C.W., 2002. The effects of nucleotide substitution model assumptions on estimates of nonparametric bootstrap support. Mol. Biol. Evol. 19, 394–405.

Coleman, A.W., 2003. ITS2 is a double-edged tool for Eukaryote evolutionary comparisons. Trends Genet. 19, 370–375.

Côté, C.A., Peculis, B.A., 2001. Role of the ITS2-proximal stem and evidence for indirect recognition of processing sites in pre-rRNA processing in yeast. Nucleic Acid Res. 29, 2106–2116.

Dixon, M.T., Hillis, D.M., 1993. Ribosomal RNA secondary structure: compensatory mutations and implications for phylogenetic analysis. Mol. Biol. Evol. 1, 256–267.

Doyle, J.J., Doyle, J.L., 1990. The isolation of plant DNA from fresh tissue. Search 12, 13–15.

Felsenstein, J., 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.

Gardner, P.P., Giegerich, R., 2004. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinform. 5, 140.

Gautheret, D., Konings, D., Gutell, R.R., 1995. G·U base pairing motifs in ribosomal RNA. RNA 1, 807–814.

Goertzen, L.R., Cannone, J.J., Gutell, R.R., Jansen, R.K., 2003. ITS secondary structure derived from comparative analysis: implications for sequence alignment and phylogeny of the Asteraceae. Mol. Phylogenet. Evol. 29, 216–234.

Gottschling, M., Hilger, H.H., Wolf, M., Diane, N., 2001. Secondary structure of the ITS1 transcript and its application in a reconstruction of the phylogeny of Boraginales. Plant Biol. 3, 629–636.

Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimate phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.

Gutell, R.R., Cannone, J.J., Shang, Z., Du, Y., Serra, M.J., 2000. A story: unpaired adenosine bases in ribosomal RNAs. J. Mol. Biol. 304, 335–354.

Gutell, R.R., Lee, J.C., Cannone, J.J., 2002. The accuracy of RNA comparative structure models. Curr. Opin. Struct. Biol. 12, 301–310.

Harrington, M.A., Gadek, P.A., 2004. Molecular systematics of the Acmena Alliance (Myrtaceae): phylogenetic analyses and evolutionary implications with respect to Australian taxa. Aust. Syst. Bot. 17, 63–72.

Hershkovitz, M.A., Lewis, L.A., 1996. Deep-level diagnostic value of the rDNA-ITS region. Mol. Biol. Evol. 13, 1276–1295.

Hershkovitz, M.A., Zimmer, E.A., 1996. Conservation patterns in angiosperm ITS2 sequences. Nucleic Acid Res. 24, 2857–2867.

Hershkovitz, M.A., Zimmer, E.A., Hahn, W.J., 1998. Ribosomal DNA sequences and angiosperm systematics. In: Hollingsworth, P.M., Bateman, R.M., Gornall, R.J. (Eds.), Molecular Systematics and Plant Evolution, The Systematics Association Special Volume Series 57, Taylor and Francis. pp. 269–326.

Higgs, P.G., 2000. RNA secondary structure: physical and computational aspects. Q. Rev. Biophys. 33, 199–253.

Huang, Y.L., Shi, S.H., 2002. Phylogenetics of the Lythraceae sensu lato: a preliminary analysis based upon chloroplast rbcL gene, psaA–ycf3 spacer, and nuclear rDNA internal transcribed spacer (ITS) sequences. Int. J. Plant Sci. 163, 215–225.

Hudelot, C., Gowri-Shankar, V., Jow, H., Rattray, M., Higgs, P., 2003. RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. Mol. Phylogenet. Evol. 28, 241–252.

Joseph, N., Krauskopf, E., Vera, M.I., Michot, B., 1999. Ribosomal internal transcribed spacer 2 (ITS2) exhibits a common core of secondary structure in vertebrates and yeast. Nucleic Acid Res. 27, 4533–4540.

Jow, H., Hudelot, C., Rattray, M., Higgs, P.G., 2002. Bayesian phylogenetics using an RNA substitution model applied to early Mammalian evolution. Mol. Biol. Evol. 19, 1591–1601.

Juan, V., Wilson, C., 1999. RNA secondary structure prediction based on free energy and phylogenetic analysis. J. Mol. Biol. 289, 935–947.

Kelchner, S.A., 2002. Group II introns as phylogenetic tools: structure, function and evolutionary constraints. Am. J. Bot. 89, 1651–1669.

Kjer, K.M., 2004. Aligned 18S and insect phylogeny. Syst. Biol. 53, 506–514.

Knudsen, B., Hein, J., 2003. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acid Res. 31, 3423–3428.

Lalev, A.I., Nazar, R.N., 1999. Structural equivalence in the transcribed spacers of pre-rRNA transcripts in Schizosaccharomyces pombe. Nucleic Acid Res. 27, 3071–3078.

Lalev, A.I., Nazar, R.N., 2001. A chaperone for ribosome maturation. J. Biol. Chem. 276, 16655–16659.

Lalev, A.I., Abeyrathne, P.D., Nazar, R.N., 2000. Ribosomal RNA maturation in Schizosaccharomyces pombe is dependent on a large Ribonucleoprotein complex of the Internal Transcribed Spacer 1. Nucleic Acid Res. 27, 3071–3078.

Liston, A., Robinson, W.A., Oliphant, J.M., Alvarez-Buylla, E.R., 1996. Length variation in the nuclear ribosomal DNA internal transcribed spacer region of non-flowering seed plants. Syst. Bot. 21, 109–120.

Liu, J.S., Schardl, C.L., 1994. A conserved sequence in internal transcribed spacer 1 of plant nuclear rRNA genes. Plant Mol. Biol. 26, 775–778.

Mai, J.C., Coleman, A.W., 1997. The internal transcribed spacer 2 exhibits a common secondary structure in green algae and flowering plants. J. Mol. Evol. 44, 258–271.

Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., Turner, D.H., 2004. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl. Acad. Sci. USA 101, 7287–7292.

Mayol, M., Rosselló, J.A., 2001. Why nuclear ribosomal DNA spacers (ITS) tell different stories in Quercus. Mol. Phylogenet. Evol. 19, 167–176.

Noller, H.F., 1984. Structure of ribosomal RNA. Annu. Rev. Biochem. 53, 119–162.

Nickrent, D.L., Schuette, K.P., Starr, E.M., 1994. A molecular phylogeny of Arceuthobium (Viscaceae) based on nuclear ribosomal DNA internal transcribed spacer sequences. Am. J. Bot. 81, 1149–1160.

van Nues, R.W., Rientjes, J.M.J., van der Sande, C.A.F.M., Zerp, S.F., Sluiter, C., Venema, J., Planta, R.J., Raue, H.A., 1994. Separate structural elements within internal transcribed spacer 1 of Saccharomyces cerevisiae precursor ribosomal RNA direct the formation of 17S and 26S rRNA. Nucleic Acid Res. 22, 912–919.

Nylander, J.A.A., 2004. MrAIC.pl. Program distributed by the author. Evolutionary Biology Centre, Uppsala University.

Savill, N.J., Hoyle, D.C., Higgs, P.G., 2001. RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum likelihood methods. Genetics 157, 399–411.

Schöniger, M., von Haeseler, A., 1999. Toward assigning helical regions in alignments of ribosomal RNA and testing the appropriateness of evolutionary models. J. Mol. Evol. 49, 691–698.

Schultz, J., Maisel, S., Gerlach, D., Müller, T., Wolf, M., 2005. A common core of secondary structure of the internal transcribed spacer 2 (ITS2) throughout the Eukaryota. RNA 11, 361–364.

Simmons, M.P., Freudenstein, J.V., 2003. The effects of increasing genetic distance on alignment of, and tree construction from, rDNA internal transcribed spacer sequences. Mol. Phylogenet. Evol. 26, 444–451.

Sugiura, N., 1978. Further analysis of the data by Akaike’s information criterion and the finite corrections. Commun. Stat. Theory Methods A7, 13–26.

Swofford, D.L., 1998. PAUP*: Phylogenetic Analysis Using Parsimony (*and other methods). Version 4.08b. Sinauer, Massachusetts.

Sytsma, K.J., Litt, A., Zjhra, M.L., Pires, J.C., Nepokroeff, M., Conti, E., Walker, J., Wilson, P.G., 2004. Clades, clocks, and continents: historical and biogeographical analysis of Myrtaceae, Vochysiaceae, and relatives in the southern hemisphere. Int. J. Plant Sci. 165 (4 Suppl.), S85–S105.

Telford, M.J., Wise, M.J., Gowri-Shankar, V., 2005. Consideration of RNA secondary structure significantly improves likelihood-based estimates of phylogeny: examples from the bilateria. Mol. Biol. Evol. 22, 1129–1136.

Tillier, E.R.M., Collins, R.A., 1998. High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics 148, 1993–2002.

Wheeler, W.C., Honeycutt, R.L., 1988. Paired sequence difference in ribosomal RNAs: evolution and phylogenetic implications. Mol. Biol. Evol. 5, 90–96.

White, T.J., Bruns, T., Lee, S., Taylor, J., 1990. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In: Innis, M.D., Gelfand, D., Sninsky, J., White, T. (Eds.), PCR Protocols: A Guide to Methods and Applications. Academic Press, San Diego, pp. 315–322.

Wilgenbusch, J., De Querioz, K., 2000. Phylogenetic relationships among the Phrynosomatid Sand Lizards inferred from mitochondrial DNA sequences generated by heterogeneous evolutionary processes. Syst. Biol. 49, 592–612.

Wilson, P.G., O’Brien, M.M., Heslewood, M.M., Quinn, C.J., 2005. Relationships within Myrtaceae sensu lato based on a matK phylogeny. Plant Syst. Evol. 251, 3–19.

Zuker, M., Jacobson, A.B., 1995. ‘Well-determined’ regions in RNA secondary structure prediction: analysis of small sub-unit ribosomal RNA. Nucleic Acid Res. 23, 2791–2798.