13
Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae) Andrew J. Alverson,* ,1 XiaoXin Wei, ,1 Danny W. Rice, 1 David B. Stern, 2 Kerrie Barry, 3 and Jeffrey D. Palmer 1 1 Department of Biology, Indiana University 2 Boyce Thompson Institute for Plant Research, Tower Road, Ithaca, NY 3 DOE Joint Genome Institute, Walnut Creek, CA  Present address: State Key Laboratory of Systematics and Evolutionary Botany, Institute of Botany, The Chinese Academy of Sciences, Xiangshan, Beijing 100093, China. *Corresponding author: E-mail: [email protected]. Associate editor: Charles Delwiche Abstract The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (.113 kb) and short repeated sequences (.370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors. Key words: Cucurbitaceae, genome size, mitochondria, mutation rate, RNA editing. Introduction Seed plant mitochondrial genomes are exceptional for their generally very low mutation rate (Wolfe et al. 1987; Palmer and Herbon 1989), relatively high incidence of RNA editing and trans-splicing of coding sequences (Hiesel et al. 1989; Knoop 2004), frequent uptake of foreign DNA by intracel- lular and horizontal gene transfer (Stern and Lonsdale 1982; Richardson and Palmer 2007), dynamic structure (Lonsdale et al. 1988; Palmer and Herbon 1989), and, historically first and perhaps foremost, their extraordinarily large and highly variable sizes (Quetier and Vedel 1977; Ward et al. 1981; Hsu and Mullin 1989). Seed plants house by far the largest known mitochondrial genomes, with sizes ranging from 222 to 773 kb among the 16 genomes sequenced to date (http://www.ncbi.nlm.nih.gov/Genomes/). The full range of sizes well exceeds an order of magnitude, however, and most of this variation occurs within a single family, the Cucurbitaceae. In a landmark study, Ward et al. (1981) showed, based on analysis of reassociation kinetics, that cucurbit mitochondrial genomes vary in size from an estimated 390 kb in Citrullus lanatus (watermelon) to an astounding 2.9 Mb in Cucumis melo (muskmelon) (fig. 1). For perspective, the muskmelon mitochondrial ge- nome is bigger than the genomes of many free-living bac- teria (Moran 2002). The limited number of follow-up studies to Ward et al. (1981) has largely come up short in identifying the sources of the ‘‘extra’’ DNA in the largest cucurbit genomes. For example, size differences do not appear to reflect large- scale genome duplications (Havey et al. 1998) or major dif- ferences in gene content (Stern and Newton 1985; Adams et al. 2002). Although chloroplast-derived sequences are common in plant mitochondrial genomes, the larger cucur- bit genomes do not appear to contain disproportionate amounts of captured chloroplast DNA either (Stern et al. 1983; Havey et al. 1998). Small repetitive sequences can also comprise a substantial portion of plant mitochon- drial genomes (Andre ´ et al.1992), including in cucurbits (Ward et al. 1981; Lilly and Havey 2001). Indeed, a handful of small repetitive motifs account for as much as 13% of the large (;1.8 Mb) Cucumis sativus (cucumber) mitochon- drial genome (Lilly and Havey 2001), but reassociation ki- netics did not show a positive correlation between genome size and proportion of repetitive DNA across Cucurbita- ceae (Ward et al. 1981). Thus, the sources underlying major expansions in mitochondrial genome size in cucurbits, and © The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] 1436 Mol. Biol. Evol. 27(6):1436–1448. 2010 doi:10.1093/molbev/msq029 Advance Access publication January 29, 2010 Research article

Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

Insights into the Evolution of Mitochondrial Genome Sizefrom Complete Sequences of Citrullus lanatus and Cucurbitapepo (Cucurbitaceae)

Andrew J. Alverson,*,1 XiaoXin Wei,�,1 Danny W. Rice,1 David B. Stern,2 Kerrie Barry,3 andJeffrey D. Palmer1

1Department of Biology, Indiana University2Boyce Thompson Institute for Plant Research, Tower Road, Ithaca, NY3DOE Joint Genome Institute, Walnut Creek, CA

�Present address: State Key Laboratory of Systematics and Evolutionary Botany, Institute of Botany, The Chinese Academy ofSciences, Xiangshan, Beijing 100093, China.

*Corresponding author: E-mail: [email protected].

Associate editor: Charles Delwiche

Abstract

The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much ofthis variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb insize. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini:982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editingcontent. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns,longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrialgenome reflects the accumulation of unprecedented amounts of both chloroplast sequences (.113 kb) and shortrepeated sequences (.370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size andRNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantlyhigher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing.The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting eithersimple stochastic variation or governance by different factors.

Key words: Cucurbitaceae, genome size, mitochondria, mutation rate, RNA editing.

IntroductionSeed plant mitochondrial genomes are exceptional for theirgenerally very low mutation rate (Wolfe et al. 1987; Palmerand Herbon 1989), relatively high incidence of RNA editingand trans-splicing of coding sequences (Hiesel et al. 1989;Knoop 2004), frequent uptake of foreign DNA by intracel-lular and horizontal gene transfer (Stern and Lonsdale 1982;Richardson and Palmer 2007), dynamic structure (Lonsdaleet al. 1988; Palmer and Herbon 1989), and, historically firstand perhaps foremost, their extraordinarily large and highlyvariable sizes (Quetier and Vedel 1977; Ward et al. 1981;Hsu and Mullin 1989). Seed plants house by far the largestknown mitochondrial genomes, with sizes ranging from222 to 773 kb among the 16 genomes sequenced to date(http://www.ncbi.nlm.nih.gov/Genomes/). The full rangeof sizes well exceeds an order of magnitude, however,and most of this variation occurs within a single family,the Cucurbitaceae. In a landmark study, Ward et al.(1981) showed, based on analysis of reassociation kinetics,that cucurbit mitochondrial genomes vary in size from anestimated 390 kb in Citrullus lanatus (watermelon) to anastounding 2.9 Mb in Cucumis melo (muskmelon)(fig. 1). For perspective, the muskmelon mitochondrial ge-

nome is bigger than the genomes of many free-living bac-teria (Moran 2002).

The limited number of follow-up studies to Ward et al.(1981) has largely come up short in identifying the sourcesof the ‘‘extra’’ DNA in the largest cucurbit genomes. Forexample, size differences do not appear to reflect large-scale genome duplications (Havey et al. 1998) or major dif-ferences in gene content (Stern and Newton 1985; Adamset al. 2002). Although chloroplast-derived sequences arecommon in plant mitochondrial genomes, the larger cucur-bit genomes do not appear to contain disproportionateamounts of captured chloroplast DNA either (Sternet al. 1983; Havey et al. 1998). Small repetitive sequencescan also comprise a substantial portion of plant mitochon-drial genomes (Andre et al.1992), including in cucurbits(Ward et al. 1981; Lilly and Havey 2001). Indeed, a handfulof small repetitive motifs account for as much as 13% of thelarge (;1.8 Mb) Cucumis sativus (cucumber) mitochon-drial genome (Lilly and Havey 2001), but reassociation ki-netics did not show a positive correlation between genomesize and proportion of repetitive DNA across Cucurbita-ceae (Ward et al. 1981). Thus, the sources underlying majorexpansions in mitochondrial genome size in cucurbits, and

© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, pleasee-mail: [email protected]

1436 Mol. Biol. Evol. 27(6):1436–1448. 2010 doi:10.1093/molbev/msq029 Advance Access publication January 29, 2010

Research

article

Page 2: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

across all seed plants for that matter, are largely unknown.In fact, most plant mitochondrial DNA (as much as 90%)is noncoding (Kubo and Mikami 2007), and the originsof most of this sequence are unknown. Clearly moredata—preferably whole-genome sequences—are necessaryto understand the ebb and flow of these remarkablegenomes.

In this study, we present complete mitochondrialgenome sequences and RNA editing data for Ci. lanatus(watermelon) and Cucurbita pepo (zucchini). With size es-timates of 390 kb and 1.0 Mb, Citrullus and Cucurbita havethe smallest characterized cucurbit mitochondrial ge-nomes (Ward et al. 1981). Even so, the nearly 1-Mb (asdetermined in this study) mitochondrial genome of Cucur-bita is the largest organelle genome sequenced to date, ex-ceeded only by the slightly larger 1.02-Mb genome of the‘‘chromatophore’’ (a nascent photosynthetic organelle) ofthe freshwater amoeboid, Paulinella (Nowack et al. 2008).The early diverging phylogenetic positions of Cucurbita andCitrullus within the Cucurbitaceae are such that these twospecies offer an important glimpse into the ancestral fea-tures of cucurbit mitochondrial genomes, that is, before theevents that led to even greater size expansions in Cucumis(fig. 1). Analysis of these two genomes reveals a number ofsurprising features and important insights into the evolu-tion of genome size in plant mitochondria.

Materials and Methods

Mitochondrial DNA Isolation, GenomeSequencing, and AssemblyMitochondria were isolated from etiolated seedlings of Ci.lanatus (cultivar [cv.] Florida Giant) and Cu. pepo (cv. DarkGreen Zucchini) using the DNAse I procedure (Kolodnerand Tewari 1972), and mitochondrial DNA was purifiedfrom lysed mitochondria by CsCl centrifugation (Palmer1982). We chose these cultivars rather than those usedby Ward et al. (1981) because of the availability of al-

ready-purified mitochondrial DNA. Relative proportionsof mitochondrial DNA and contaminant chloroplast andnuclear DNA were assessed by Southern hybridization ofconserved mitochondrial (cob), chloroplast (rbcL), and nu-clear (SSU ribosomal DNA) probes using a chemilumines-cent detection protocol (Nakazato and Gastony 2006).

A single 3-kb library was made for each of Citrullus andCucurbita. Library construction, cloning, and Sangersequencing were carried out by the US DOE Joint GenomeInstitute in Walnut Creek, CA. Detailed protocols areavailable at http://www.jgi.doe.gov/sequencing/protocols/prots_production.html.

Sequence reads initially were assembled with CAP3(Huang and Madan 1999). Consed (Gordon et al. 1998)was then used to calculate library statistics on the initialassembly, and CAP3 was run again with forward–reverseconstraints. These steps were repeated with refined for-ward–reverse constraints until the CAP3 assembly no lon-ger improved. For both genomes, CAP3 assembled the vastmajority of reads into a single contig. Consed was then usedto visualize and validate the final assemblies and to designpolymerase chain reaction (PCR) primers for filling gapsand augmenting regions of low sequence coverage. Anno-tated genome sequences are available from GenBank (ac-cession numbers GQ856147 and GQ856148).

Gene AnnotationWe made amino acid databases for protein-coding genesand nucleotide databases for ribosomal RNA (rRNA)and transfer RNA (tRNA) genes, compiled from all previ-ously sequenced seed plant mitochondrial genomes. NCBI-BlastX and -BlastN searches of the genomes against thesedatabases were performed to find protein and structuralRNA genes, respectively. We also used tRNAscan-SE (Loweand Eddy 1997) to corroborate tRNA boundaries identifiedby BlastN. Blast and tRNAscan output were converted intoan HTML display format similar to that used for the anno-tation of chloroplast and animal mitochondrial genomes(Wyman et al. 2004), which allowed clear visualization ofgene and intron boundaries. Relevant annotation datawere entered into a web-based form and written to a Se-quin-formatted table file with a set of Perl and CGI scripts.

Analysis of Intergenic SequencesEach genome was searched against a database of all previ-ously sequenced seed plant mitochondrial genomes withNCBI-BlastN. All BlastN searches used the following set-tings unless stated otherwise: r 5 5, q 5 �4, G 5 8, E 56, and W 5 7. Visual inspection of BlastN hits of eachgenome to a database of all fully sequenced seed plantmitochondrial genomes identified regions encompassingand extending beyond coding regions that were conservedacross eudicots, angiosperms, or all seed plants. The strongsyntenic and sequence-level conservation suggests thatthese regions might contain trans-spliced introns, pro-moters, untranslated regions, or otherwise importantsequences. Boundaries of these putatively functional ‘‘con-served syntenic regions’’ were manually determined and

102030 0

Citrullus lanatus

Cucurbita pepo

Cucumis sativus

Cucumis melo

Age (million years)

FIG. 1. Evolution of mitochondrial genome size (shown proportionalto oval size) in Cucurbitaceae. Genome sizes for Citrullus (379,236nt) and Cucurbita (982,833 nt) are based on this study. Genomesizes for Cucumis sativus (;1.8 Mb) and Cucumis melo (;2.9 Mb)are estimates from Ward et al. (1981) that are in good agreementwith values from draft genome sequences (Alverson AJ, Palmer JD,unpublished data). Phylogenetic relationships and divergence timesare based on Schaefer et al. (2009).

Evolution of Plant Mitochondrial Genome Size · doi:10.1093/molbev/msq029 MBE

1437

Page 3: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

annotated for each gene (supplementary figs. 1 and 2, Sup-plementary Material online).

Chloroplast-like sequences were identified with NCBI-BlastN searches of mitochondrial genomes against a data-base of representative angiosperm chloroplast genomes.The remaining intergenic regions were extracted andsearched against the following databases maintained bythe National Center for Biotechnology Information (NCBI):the nonredundant nucleotide and protein databases, thewhole-genome shotgun database, and the est_others data-base. BlastN hits to plant mitochondrial genomes were pre-cluded with settings ‘‘all[filter] NOT (viridiplantae[ORGN]AND mitochondrion[filter].’’ Nuclear-derived insertionsare common in plant mitochondrial genomes, and trans-posable elements are often the hallmark of these sequences(Satoh et al. 2006). With this in mind, each genome was alsosearched against the Repbase repetitive element database(version 13.05; Jurka 2000).

Analysis of Repeated SequencesShort conserved repeats with mismatches and indels werefound by searching each genome against itself using Wash-ington University (WU)-Blast with the following settings:M 5 1, N 5 3, Q 5 3, and R 5 3, kap, span, B 5 1 �109, and W 5 7. With these settings, WU-Blast detectedperfect repeats of minimum length 19 nt and imperfectrepeats of minimum length 23 nt. The minimum percentidentify for imperfect repeats was 78.6%, for repeats of154 nt and larger. All Blast hits with an expect value �1were considered repeats.

RNA EditingTotal RNA was isolated from fresh young leaves. RNA prep-aration, cDNA synthesis, and experimental safeguards toidentify potential contaminating genomic DNA followedMower and Palmer (2006). Genome sequences were usedto design PCR primers that immediately flanked genes(supplementary table 1, Supplementary Material online),based on the assumption that these regions comprisedparts of the 5# and 3# untranslated regions. In some cases,these primers failed to amplify any product, so primerswithin the coding sequence were used, resulting in cDNAsequences that were incomplete at one or both ends or, insome cases, the middle of the gene (supplementary table 3,Supplementary Material online). Thus, our data representminimal estimates for both the number of total edits pergene and the number of shared edits between the two spe-cies. Gene sequences were screened with PREP-Mt to en-sure that internal primers were not located in regions withpredicted editing sites (Mower and Palmer 2006; Mower2009). PCR conditions were as follows: 36 cycles of (45 sat 94 �C, 45 s at 48–55 �C, and 2–2.5 min at 72 �C), withan initial step of 3 min at 94 �C and a final step of 10 min at72 �C. Amplicons were purified with ExoSAP-IT (USBCorporation, Cleveland, OH) and directly sequenced withan ABI 3730 (Applied Biosystems, Foster City, CA). RNAediting sites were determined by comparing cDNA andgenomic sequences. Sites with both T and C chromatogram

peaks clearly above background in a majority of sequencedstrands were considered partially edited. If any ambiguityexisted about whether the site was fully or partially edited,it was scored as fully edited.

Analysis of Nucleotide Substitution RatesWe extracted nucleotide sequences for all protein genes(excluding introns) from all 18 completely sequenced seedplant mitochondrial genomes. Sequences were aligned withMUSCLE (version 3.6; Edgar 2004), and ambiguouslyaligned regions were manually adjusted with MacClade(version 4.07). We excluded the following regions from sub-sequent analyses: 1) codons with known RNA editing inArabidopsis, Brassica, Beta, Oryza, Citrullus, or Cucurbita,2) regions for which positional homology could not be de-termined with confidence, and 3) regions for which datawere missing from a majority of taxa. We also excluded sev-eral genes that are missing from many of the sequencedgenomes (sdh3, sdh4, rpl2, rpl5, rpl6, rps1, rps2, rps8,rps10, rps11, rps14, and rps19), resulting in a final alignmentof 30 concatenated genes and 25,553 positions.

We constrained the tree topology to reflect known phy-logenetic relationships (Kellogg and Birchler 1993; Soltiset al. 2000; Barker et al. 2001; Mathews et al. 2002) and es-timated rates of synonymous (dS) and nonsynonymous(dN) substitutions across the tree using the MG94W9 co-don model, allowing for independent estimates of dS anddN on each branch. We also performed relative rate testsusing the same model, separately constraining both dS anddN between Citrullus and Cucurbita, with Carica as the out-group. All analyses were performed with HyPhy (version0.9920070130 for Macintosh).

Results and Discussion

Genome Size and CharacteristicsThe Citrullus and Cucurbita mitochondrial genomes assem-bled into single circular-mapping (Bendich 1996) moleculesof lengths 379,236 nt and 982,833 nt, respectively (table 1).The sizes of these two genomes are remarkably close to thesize estimates based on reassociation kinetics (;390 kb and;1 Mb, respectively) (Ward et al. 1981). Although the es-timates of Ward et al. (1981) are conventionally cited as 330kb for Citrullus and 800 kb for Cucurbita (e.g., Lilly andHavey 2001), these values are problematic owing to 1)rounding issues, 2) uncertainty in converting the megadal-ton estimates of Ward et al. (1981) to kilobases, and 3) theuse of a reassociation-kinetics size standard (Bacillus sub-tilis) of uncertain size. A proper accounting of these givessize estimates of 391, 1014, 1837, and 2936 kb for the mi-tochondrial genomes of Citrullus, Cucurbita, Cu. sativus,and Cu. melo, respectively (fig. 1). Still, some uncertaintyremains in comparing these estimates with the more pre-cise sizes determined in this study. The estimates of Wardet al. (1981) used different cultivars than did the presentstudy, and the genome size of the B. subtilis strain(#746) used by Ward et al. (1981) has not yet been deter-mined. The limited size variation (4,187–4,293 kb) among

Alverson et al. · doi:10.1093/molbev/msq029 MBE

1438

Page 4: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

the five B. subtilis strains so far sequenced suggests, how-ever, that this is unlikely to contribute significant error tothe above estimates.

Although difficult to reconstruct with so few genomes,the size disparity between the two species appears toreflect a dynamic history of expansion and possibly con-traction (fig. 1). Like Citrullus, the common ancestor of cu-curbits might have had a relatively compact genome, witha series of independent expansions leading to the large ge-nomes in Cucurbita and Cucumis (fig. 1). Alternatively, thecommon ancestor of cucurbits might have possessed anunusually large mitochondrial genome, with a contractionresulting in the relatively small mitochondrial genome ofCitrullus (fig. 1). Clearly more data are necessary to distin-guish among the possible scenarios. No correlation is seenbetween the mitochondrial and nuclear genome sizes ofthese four species (Ren et al. 2009).

Using a Blast expect cutoff of 1 � 10�6, Citrullus andCucurbita share ;240 kb of genomic sequence, both cod-ing and noncoding, which translates to roughly 63% and25% genomic coverage, respectively. This is somewhatmore than the amount of shared sequence between twoother confamilial species, Arabidopsis and Brassica, whichwere found to share 143 kb of genomic sequence (Handa2003). The relatively small size of the Brassica genome (222kb) does, however, predict a lower amount of sharedsequence between it and Arabidopsis. Coverage by mito-chondrial-like sequence increases to 74% for Citrullus and38% for Cucurbitawhen all seed plant mitochondrial genomesare considered. So although most previously sequenced plantmitochondrial DNA is species specific (Kubo and Newton2008), a large fraction of the modestly sized Citrullus mito-chondrial genome is not unique to this genome.

Genomic coverage by genes and introns totals ;70 kbfor each of the two species (table 1). Conserved syntenicregions—genes, introns, and conserved flanking sequences(see Materials and Methods)—likely include most of thefunctional sequence in the genome. These regions total

;102 and ;94 kb in Citrullus and Cucurbita, respectively(table 1; supplementary figs. 1 and 2, Supplementary Ma-terial online). Therefore, as in other seed plants, most of thesequence in these genomes is noncoding and probablynonfunctional.

Gene Complement and SyntenyBoth genomes share the same core set of 37 intact proteingenes and 3 rRNA genes. Gene content in Citrullus and Cu-curbita is consistent with the results of a Southern hybrid-ization survey of mitochondrial gene content across 280diverse angiosperms, which included the closely related cu-curbit species, Cu. sativus (Adams et al. 2002). The one ex-ception is that Cu. sativus has apparently very recently lostthe rps19 gene (Adams et al. 2002), which is present twice inboth Citrullus and Cucurbita. One of the rps19 genes is partof the rpl2–rps19–rps3–rpl16 arrangement conserved as farback as liverworts (Takemura et al. 1992), whereas the sec-ond copy is part of an rps19–rps10–cox1 cluster unique tothese two cucurbits (supplementary figs. 1 and 2, Supple-mentary Material online). Like other eudicots (Adams et al.2002), Citrullus and Cucurbita lack the rps2 and rps11 genes,which among sequenced seed plants are found only in Cy-cas and grasses (rps2) or just Cycas (rps11). Finally, with anadditional sdh3 gene and slightly more coding sequenceacross shared genes (table 1), the smaller Citrullus genomehas more overall coding sequence than does Cucurbita.

A total of 14 syntenic gene clusters (defined as two ormore colinear and identically oriented genes) are sharedbetween the two genomes. Many of the syntenic clustersrepresent maintenance of well-characterized, highlyconserved arrangements and cotranscription units (e.g.,Takemura et al. 1992; Perrotta et al. 1996; Quinoneset al. 1996; Hoffmann et al. 1999; Placido et al. 2006),whereas others are present across a more restricted phylo-genetic range (supplementary figs. 1 and 2, SupplementaryMaterial online). With the exception of clusters 12 and 13,the arrangement of the clusters is essentially scrambled be-tween the two genomes (fig. 2). This high level of rearrange-ment is entirely expected (Palmer and Herbon 1989; Satohet al. 2006; Allen et al. 2007).

Most genes show the expected, highly conserved level ofsequence and structural conservation. For example, twopreviously characterized sets of overlapping genes, rps3–rpl16 and cox3–sdh4, were found in both genomes (Take-mura et al. 1992; Giege et al. 1998). As in other angiosperms,the rpl16 genes of both species likely use a GTG start codon(Bock et al. 1994; Sakamoto et al. 1997), and the ‘‘t-ele-ment’’ (likely modified from a chloroplast-derived trnIgene) immediately downstream from the ccmC gene prob-ably facilitates formation of the 3# terminus of the tran-script in both cucurbits, just as it does in Arabidopsis(Forner et al. 2007). Finally, the rps14 gene in both genomesis likely nonfunctional due to numerous indels that disruptthe reading frame. Blast searches to expressed sequence tagdatabases of Citrullus, Cu. sativus, and Cu. melo found anintact and full-length mitochondrial rps14 homolog inCu. melo that showed the expected, high level of sequence

Table 1. Genome Coverage (number of nonredundant nucleotidepositions in the genome) by Coding and Noncoding Features inthe Citrullus and Cucurbita Mitochondrial Genomes.

Class Feature Citrullus (%) Cucurbita (%)

Total Size 379,236 982,833Coding Protein exons 32,370 (8.5) 32,032 (3.3)

Cis-spliced introns 32,476 (8.6) 30,557 (3.1)rRNA 5,148 (1.4) 5,109 (0.5)tRNA 1,358 (0.4) 966 (0.1)Conserved syntenic

regionsa102,531 (27.0) 94,803 (9.6)

Noncoding Mitochondrial-likeb 159,032 (41.9) 180,008 (18.3)Chloroplast-like 22,779 (6.0) 113,347 (11.5)Nuclear-like

Transposableelements

20,914 (5.5) 17,820 (1.8)

Protein genes 3,438 (0.9) 2,818 (0.3)

a Includes all genes, cis- and trans-spliced introns, and the highly conserved(putatively functional) sequences immediately flanking them.b Intergenic sequences with similarity to DNA in at least one other seed plantmitochondrial genome, excluding chloroplast-like sequences.

Evolution of Plant Mitochondrial Genome Size · doi:10.1093/molbev/msq029 MBE

1439

Page 5: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

divergence for a gene transferred to the nucleus in thecommon ancestor of these species.

IntronsThe smaller Citrullus genome actually contains more andlonger cis-spiced introns (table 1). The two species share19 cis- and 5 trans-spliced group II introns, fully 15 of whichare longer in Citrullus. Citrullus also contains the well-char-acterized cox1 group I intron, which has spread widelyacross angiosperms by horizontal transfer (Sanchez-Puerta

et al. 2008). The cox1 intron is also known from three Cu-cumis species, indicating its gain in the Citrullus–Cucumislineage some 20–30 Ma, following the split from Cucurbita(fig. 1) (Sanchez-Puerta et al. 2008). Altogether, the Citrullusgenome contains nearly 2 kb of additional intronicsequence compared with the larger Cucurbita genome(table 1). The ;1.8-Mb genome of their relative, Cu. sativus,contains the largest known plant mitochondrial introns(sometimes two to three times larger than homologousintrons in other land plants) for three surveyed genes

FIG. 2. The mitochondrial genomes of Citrullus (inner circle) and Cucurbita (outer circle), drawn proportional to genome size and displayed ascircles, without implying that this is their in vivo conformation (Bendich 1996). Features on forward and reverse strands are drawn on the outsideand inside of the circles, respectively. Protein-coding, rRNA, and tRNA genes are all shown in black, introns in white, and chloroplast-derivedinsertions in dark gray. Syntenic gene blocks conserved between the two genomes are numbered (1–14) and delimited by light gray boxes on therespective circles. In Citrullus, the black arrows mark the locations of a large 7.3-kb inverted repeat. Of the many chloroplast-derived genes presentin the two genomes, only those that encode potentially functional tRNAs are annotated. In some cases, the scale of tRNAs and short chloroplast-derived regions was increased to improve their legibility. Genome maps were made with OGDRAW (Lohse et al. 2007).

Alverson et al. · doi:10.1093/molbev/msq029 MBE

1440

Page 6: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

(Bartoszewski et al. 2009). Citrullus might therefore mark anearly stage of intron growth that would later accelerate andbecome a source of expansion in the enormous Cucumisgenomes (fig. 1) (Bartoszewski et al. 2009).

Transfer RNAsAlthough both genomes use all 64 codons and show highlysimilar patterns of codon usage (not shown), tRNA com-plement differs between the two species. We classifiedtRNAs based on origin (chloroplast or mitochondrial)and whether they were embedded within larger tracts ofcaptured chloroplast DNA (supplementary table 2, Supple-mentary Material online). The Citrullus and Cucurbitamitochondrial genomes encode 18 and 13 intact and pu-tatively functional tRNAs, respectively, that lie outsidelarger segments of chloroplast-derived segments (supple-mentary table 2, Supplementary Material online). In bothgenomes, three of these tRNAs (trnH-GTG, trnM-CAT, andtrnN-GTT) are nevertheless chloroplast in origin. Both ge-nomes lack several tRNAs that are present in either bryo-phytes (trnA and trnT) or bryophytes and Cycas (trnR andtrnL) (Li et al. 2009) but are commonly missing from an-giosperm mitochondrial genomes. Codons for these aminoacids are abundant in both genomes, so missing tRNAs arelikely encoded in the nucleus (Dietrich et al. 1996). Citrullushas duplicate copies of three tRNAs (trnC-GCA, trnG-GCC,and trnQ-TTG), and Cucurbita has lost both native trnSvariants that are otherwise universally present acrosssequenced seed plant mitochondrial genomes. The mito-chondrial genome of the hornwort, Megaceros aenigmati-cus, is the only other land plant known to lack native trnSgenes (Li et al. 2009). So altogether, the smaller Citrullusgenome contains five more tRNAs than Cucurbita.

The Citrullus and Cucurbita mitochondrial genomescontain substantial amounts of chloroplast-derived se-quences (see next section), many of which contain the ex-pected tRNAs. Citrullus and Cucurbita have 8 and 24 suchtRNAs in their mitochondrial genomes, respectively (sup-plementary table 2, Supplementary Material online). Ofthese, 7 and 15 are intact and potentially functional, respec-tively, with the rest appearing to have degenerated to thepoint of being nonfunctional. In some cases, the same syn-tenic tract of chloroplast sequence contains both intactand degenerate tRNAs. The apparently differential con-straints on the embedded chloroplast tRNAs providecircumstantial evidence that some of them might be func-tional. For example, of the 5 chloroplast-derived trnS genesin Cucurbita, 3 remain intact and 2 of these recognize thesame codons as their notably absent mitochondrial homo-logs (see above), making them candidates for unusually re-cent, functional replacement of native copies.

Noncoding and Promiscuous SequencesMost of the sequence in both genomes—73% in Citrullusand 90% in Cucurbita—is intergenic, lying outside of con-served syntenic regions (table 1). A large fraction of theseintergenic sequences, 159–180 kb, shows similarity to pre-viously sequenced seed plant mitochondrial DNA (table 1),

excluding chloroplast-like sequences. Chloroplast-derivedDNA accounts for 1–9% of sequenced seed plant mito-chondrial genomes (Kubo and Mikami 2007; Goremykinet al. 2009), so in this respect, Citrullus resembles the typicalplant mitochondrial genome, containing 23 kb (6% cover-age) of chloroplast-derived sequence distributed among 20distinct regions in the genome (fig. 2 and table 1; supple-mentary fig. 1, Supplementary Material online). Cucurbita,on the other hand, has a remarkable 113 kb of chloroplast-derived sequence—fully 1.7–29 times more than other fullysequenced seed plant mitochondrial genomes. Put anotherway, the Cucurbita mitochondrial genome contains morechloroplast DNA than it does mitochondrial genes andintrons combined (table 1). Chloroplast sequences are di-vided among 29 distinct regions, ranging from 92 to 18,534nt in length (fig. 2; supplementary fig. 2, SupplementaryMaterial online). The regions are relatively large (medianlength 5 2.3 kb), with nine exceeding 5 kb and two exceed-ing 15 kb in length. At 16.6 and 18.5 kb, the latter two frag-ments are among the largest contiguous stretches ofchloroplast-derived DNA so far characterized in plant mi-tochondria, though much of the 25 kb of chloroplast DNAin maize likely arrived as a single segment that was subse-quently fragmented inside the mitochondrial genome (Clif-ton et al. 2004; Allen et al. 2007). Counting twice thoseregions that map entirely within both copies of the largechloroplast inverted duplication, the 29 regions cover 79%of the Cu. sativus chloroplast genome. Some regions of thechloroplast genome are represented more than once in themitochondrial genome, reflecting either multiple indepen-dent transfers or single transfers that were subsequentlyduplicated inside the mitochondrial genome.

Plant mitochondrial genomes typically house somesmall fraction of discernibly nuclear-derived sequences,most commonly identified as transposable elements.The mitochondrial genome of the lycophyte, Isoetesengelmannii, is exceptional in that it contains degenerateintergenic sequences matching an auxin-responsive tran-scription factor and a phytochrome gene, both of whichare encoded in the nucleus (Grewe et al. 2009). Althoughnuclear sequences are generally more difficult to detect,detailed studies of a few species have nevertheless shownthat �5% of their mitochondrial DNA can be traced to thenucleus (Knoop et al. 1996; Unseld et al. 1997; Notsu et al.2002). The Citrullus and Cucurbita mitochondrial genomescontain 24 kb (6.4%) and 21 kb (2.1%), respectively, ofclearly identifiable nuclear-derived sequences, most ofwhich resemble copia- and gypsy-like retrotransposons(table 1). Both genomes also contain regions with strongmatches to nuclear protein-coding genes. Citrullus and Cu-curbita each contain sequences with similarity to an (R)-mandelonitrile lyase gene and a lectin protein kinase gene.In both cases, the gene fragments cover large and similartracts of their cognate nuclear copies. A close homolog ofthe mandelonitrile lyase gene in the nuclear genome ofArabidopsis (GenBank GI: 15238300) has two introns,and whereas virtually the entire length of the Cucurbitafragment is from exon 2, the longer Citrullus fragment

Evolution of Plant Mitochondrial Genome Size · doi:10.1093/molbev/msq029 MBE

1441

Page 7: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

covers much of exons 2 and 3 along with the interveningintron, which indicates that the transfer did not involve anRNA intermediate. For both species, the lectin protein ki-nase fragments fall within the large second exon of a homo-log in the nuclear genome of Populus (GenBank GI:116256320). The lectin protein kinase fragment, which isnearly full length in Citrullus, is divided between two dis-tantly spaced fragments in Cucurbita, an apparent conse-quence of intramolecular recombination following thetransfer.

Given that most plant mitochondrial DNA (as much as80–90%) shows no similarity to known sequences, one hy-pothesis is that much of the variation in genome size re-flects different amounts of DNA acquired (and retained)from the large and mostly noncoding plant nuclear ge-nome (Palmer 1990). It therefore came as a surprise thatthe smaller Citrullus mitochondrial genome contains moreidentifiably nuclear DNA than does the Cucurbita genome.The unsequenced nuclear genomes of Citrullus (430 Mb)and Cucurbita (539 Mb) (Ren et al. 2009) represent hugereservoirs of unexamined sequence, some of which couldhave found its way into the mitochondrial genome. In ad-dition to the present availability of only a few nuclear ge-nome sequences from relatively distantly related plants, thechallenge of identifying putative nuclear sequences is fur-ther complicated by the possibility that the ancestral nu-clear genomes of these species could have been muchlarger. A large fraction of the sequence in these two mito-chondrial genomes—21% in Citrullus and 58% in Cucurbi-ta—shows weak or no similarity to known sequences, sothe possibility remains that they might contain substantialamounts of additional nuclear-derived DNA.

RepeatsReassociation kinetics suggested that 5–10% of the se-quence in the Citrullus and Cucurbita mitochondrial ge-nomes consists of low-complexity repetitive DNA (Wardet al. 1981). Consistent with this estimate, Citrullus has1,154 repeats that cover 10% of the genome, based onour Blast settings and an expect cutoff of 1. The largest re-peat—a 7.3-kb inverted repeat—creates duplicate copies ofthe sdh3, trnQ, and trnG genes. The short 3-kb clone libraryused to sequence the Citrullus genome (see Materials andMethods) could not provide insights into whether this re-peat engages in high-frequency recombination, as would beexpected for a repeat of this size (Lonsdale et al. 1988;Palmer and Herbon 1989). All remaining repeats are,400 nt in length, and most of these (900 of 1,154) areonly 19–40 nt in length (table 2). Repeat coverage is notsimply a reflection of duplicated genes, as the majorityof repeat coverage (81%) in Citrullus lies outside of genesand introns.

Despite similar distributions of repeat lengths, Cucurbitahas nearly 50 times more repeats (54,783 vs. 1,154) thanCitrullus (table 2), based on our Blast settings and an expectcutoff of 1. In fact, much of the genome expansion in Cu-curbita owes to an accumulation of repeats in intergenicregions, with total genomic coverage by repeats (371 kb,

or 38% of the genome) summing to nearly the entire lengthof the Citrullus genome. Like Citrullus, the overwhelmingmajority of repeats in Cucurbita are short; a total of38,724 of them (71%) are just 19–40 nt in length, account-ing for .272 kb (28%) of the total genome coverage. Mostof the repeats lie within intergenic regions, with ,2% ofrepeated sequences occurring within genes, introns, andRNA genes. A larger fraction of the total repeat coverage(11%) overlaps with chloroplast-derived regions, pointingto multiple introductions of the same chloroplast regionand/or duplications of chloroplast-like sequences withinthe mitochondrial genome. Tandem repeats account fora virtually negligible fraction of the repeat coverage in bothgenomes.

Large repeats are common in seed plant mitochondrialgenomes, with lengths of maximal perfect repeats rangingfrom 897 nt in the large (773 kb) Vitis genome to 87 kb inthe 501-kb Beta (Owen-CMS) genome (Satoh et al. 2004;Goremykin et al. 2009). In general, genome size tends toscale positively with genomic coverage by large (.500nt) repeats. The trend appears to hold within species (Allenet al. 2007), within genera (Palmer and Herbon 1989),and—with some notable exceptions (e.g., Vitis)—acrossthe whole of angiosperms. Moreover, large sequence dupli-cations can cause major and rapid increases in genomesize, as is the case for five maize mitochondrial genomes,which have virtually identical sequence complexities butnevertheless range from 536 to 740 kb in size (Allenet al. 2007). It therefore came as a surprise that the largest

Table 2. Frequency Distributions of Repeat Lengths in theMitochondrial Genomes of Citrullus and Cucurbita.

Repeat Length(# nt)

Number of Repeats (% coverage)

Citrullus Cucurbita

19–20 95 (0.47) 4,331 (7.09)21–40 805 (3.44) 34,393 (26.92)41–60 134 (1.34) 8,552 (15.08)61–80 39 (0.66) 3,417 (9.67)81–100 23 (0.46) 1,591 (6.94)101–120 14 (0.31) 821 (5.38)121–140 15 (0.48) 510 (4.24)141–160 7 (0.23) 362 (3.57)161–180 6 (0.27) 246 (2.45)181–200 6 (0.30) 173 (2.09)201–220 0 (0.00) 96 (1.47)221–240 0 (0.00) 67 (1.16)241–260 2 (0.14) 72 (1.49)261–280 0 (0.00) 34 (0.79)281–300 0 (0.00) 34 (0.90)301–400 6 (0.55) 56 (1.64)401–500 0 (0.00) 20 (0.63)501–600 0 (0.00) 4 (0.22)601–700 0 (0.00) 4 (0.25)�7,286 2 (3.84) 0 (0.00)

NOTE.—Repeats were identified by Blasting each genome to itself (see Materialsand Methods) and considering all hits with a Blast expect value �1. Repeats aredefined by their begin and end coordinates in the genome. In tallying the totalnumber of repeats, those with identical begin and end coordinates are countedonly once. Percent coverage is the percentage of nucleotide positions in thegenome occupied by repeats of given length category (column 1), calculatedwithout respect to repeats in other length categories. Whereas the number ofrepeats is additive among rows, the coverage values are not because repeats can(and do) overlap.

Alverson et al. · doi:10.1093/molbev/msq029 MBE

1442

Page 8: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

repeat in the nearly 1-Mb Cucurbita genome is quite small,just 621 nt in length. Thus, both Cucurbita and Vitis areexceptional in combining an unusually large genome witha scarcity of large repeats.

As highlighted by the Cucurbita genome, small repeatedsequences are an important determinant of plant mito-chondrial genome size. As sites of active recombination,repeats have important impacts on the structure of plantmitochondrial genomes as well (Palmer and Herbon 1989;Andre et al. 1992). At least 38% of the Cucurbita genome isrepetitive DNA, considerably more than the 5–10% esti-mate based on reassociation kinetics (Ward et al. 1981).This discrepancy probably reflects the very short size ofthe repeats, which are likely to fail to reassociate with re-petitive kinetics. The repetitive fraction alone accounts for.60% of the absolute size difference between Citrullus andCucurbita. Limited sequencing and hybridization analysesled, by extrapolation, to the estimate that seven short(30–53 nt) motifs account for some 13% of the large(;1.8 Mb) mitochondrial genome of the close relative,Cu. sativus (fig. 1) (Lilly and Havey 2001), providing an earlyindication that repetitive sequences might be a more im-portant factor in the growth of cucurbit genomes thanoriginally thought (Ward et al. 1981). Blast searches ofthe 7 dominant repetitive motifs from Cu. sativus (Lillyand Havey 2001) confirmed their absence from both Cit-rullus and Cucurbita, indicating that entirely different suitesof small repeats underlie the genome expansions in Cucur-bita and Cu. sativus. One caveat to this conclusion is that,along with point mutations, recombination across repeatscan in principle effectively shuffle their sequences, erasingany signal of homology over time (Andre et al. 1992). InCycas, the proliferation of a single, short (36 nt), and ap-parently self-replicating repeat (termed ‘‘Bpu sequence’’)accounts for 5% coverage of its 415-kb mitochondrial ge-nome (Chaw et al. 2008). The Bpu sequence was not foundin either Citrullus or Cucurbita, nor does it appear that thelarge repetitive fraction of the Cucurbita genome reflectsthe proliferation of just one or a few motifs. Finally, al-though less common, small repeats can be a major sourceof expansion in other organelle genomes. Despite havinga relatively reduced gene complement, the green alga Chla-mydomonas has an unusually large, 203-kb chloroplast ge-nome, ;20% of which comprises short sequence repeatslocated in intergenic regions (Maul et al. 2002). Anothergreen alga, Volvox carteri, has an extraordinarily large(�420 kb) chloroplast genome, more than 60% of whichconsists of short, presumably self-replicating palindromicrepeats located primarily in intergenic regions (Smithand Lee 2009b). The same repeat element has led to majorexpansion of the Volvox mitochondrial genome as well(Smith and Lee 2009b).

RNA EditingSequencing of full or nearly full-length cDNAs for 37 mito-chondrial protein genes identified a similar set of RNA edit-ing sites in the two genomes. RNA editing data aresummarized in supplementary table 3 (Supplementary Ma-

terial online), and the locations of editing sites are availablein the GenBank records (GQ856147 and GQ856148). TheCitrullus and Cucurbita mitochondrial genomes containa minimum of 463 and 444 sites with C-to-U editing, re-spectively. We found no evidence of U-to-C editing. Con-sidering only one copy of each of the duplicated genes, theaverage density of editing sites is also higher in Citrullus(1.57 edits/100 nt) than in Cucurbita (1.51 edits/100 nt),so the absolute difference in the number of edits doesnot reflect minor variation in gene length or cDNA se-quence coverage between the two species (fig. 3). Citrullusdoes, however, have one additional source of five editedsites in the form of a partial atp9 pseudogene.

The Citrullus and Cucurbita totals (463 and 444 edits,respectively, in 37 genes) are comparable with those inthe four other angiosperms for which comprehensivecDNA sequencing has been done: Arabidopsis (441 editsin 36 genes), Brassica (427 edits in 34 genes), Beta (357 editsin 31 genes), and Oryza (491 edits in 34 genes) (Giege andBrennicke 1999; Notsu et al. 2002; Handa 2003; Mower andPalmer 2006). Citrullus and Cucurbita have 437 and 422 ed-iting sites, respectively, in those regions of the genes withcDNA coverage in both species (27,881 nt). A total of 394 ofthese sites are shared between the two species, whichtranslates to 90% of the sites in Citrullus and 93% of thesites in Cucurbita. By comparison, two other confamilialspecies with comprehensive editing data, Arabidopsisand Brassica (Brassicaceae), share 81% and 84% of editedsites, respectively (Handa 2003); it is not clear, however,if or how these calculations accounted for missing and/or unshared cDNA coverage.

Both cucurbits showed the typical pattern of relative ed-iting levels across genes. For example, ribosomal proteinstend to have fewer edits than other genes, and the mttB,ccmB, and ccmFn genes are highly edited. As with other

0

1

2

3

4

5

6

7

0 1 2 3 4 5 6 7

Cuc

urbi

ta (

edits

/100

nt)

Citrullus (edits/100 nt)

FIG. 3. Density of C-to-U RNA editing (number of edits/100 ntcDNA sequence) for 37 protein genes from the Citrullus andCucurbita mitochondrial genomes. Data points falling directly onthe diagonal identify genes with equal editing density in bothspecies. Primary data are summarized in supplementary table 3(Supplementary Material online).

Evolution of Plant Mitochondrial Genome Size · doi:10.1093/molbev/msq029 MBE

1443

Page 9: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

angiosperms, the majority of edits (.92%) alter the aminoacid, and most of these sites are fully edited. Although theset of fully edited, nonsynonymous sites is generally highlyconserved between the two species, there are several strik-ing differences between them in the nonsynonymous editsfor six genes. Most of the edited sites in the ccmFc, cob,matR, and mttB genes are nonsynonymous edits thatare fully edited in Cucurbita but partially edited in Citrullus.One notable example is the mttB gene, in which 18 of the20 nonsynonymous edit sites are fully edited in Cucurbita,whereas just 3 of 18 nonsynonymous edits are fully editedin Citrullus. In the other direction, most of the nonsynon-ymous edits in nad9 and rps4 are fully edited in Citrullusbut partially edited in Cucurbita. For example, Citrullusand Cucurbita each have eight edit sites in the rps4 gene,all of which are nonsynonymous and shared between thetwo species. Whereas all eight of the sites are fully edited inCitrullus, just one of them is fully edited in Cucurbita.

For both species, RNA editing creates start codons forthe nad1, nad4L, and rps10 genes and stop codons forthe atp9 and rps10 genes. We had insufficient cDNA cov-erage to determine whether several other putative start(i.e., ACG) and stop (i.e., CAA) codons that would seemto require editing are in fact edited in either or both species.In addition, canonical start codons were not detected forthe matR gene in Citrullus or for the mttB gene in Cucurbita,nor was a canonical stop codon detected for the Citrullussdh3 gene. The cDNA sequence of the latter showed thatneither the CGA nor the CAA codon in the 129 nt imme-diately downstream of sdh3 is edited to produce a stopcodon.

Nucleotide Substitution Rates and the MutationPressure HypothesisAs exemplified by these two genomes, seed plants are no-torious for their extremely large and variably sized mito-chondrial genomes. Rates of synonymous substitution,which presumably reflect the underlying mutation rate,show similarly dramatic fluctuations across seed plantsas well (Mower et al. 2007). Notwithstanding this variation,the mutation rate is generally extremely low, some 3–5times lower than in the chloroplast and 40–100 times lowerthan in animal mitochondria (Wolfe et al. 1987), whose ge-nomes are comparatively miniscule in size (typically 14–20kb) (Lynch et al. 2006). Thus, a seemingly strong negativecorrelation between organelle genome size and mutationrate is seen across several groups of eukaryotes (Lynchet al. 2006). This pattern, along with the apparent lackof correlation between the effective number of genesper locus in organelles (Ng) and their genome sizes, ledto the hypothesis that mutation rate is the primary deter-minant of organelle genome size (Lynch et al. 2006).According to this hypothesis, the generally low mitochon-drial mutation rate of plants facilitates the accumulation ofnoncoding sequences and hence the overall growth of theirmitochondrial genomes. The elevated mitochondrial mu-tation rate likewise maintains (indirectly) the small, stream-lined mitochondrial genomes of animals. In essence, the

superfluous noncoding DNA carries less potential burden(Lynch 2006; Lynch et al. 2006) in the low mutational en-vironment present in most plant mitochondria (Wolfeet al. 1987; Mower et al. 2007). Like genes and introns, sitesof RNA editing require up- and downstream sequence con-servation to be properly processed (Choury et al. 2004;Mulligan et al. 2007), which, again, are thought to be moreeasily preserved in a low mutational background (Lynchet al. 2006). The mutation pressure hypothesis thereforepredicts that both RNA editing frequency and genome sizeshould be negatively correlated with the mutation rate(Lynch et al. 2006).

We made a concatenated alignment of 30 mitochondrialgenes from the now 18 fully sequenced seed plant mito-chondrial genomes and used it to estimate rates of synon-ymous (dS) and nonsynonymous (dN) substitution inCitrullus and Cucurbita. The dS and dN trees show similarpatterns of divergence. Namely, Cucurbita falls on relativelylong branches in both dS and dN trees (fig. 4). Relative ratetests confirm that both dS and dN are significantly higher inCucurbita (P , 0.001). Compared with Citrullus, Cucurbitahas a 4-fold higher dS and an 8-fold higher dN (fig. 4). Thatmost of the total variation is at silent sites implicates mu-tation rate as the underlying cause of the observed rate in-crease in Cucurbita (Kimura 1983).

Based on the previous knowledge of a nearly 3-fold largergenome size than Citrullus, the mutation pressure

Cucurbitaceae

Poales

Brassicales

0.05

Zea mays

Tripsacum

Triticum

Zea mays

Carica

Brassica

Vitis

Citrullus

Zea luxurians

Beta

Beta (CMS)

Sorghum

Oryza

Arabidopsis

Zea perennis

Cucurbita

Nicotiana

FIG. 4. Nucleotide substitution rates in 17 angiosperms with fullysequenced mitochondrial genomes. Branch lengths are proportionalto rates of synonymous (left panel) and nonsynonymous (rightpanel) substitutions, based on a concatenated alignment of 30protein genes (25,553 positions), and with the tree topologyconstrained as shown (see Materials and Methods). Relative ratetests show significantly higher dS and dN in Cucurbita comparedwith Citrullus. Cycas was included in the analysis to root the basalangiosperm divergence but, for simplicity, is not shown.

Alverson et al. · doi:10.1093/molbev/msq029 MBE

1444

Page 10: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

hypothesis (Lynch et al. 2006) predicts that Cucurbitashould have a lower mutation rate and more RNA editingsites. However, since its split from Citrullus some 30 Ma,Cucurbita has actually experienced a 4-fold higher muta-tion rate than Citrullus, whereas its density of RNA editingis only slightly lower (97% of the level of Citrullus; determin-ing the polarity—the gain and loss—of editing site differ-ences is difficult with these data alone). So whereas RNAediting frequencies are weakly consistent with predictionsof the mutation pressure hypothesis, genome sizes clearlyare not. Although a quantitative assessment of any corre-lations cannot be measured from these two data pointsalone, the elevated synonymous substitution rate (and pre-sumably the underlying mutation rate) in Cucurbita wasnonetheless surprising to discover in light of its long-estab-lished larger genome size.

The mutation pressure hypothesis was based on broad-scale patterns of organelle mutation rate, Ng, and genomesize across a diverse set of eukaryotes, so it is unclear howor when these same factors are manifest in the genomesof taxa spanning a much narrower phylogenetic range(see discussions in Lynch and Conery 2004; Vinogradov2004). That is, many large-scale mitochondrial genomicchanges clearly have occurred over the course of angio-sperm—even Cucurbitaceae—evolution, but it is unclearwhether the magnitude of mutation rate variation andthe timescales over which those shifts have occurred aresufficient to drive the predicted changes in genome sizeand RNA editing frequency. The stochastic nature of bothNg (for which we have no data) and mutation rate predictslocal departures from the overall trend (Lynch and Conery2004), making the hypothesis difficult to reject at more lo-cal phylogenetic scales (Vinogradov 2004), within Cucurbi-taceae for example. Limited data from this and otherstudies of organelle genome evolution (Smith and Lee2008, 2009a) show mixed support for the mutation pres-sure hypothesis. We do note, however, the long dS branchleading to the common ancestor of Arabidopsis and Bras-sica (fig. 4), both of which have relatively low editing levelsand, by angiosperm standards, compact mitochondrial ge-nomes. Additional plant mitochondrial genome sequenceswill no doubt shed valuable light on the extent to which Ng

and mutation rate drive organelle genome evolution. Oneimportant and likely difficult challenge will be findingmarkers from the slowly evolving plant mitochondrial ge-nomes with enough variability to allow estimation of Ng

from the standing level of neutral variation. Such datamight show, for example, that the large and variably sizedcucurbit mitochondrial genomes reflect stochastic varia-tion resulting from a generally low Ng across Cucurbitaceae.

Perspectives on Genome Size EvolutionThe genome size estimates of Ward et al. (1981) for fourspecies of Cucurbitaceae provided very early indicationsthat plant mitochondrial genome size is extremely fluid—afinding that has been validated repeatedly in the ensuingyears. The discovery of 8-fold variation in genome size,and organelle genomes that measured in the megabases,

sparked years of discussion, speculation, and study aboutthe origin of the ‘‘extra’’ DNA in the largest genomes. Datafrom this study show that growth of the nearly 1-Mb Cu-curbita genome is largely attributable to the uptake andretention of an usually large quantity of exogenous chloro-plast sequences and the virtually unprecedented prolifer-ation of small repeats, the origins of which remainunclear. Indeed, subtracting both of these componentsyields an average-sized plant mitochondrial genome.

Major and rapid changes in genome size are common inthe mitochondrial and nuclear genomes of seed plants,whereas chloroplast genome size is much more conserved.Paradoxically, though, the forces underlying genome sizeevolution have greater overlap between the two organellegenomes than between the dynamic genomes of the mi-tochondrion and nucleus. In plant nuclear genomes, rapidgrowth is usually the result of polyploidization and/or theproliferation of transposable elements (Hawkins et al.2008). Polyploidization is a uniquely nuclear process, andactive mobile elements have not been identified in anyplant mitochondrial genome, with the possible exceptionof the Cycas genome, 5% of which consists of a family ofvery short (36 nt), putatively self-replicating elements(Chaw et al. 2008). The chloroplast genomes of photosyn-thetic seed plants vary by less than 2-fold in size, with mostangiosperm genomes occupying an even narrower sizerange (135–160 kb; Ravi et al. 2008). Most of this size rangeis the result of expansion or contraction of a large (0–76 kb)inverted repeat (e.g., Chumley et al. 2006).

A diverse and fairly lineage-specific constellation of fac-tors appears to underlie rapid and major size fluctuations inthe mitochondrial genomes of seed plants. Whereas differ-ences in length and number of large duplicated (sometimestriplicated) regions are, as in chloroplast genomes, the par-amount drivers among related species and genetic lines ofZea and Beta (Satoh et al. 2004; Allen et al. 2007), the pri-mary force driving genome expansion in Cucurbita is theproliferation of short dispersed repeats, with the accumu-lation of chloroplast sequences playing an important sec-ondary role. The large mitochondrial genome of Cucurbitais surprisingly devoid of large segmental duplications. LikeCucurbita, the next largest sequenced mitochondrial ge-nome (Vitis—773 kb) contains a large quantity of chloro-plast sequences and no major segmental duplications butdiffers in having relatively few short repeats (Goremykinet al. 2009). The gain of foreign sequences from more-or-less distantly related species by horizontal gene transferis relatively common in plant mitochondrial genomes(Richardson and Palmer 2007) and is a major force under-lying genome expansion in the flowering plant Amborellatrichopoda (Bergthorsson et al. 2004; Rice DW, Palmer JD,unpublished data). Horizontal transfer does not, however,appear to have been an important factor in the overall evo-lution of the now 18 fully sequenced seed plant mitochon-drial genomes. Insofar as can be determined, the uptake ofnuclear sequences seems to have been of relatively minorimportance in the completely sequenced genomes as well(but see above for caveats concerning this conclusion).

Evolution of Plant Mitochondrial Genome Size · doi:10.1093/molbev/msq029 MBE

1445

Page 11: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

It is increasingly clear that two forces that sometimesplay a major role in genome size evolution in other mito-chondrial lineages, gene loss (e.g., in some green algae—Nedelcu 1998) and intron gain and loss (e.g., in fungi—Cummings et al. 1990), have been of negligible importancein the remarkable expansion and contraction of seed plantmitochondrial genomes. This is particularly striking in com-paring the Citrullus and Cucurbita genomes, which are sodifferent in size but nearly identical in gene and intron con-tent. Current efforts to sequence and characterize theextraordinarily large Cucumis genomes have largely con-firmed the early size estimates of Ward et al. (1981) andwill undoubtedly turn up additional insights about the in-terrelationships of mutation rate, RNA editing, and genomesize in organelle genomes.

Supplementary MaterialSupplementary figures 1 and 2 and tables 1–3 are availableat Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

AcknowledgmentsWe thank Arnold Bendich, Weilong Hao, Dan Sloan, andtwo anonymous reviewers for commenting on earlierversions of the manuscript. We thank Stacia Wymanfor providing the DOGMA source code and for adviceon developing the gene annotation scripts. This workwas supported by a National Institutes of Health (NIH)Ruth L. Kirschstein National Research Service Award Post-doctoral Fellowship (1F32GM080079-01A1) to A.J.A., anNIH research grant RO1-GM-70612, and the METACyt Ini-tiative of Indiana University, funded in part through a majorgrant from the Lilly Endowment, Inc., to J.D.P. This workwas performed under the auspices of the US Departmentof Energy’s Office of Science, Biological and EnvironmentalResearch Program, and by the University of California, Law-rence Berkeley National Laboratory under contract no.DE-AC02-05CH11231, Lawrence Livermore National Labo-ratory under contract no. DE-AC52-07NA27344, and LosAlamos National Laboratory under contract no. DE-AC02-06NA25396.

ReferencesAdams KL, Qiu Y-L, Stoutemyer M, Palmer JD. 2002. Punctuated

evolution of mitochondrial gene content: high and variablerates of mitochondrial gene loss and transfer to the nucleusduring angiosperm evolution. Proc Natl Acad Sci U S A.99:9905–9912.

Allen JO, Fauron CM, Minx P, et al. (16 co-authors). 2007.Comparisons among two fertile and three male-sterile mito-chondrial genomes of maize. Genetics 177:1173–1192.

Andre C, Levy A, Walbot V. 1992. Small repeated sequences and thestructure of plant mitochondrial genomes. Trends Genet.8:128–132.

Barker NP, Clark LG, Davis JI, et al. (13 co-authors). 2001. Phylogenyand subfamilial classification of the grasses (Poaceae). Ann MoBot Gard. 88:373–457.

Bartoszewski G, Gawronski P, Szklarczyk M, Verbakel H, Havey MJ.2009. A one-megabase physical map provides insights on gene

organization in the enormous mitochondrial genome ofcucumber. Genome 52:299–307.

Bendich AJ. 1996. Structural analysis of mitochondrial DNAmolecules from fungi and plants using moving pictures andpulsed-field gel electrophoresis. J Mol Biol. 255:564–588.

Bergthorsson U, Richardson AO, Young GJ, Goertzen LR, Palmer JD.2004. Massive horizontal transfer of mitochondrial genes fromdiverse land plant donors to the basal angiosperm Amborella.Proc Natl Acad Sci U S A. 101:17747–17752.

Bock H, Brennicke A, Schuster W. 1994. rps3 and rpl16 genes do notoverlap in Oenothera mitochondria: GTG as a potential trans-lation initiation codon in plant mitochondria. Plant Mol Biol.24:811–818.

Chaw SM, Shih ACC, Wang D, Wu YW, Liu SM, Chou TY. 2008. Themitochondrial genome of the gymnosperm Cycas taitungensiscontains a novel family of short interspersed elements, Bpusequences, and abundant RNA editing sites. Mol Biol Evol.25:603–615.

Choury D, Farre JC, Jordana X, Araya A. 2004. Different patterns inthe recognition of editing sites in plant mitochondria. NucleicAcids Res. 32:6397–6406.

Chumley TW, Palmer JD, Mower JP, Fourcade HM, Calie PJ, Boore JL,Jansen RK. 2006. The complete chloroplast genome sequence ofPelargonium � hortorum: organization and evolution of thelargest and most highly rearranged chloroplast genome of landplants. Mol Biol Evol. 23:2175–2190.

Clifton SW, Minx P, Fauron CMR, et al. (13 co-authors). 2004.Sequence and comparative analysis of the maize NB mitochon-drial genome. Plant Physiol. 136:3486–3503.

Cummings DJ, McNally KL, Domenico JM, Matsuura ET. 1990. Thecomplete DNA sequence of the mitochondrial genome ofPodospora anserina. Curr Genet. 17:375–402.

Dietrich A, Small I, Cosset A, Weil JH, Marechal-Drouard L. 1996.Editing and import: strategies for providing plant mitochondriawith a complete set of functional transfer RNAs. Biochimie78:518–529.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with highaccuracy and high throughput. Nucleic Acids Res. 32:1792–1797.

Forner J, Weber B, Thuss S, Wildum S, Binder S. 2007. Mapping ofmitochondrial mRNA termini in Arabidopsis thaliana: t-elements contribute to 5’ and 3’ end formation. Nucleic AcidsRes. 35:3676–3692.

Giege P, Brennicke A. 1999. RNA editing in Arabidopsis mitochon-dria effects 441 C to U changes in ORFs. Proc Natl Acad Sci U S A.96:15324–15329.

Giege P, Knoop V, Brennicke A. 1998. Complex II subunit 4 (sdh4)homologous sequences in plant mitochondrial genomes. CurrGenet. 34:313–317.

Gordon D, Abajian C, Green P. 1998. Consed: a graphical tool forsequence finishing. Genome Res. 8:195–202.

Goremykin VV, Salamini F, Velasco R, Viola R. 2009. MitochondrialDNA of Vitis vinifera and the issue of rampant horizontal genetransfer. Mol Biol Evol. 26:99–110.

Grewe F, Viehoever P, Weisshaar B, Knoop V. 2009. A trans-splicinggroup I intron and tRNA-hyperediting in the mitochondrialgenome of the lycophyte Isoetes engelmannii. Nucleic Acids Res.37:5093–5104.

Handa H. 2003. The complete nucleotide sequence and RNAediting content of the mitochondrial genome of rapeseed(Brassica napus L.): comparative analysis of the mitochondrialgenomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res.31:5907–5916.

Havey MJ, McCreight JD, Rhodes B, Taurick G. 1998. Differentialtransmission of the Cucumis organellar genomes. Theor ApplGenet. 97:122–128.

Alverson et al. · doi:10.1093/molbev/msq029 MBE

1446

Page 12: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

Hawkins JS, Grover CE, Wendel JF. 2008. Repeated big bangs and theexpanding universe: directionality in plant genome sizeevolution. Plant Sci. 174:557–562.

Hiesel R, Wissinger B, Schuster W, Brennicke A. 1989. RNA editing inplant mitochondria. Science 246:1632–1634.

Hoffmann M, Dombrowski S, Guha C, Binder S. 1999. Cotran-scription of the rpl5-rps14-cob gene cluster in pea mitochondria.Mol Gen Genet. 261:537–545.

Hsu CL, Mullin BC. 1989. Physical characterization of themitochondrial DNA from cotton. Plant Mol Biol. 13:467–468.

Huang X, Madan A. 1999. CAP3: a DNA sequence assemblyprogram. Genome Res. 9:868–877.

Jurka J. 2000. Repbase update: a database and an electronic journalof repetitive elements. Trends Genet. 16:418–420.

Kellogg EA, Birchler JA. 1993. Linking phylogeny and genetics: Zeamays as a tool for phylogenetic studies. Syst Biol. 42:415–439.

Kimura M. 1983. The neutral theory of molecular evolution.Cambridge: Cambridge University Press.

Knoop V. 2004. The mitochondrial DNA of land plants: peculiaritiesin phylogenetic perspective. Curr Genet. 46:123–139.

Knoop V, Unseld M, Marienfeld J, Brandt P, Sunkel S, Ullrich H,Brennicke A. 1996. copia-, gypsy- and LINE-like retrotransposonfragments in the mitochondrial genome of Arabidopsis thaliana.Genetics 142:579–585.

Kolodner R, Tewari KK. 1972. Physicochemical characterization ofmitochondrial DNA from pea leaves. Proc Natl Acad Sci U S A.69:1830–1834.

Kubo T, Mikami T. 2007. Organization and variation of angiospermmitochondrial genome. Physiol Plant. 129:6–13.

Kubo T, Newton KJ. 2008. Angiosperm mitochondrial genomes andmutations. Mitochondrion 8:5–14.

Li LB, Wang B, Liu Y, Qiu YL. 2009. The complete mitochondrialgenome sequence of the hornwort Megaceros aenigmaticusshows a mixed mode of conservative yet dynamic evolutionin early land plant mitochondrial genomes. J Mol Evol.68:665–678.

Lilly JW, Havey MJ. 2001. Small, repetitive DNAs contributesignificantly to the expanded mitochondrial genome ofcucumber. Genetics 159:317–328.

Lohse M, Drechsel O, Bock R. 2007. OrganellarGenomeDRAW(OGDRAW): a tool for the easy generation of high-qualitycustom graphical maps of plastid and mitochondrial genomes.Curr Genet. 52:267–274.

Lonsdale DM, Brears T, Hodge TP, Melville SE, Rottmann WH. 1988.The plant mitochondrial genome: homologous recombinationas a mechanism for generating heterogeneity. Philos Trans R SocLond B Biol Sci. 319:149–163.

Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improveddetection of transfer RNA genes in genomic sequence. NucleicAcids Res. 25:955–964.

Lynch M. 2006. The origins of eukaryotic gene structure. Mol BiolEvol. 23:450–468.

Lynch M, Conery JS. 2004. Testing genome complexity—response.Science 304:390.

Lynch M, Koskella B, Schaack S. 2006. Mutation pressure and theevolution of organelle genomic architecture. Science 311:1727–1730.

Mathews S, Spangler RE, Mason-Gamer RJ, Kellogg EA. 2002.Phylogeny of Andropogoneae inferred from phytochrome B,GBSSI, and NDHF. Int J Plant Sci. 163:441–450.

Maul JE, Lilly JW, Cui LY, dePamphilis CW, Miller W, Harris EH,Stern DB. 2002. The Chlamydomonas reinhardtti plastidchromosome: islands of genes in a sea of repeats. Plant Cell.14:2659–2679.

Moran NA. 2002. Microbial minimalism: genome reduction inbacterial pathogens. Cell 108:583–586.

Mower JP. 2009. The PREP suite: predictive RNA editors for plantmitochondrial genes, chloroplast genes and user-defined align-ments. Nucleic Acids Res. 37:W253–W259.

Mower JP, Palmer JD. 2006. Patterns of partial RNA editing inmitochondrial genes of Beta vulgaris. Mol Genet Genomics.276:285–293.

Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD. 2007.Extensive variation in synonymous substitution rates inmitochondrial genes of seed plants. BMC Evol Biol. 7:135.

Mulligan RM, Chang KL, Chou CC. 2007. Computational analysis ofRNA editing sites in plant mitochondrial genomes reveals similarinformation content and a sporadic distribution of editing sites.Mol Biol Evol. 24:1971–1981.

Nakazato T, Gastony GJ. 2006. High-throughput RFLP genotypingmethod for large genomes based on a chemiluminescentdetection system. Plant Mol Biol Rep. 24:245a–245f.

Nedelcu AM. 1998. Contrasting mitochondrial genome organizationsand sequence affiliations among green algae: potential factors,mechanisms, and evolutionary scenarios. J Phycol. 34:16–28.

Notsu Y, Masood S, Nishikawa T, Kubo N, Akiduki G, Nakazono M,Hirai A, Kadowaki K. 2002. The complete sequence of the rice(Oryza sativa L.) mitochondrial genome: frequent DNAsequence acquisition and loss during the evolution of floweringplants. Mol Genet Genomics. 268:434–445.

Nowack ECM, Melkonian M, Glockner G. 2008. Chromatophoregenome sequence of Paulinella sheds light on acquisition ofphotosynthesis by eukaryotes. Curr Biol. 18:410–418.

Palmer JD. 1982. Physical and gene mapping of chloroplast DNAfrom Atriplex triangularis and Cucumis sativa. Nucleic Acids Res.10:1593–1605.

Palmer JD. 1990. Contrasting modes and tempos of genomeevolution in land plant organelles. Trends Genet. 6:115–120.

Palmer JD, Herbon LA. 1989. Plant mitochondrial DNA evolvesrapidly in structure, but slowly in sequence. J Mol Evol. 28:87–97.

Perrotta G, Regina TMR, Ceci LR, Quagliariello C. 1996. Conservationof the organization of the mitochondrial nad3 and rps12 genes inevolutionarily distant angiosperms. Mol Gen Genet. 251:326–337.

Placido A, Damiano F, Sciancalepore M, De Benedetto C, Rainaldi G,Gallerani R. 2006. Comparison of promoters controlling on thesunflower mitochondrial genome the transcription of twocopies of the same native trnK gene reveals some differences intheir structure. Biochim Biophys Acta. 1757:1207–1216.

Quetier F, Vedel F. 1977. Heterogeneous population of mitochon-drial DNA molecules in higher plants. Nature 268:365–368.

Quinones V, Zanlungo S, Moenne A, Gomez I, Holuigue L, Litvak S,Jordana X. 1996. The rpl5- rps14- cob gene arrangement inSolanum tuberosum: rps14 is a transcribed and uneditedpseudogene. Plant Mol Biol. 31:937–943.

Ravi V, Khurana JP, Tyagi AK, Khurana P. 2008. An update onchloroplast genomes. Plant Syst Evol. 271:101–122.

Ren Y, Zhang Z, Liu J, et al. (16 co-authors). 2009. An integratedgenetic and cytogenetic map of the cucumber genome. PLoSONE. 4:e5795.

Richardson AO, Palmer JD. 2007. Horizontal gene transfer in plants.J Exp Bot. 58:1–9.

Sakamoto W, Tan SH, Murata M, Motoyoshi F. 1997. An unusualmitochondrial atp9-rpl16 cotranscript found in the maternaldistorted leaf mutant of Arabidopsis thaliana: implication ofGUG as an initiation codon in plant mitochondria. Plant CellPhysiol. 38:975–979.

Sanchez-Puerta MV, Cho Y, Mower JP, Alverson AJ, Palmer JD. 2008.Frequent, phylogenetically local horizontal transfer of the cox1group I intron in flowering plant mitochondria. Mol Biol Evol.25:1762–1777.

Satoh M, Kubo T, Mikami T. 2006. The Owen mitochondrialgenome in sugar beet (Beta vulgaris L.): possible mechanisms of

Evolution of Plant Mitochondrial Genome Size · doi:10.1093/molbev/msq029 MBE

1447

Page 13: Insights into the Evolution of Mitochondrial Genome Size ......Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo

extensive rearrangements and the origin of the mitotype-uniqueregions. Theor Appl Genet. 113:477–484.

Satoh M, Kubo T, Nishizawa S, Estiati A, Itchoda N, Mikami T. 2004.The cytoplasmic male-sterile type and normal type mitochon-drial genomes of sugar beet share the same complement ofgenes of known function but differ in the content of expressedORFs. Mol Genet Genomics. 272:247–256.

Schaefer H, Heibl C, Renner SS. 2009. Gourds afloat: a datedphylogeny reveals an Asian origin of the gourd family(Cucurbitaceae) and numerous oversea dispersal events. ProcBiol Sci. 276:843–851.

Smith DR, Lee RW. 2008. Nucleotide diversity in the mitochondrialand nuclear compartments of Chlamydomonas reinhardtii: inves-tigating the origins of genome architecture. BMC Evol Biol. 8:156.

Smith DR, Lee RW. 2009a. Nucleotide diversity of the Chlamydo-monas reinhardtii plastid genome: addressing the mutational-hazard hypothesis. BMC Evol Biol. 9:120.

Smith DR, Lee RW. 2009b. The mitochondrial and plastid genomesof Volvox carteri: bloated molecules rich in repetitive DNA. BMCGenomics. 10:14.

Soltis DE, Soltis PS, Chase MW, Mort ME, Albach DC, Zanis M,Savolainen V, Hahn WH, Hoot SB, Fay MF. 2000. Angiospermphylogeny inferred from 18S rDNA, rbcL, and atpB sequences.Bot J Linn Soc. 133:381–461.

Stern DB, Lonsdale DM. 1982. Mitochondrial and chloroplastgenomes of maize have a 12-kilobase DNA sequence incommon. Nature 299:698–702.

Stern DB, Newton KJ. 1985. Mitochondrial gene expression inCucurbitaceae: conserved and variable features. Curr Genet.9:395–405.

Stern DB, Palmer JD, Thompson WF, Lonsdale DM. 1983.Mitochondrial DNA sequence evolution and homologyto chloroplast DNA in angiosperms. In: Goldberg RB,editor. Plant molecular biology, ICN-UCLA symposia onmolecular and cellular biology. New York: Alan R. Liss, Inc.p. 467–477

Takemura M, Oda K, Yamato K, Ohta E, Nakamura Y, Nozato N,Akashi K, Ohyama K. 1992. Gene clusters for ribosomal proteinsin the mitochondrial genome of a liverwort, Marchantiapolymorpha. Nucleic Acids Res. 20:3199–3205.

Unseld M, Marienfeld JR, Brandt P, Brennicke A. 1997. Themitochondrial genome of Arabidopsis thaliana contains 57genes in 366,924 nucleotides. Nat Genet. 15:57–61.

Vinogradov AE. 2004. Testing genome complexity. Science 304:389–390.

Ward B, Anderson R, Bendich A. 1981. The size of the mitochondrialgenome is large and variable in a family of plants (Cucurbita-ceae). Cell 25:793–803.

Wolfe KH, Li WH, Sharp PM. 1987. Rates of nucleotide substitutionvary greatly among plant mitochondrial, chloroplast, andnuclear DNAs. Proc Natl Acad Sci U S A. 84:9054–9058.

Wyman SK, Jansen RK, Boore JL. 2004. Automatic annotation oforganellar genomes with DOGMA. Bioinformatics 20:3252–3255.

Alverson et al. · doi:10.1093/molbev/msq029 MBE

1448