
Available online at www.sciencedirect.com

Journal of Genetics and Genomics 39 (2012) 361-368

JGG

ORIGINAL RESEARCH

Whole Genome Duplication of Intra- and Inter-chromosomes in the Tomato Genome

Chi Song a,b, Juan Guo a,b, Wei Sun b,c, Ying Wang a,*

a Key Laboratory of Plant Germplasm Enhancement and Speciality Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, Hubei 430074, China

b Graduate University of Chinese Academy of Sciences, Beijing 100049, China

c South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, Guangdong 510650, China

Received 13 June 2012; revised 17 June 2012; accepted 17 June 2012

Available online 23 June 2012

ABSTRACT

Whole genome duplication (WGD) events have been proven to occur in the evolutionary history of most angiosperms. Tomato is considered a model species of the Solanaceae family. In this study, we describe the details of the evolutionary process of the tomato genome by detecting collinearity blocks and dating the WGD events on the tree of life by combining two different methods: synonymous substitution rates (Ks) and phylogenetic trees. In total, 593 collinearity blocks were discovered across the 12 constructed pseudo-chromosomes. It was evident that chromosome 2 had experienced an intra-chromosomal duplication event. Major inter-chromosomal duplication occurred among all the pseudo-chromosomes. We calculated the Ks value of these collinearity blocks. Two peaks of Ks distribution were found, corresponding to two WGD events occurring approximately 36-82 million years ago (MYA) and 148-205 MYA. Additionally, the results of phylogenetic trees suggested that the more recent WGD event may have occurred after the divergence of the rosid-asterid clade, but before the major diversification in Solanaceae. The older WGD event was shown to have occurred before the divergence of the rosid-asterid clade and after the divergence of rice-Arabidopsis (monocot-dicot).

KEYWORDS: Whole genome duplication; Collinearity block; Phylogenetic tree

1. INTRODUCTION

Gene duplication has played an important role in evolution (Ohno, 1970). Genome doubling (polyploidy), often referred to as whole genome duplication (WGD), has played a dramatic role in the diversification of most, if not all, eukaryotic lineages. This diversification is perhaps most impressively seen within the angiosperms (Cui et al., 2006; Soltis et al., 2009). Plant genomes have experienced comprehensive sequence diversification (Bowers et al., 2005), such as small fragment insertions, deletions, inversions, translocations, duplication (Navajas-Perez and Paterson, 2009), chromosomal rearrangement and fusion (Simillion et al., 2004) following an ancient WGD event. These processes eventually led to species diversification.

The complete sequencing of plant genomes has dramatically increased the investigation of WGD in angiosperms. Analysis of the Arabidopsis thaliana genome revealed a number of duplicated genes and suggested that two or three rounds of WGD had occurred (Blanc et al., 2000, 2003; Paterson et al., 2000; Bowers et al., 2003; Simillion et al., 2004). The complete rice genome also suggested an ancient polyploidy event (Paterson et al., 2004; Yu et al., 2005). In addition, the recently completed sequencing of the Sorghum bicolor genome confirmed WGD in a common ancestor of cereals (Paterson et al., 2009). Sequencing of the complete Populus genome suggested that an independent WGD event occurred before the divergence of Salix and Populus and that an older duplication was shared by both the Populus and Arabidopsis lineages (Tuskan et al., 2006). Jaillon et al. (2007) suggested that the Vitis genome had an ancestral hexaploidization, which was shared by all eudicots. However, Velasco et al. (2007) found different results and suggested that Vitis experienced two distinct WGD events. The potato genome sequencing consortium (2011) found at least two genome duplication events indicative of a palaeopolyploid origin of potato.

* Corresponding author. Tel: +86 27 8751 0675, fax: +86 27 8751 0670.

E-mail address: [email protected] (Y. Wang).

1673-8527/$ - see front matter Copyright © 2012, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Limited and Science Press. All rights reserved.

http://dx.doi.org/10.1016/j.jgg.2012.06.002

The sequenced plant genomes belong to either monocots (rice and S. bicolor) or rosids (A. thaliana, Populus and Carica papaya), except for Vitis (sister to all other rosids) (Soltis et al., 2000; Jansen et al., 2006). At this time, only potato and tomato (http://solgenomics.net/) have whole genome sequences for the asterid clade, which is considered to be a sister to the rosid clade. Using large numbers of unigenes, tomato and potato have been shown to contain an independent genome duplication event, which occurred in their common ancestor within Solanaceae (Blanc and Wolfe, 2004; Schlueter et al., 2004). Schlueter et al. (2004) dated the WGD event in Solanaceae to 50-52 MYA by analysis of synonymous substitution rate (Ks) distributions of paralogous pairs of unigenes.

Tomato is a member of the family Solanaceae and belongs to the asterid clade. Tomato plants play an important role in our daily life, especially because of their fruit. An international collaboration has finished sequencing the genomes of tomato (Solanum lycopersicum) and its closest wild relative (Solanum pimpinellifolium), and a database of Solanaceae species has been developed and made available to the public (http://solgenomics.net/) (Li et al., 2008; The tomato genome consortium, 2012). It is of great value to study the genome duplication events in tomato during its evolutionary history. To identify the evolutionary pattern among the chromosomes, we detected collinearity blocks to elucidate chromosome fusions and rearrangements. Ks and phylogenetic tree methods were used to identify the WGD events and to place them on the tree of life. WGD events would provide a valuable framework both for the inference of shared ancestry by Solanaceae and other asterid clades and for the transfer of findings from model organisms to less-well-understood systems.

2. MATERIALS AND METHODS

2.1. Collection of data sets

We used genome sequence data obtained from the Solanaceae Genomics Network (SGN, http://solgenomics.net/) in May 2012 to detect collinearity in tomato. We used 33,840 predicted proteins anchored to 12 pseudo-chromosomes produced by ITAG (International Tomato Annotation Group). The repeat library was constructed by combining the repeat elements from SGN, The Institute for Genomic Research (TIGR, http://www.tigr.org/tdb/e2k1/plant.repeats, TIGR_Arabidopsis_Repeats.v2_0_0, TIGR_Brassicaceae_Repeats.v2_0_0, TIGR_Solanaceae_Repeats.v3.2_0_0), Munich Information Center for Protein Sequence (MIPS, http://mips.gsf.de/proj/plant/webapp/recat, MIPS_REdat_4.3) and plant repeats found within the RepeatMasker library (http://www.repeatmasker.org/, RepBase14.03). The unigene sets for pine (release_7), spruce (release_3), petunia (release_2) and tobacco (release_5) were downloaded from TIGR gene indices (http://compbio.dfci.harvard.edu/tgi/plant.html). Additionally, the predicted proteins from the whole genomes of moss (Version 1_1), rice (Release 6.0), Arabidopsis (TAIR9) and potato (PGSC_DM_v3.4) were obtained from the DOE Joint Genome Institute (JGI, http://genome.jgi-psf.org//Phypa1_1/Phypa1_1.home.html), Rice Genome Annotation (http://rice.plantbiology.msu.edu/), The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org/) and SGN (http://solgenomics.net/), respectively.

2.2. Dataset cleaning and identification of collinearity blocks

To mask repeat elements, predicted proteins in tomato were compared with the repeat library using tBLASTn (Altschul et al., 1990). If a homologous gene was found in the repeat library using the method of Rost (1999), the sequence was deleted. Paralogous genes were identified by an all-against-all comparison of tomato proteins using BLASTp. Alignments with an expected E-value of less than 10^-10 that met Rost’s criteria were retained. Two genes were considered paralogs if gene B had a match for gene A and A had a match for B. Using the single linkage group method, genes were defined as tandem genes when they were located on the same pseudo-chromosome and the number of genes between two adjacent paralogous genes was less than 30. We discarded tandem duplicated genes from the paralogous gene sets and retained only the longest of the tandem genes for further analysis. Collinearity blocks were detected using MCSCAN (http://chibba.agtec.uga.edu/duplication/mcscan/) with default parameters. To show that the collinearity blocks were not detected by chance, we performed a simulation analysis. The positions of the genes were randomized 1000 times. Using the same criteria as above to detect collinearity blocks, the probability of the duplicated groups being detected by chance was estimated using the one-sample t-test (P < 0.01). The software CIRCOS (http://circos.ca/software/download/circos/) was used to draw the chromatin topology of the 12 pseudo-chromosomes of the tomato genome.
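The reciprocal-match and tandem rules described above can be sketched in Python. This is a minimal illustration and not the authors' pipeline: the input structures (`hits`, `gene_order`) and function names are hypothetical, the hits are assumed to be already filtered by E-value and Rost’s criteria, and tandem status is judged pairwise rather than by single-linkage clustering.

```python
def reciprocal_pairs(hits):
    """Return unordered gene pairs (a, b) where a matches b and b matches a.

    `hits` (hypothetical structure) maps each query gene to the set of
    subject genes it matched, assumed pre-filtered by E-value and
    Rost's criteria.
    """
    pairs = set()
    for a, subjects in hits.items():
        for b in subjects:
            if a != b and a in hits.get(b, set()):
                pairs.add(tuple(sorted((a, b))))
    return pairs


def is_tandem(a, b, gene_order, max_intervening=30):
    """True if a and b lie on the same pseudo-chromosome with fewer than
    `max_intervening` genes between them.

    `gene_order` (hypothetical structure) maps gene -> (chromosome, rank),
    where rank is the gene's position in the ordered gene list.
    """
    chrom_a, rank_a = gene_order[a]
    chrom_b, rank_b = gene_order[b]
    if chrom_a != chrom_b:
        return False
    intervening = abs(rank_a - rank_b) - 1
    return intervening < max_intervening
```

Pairs flagged by `is_tandem` would then be collapsed to a single representative (the longest gene, per the text) before collinearity detection.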

2.3. Distribution of synonymous substitutions among the collinearity blocks

For each pair of paralogs, the two proteins were aligned using ClustalW (Thompson et al., 1994). The resulting alignment was used as a guide to align the nucleotide sequences. To determine the Ks value of the pairs, we used YN00 from the PAML package (Yang, 2007). Pairs with a Ks value less than 3.0 were retained for further studies. We calculated the median Ks value within the collinearity block, which was also considered as the Ks value of the corresponding collinearity block. Ks distributions were based on these local median Ks values.
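The per-block summary described above (filter pairs at Ks < 3.0, then take the median) can be written in a few lines of Python. The function name and the choice to return `None` for blocks with no retained pairs are assumptions made for illustration; the Ks values themselves would come from YN00.

```python
from statistics import median


def block_median_ks(pair_ks_values, max_ks=3.0):
    """Median Ks of one collinearity block.

    Only gene pairs with Ks below `max_ks` are used. Returns None when
    no pair passes the filter, so the block is left undated.
    """
    kept = [ks for ks in pair_ks_values if ks < max_ks]
    return median(kept) if kept else None
```

Under this reading, a block in which every pair exceeds the cutoff receives no Ks value and drops out of the distribution.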


2.4. Construction of phylogenetic trees

To find homologs of tomato paralogous proteins within collinearity blocks, duplicated gene pairs from tomato were searched against the unigene sets of pine, spruce, petunia and tobacco. Homologs were selected if the tBLASTn hit was the longest matching region, the E-value was less than 10^-10 and the hit met Rost’s criteria. These homologous unigenes were translated to proteins using the Genewise program (Birney et al., 2004) with the corresponding best match protein from a BLASTx search against the non-redundant protein sequences from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/) as a guide. Additionally, BLASTp was used to detect homologs between tomato paralogous pairs and the predicted proteins of moss, rice and Arabidopsis using the same criteria as above. Phylogenetic trees were constructed using four different protein sequences: the two tomato paralogous proteins from a duplicated gene pair, the best homolog from a comparison organism and the best homolog from an outgroup organism (moss, pine, spruce). For the four sequences, multiple alignments were performed using ClustalW (Thompson et al., 1994). The PHYLIP programs (version 3.68, http://evolution.genetics.washington.edu/phylip.html) were used for bootstrap maximum likelihood (program “proml”) analyses. The fraction of “internal trees” associated with the corresponding collinearity blocks of tomato was compared using one-way ANOVA for corrected samples; Tukey’s studentized range test (Bowers et al., 2003) and the LSD test were used for post-ANOVA comparisons among species.

3. RESULTS

3.1. Detection of collinearity blocks

After removing the genes unanchored to pseudo-chromosomes, 33,840 genes were found on the 12 tomato pseudo-chromosomes. Each tomato chromosome contains a different number of genes (Fig. S1). Compared with other chromosomes, chromosome 2 has a higher gene density. After removing repetitive elements and discarding tandem duplicated genes, 26,178 predicted proteins were used to detect the collinearity blocks. The average size of a protein in the cleaned list was 358 amino acids; sizes ranged from 50 to 5080 amino acids.

Fig. 1. Genomic map of collinearity blocks in the tomato genome.

It is evident that chromosome 2 had experienced an intra-chromosomal duplication event. Major inter-chromosomal duplication had occurred among all the pseudo-chromosomes. Lines link collinearity blocks. The size of each chromosome is consistent with the actual pseudo-chromosome size.

We found 241,562 reciprocal BLAST hits using BLASTp. A collinearity block was selected only when at least five paralogous genes clustered with conserved content and order. In our study, 593 collinearity blocks were detected (Table S1). To test whether the collinearity blocks were detected by chance, we performed a simulation using a Perl script that randomized the original data 1000 times. We then calculated the average number of collinearity blocks arising by chance. According to the simulation, 22 collinearity blocks would be detected by chance, which suggests that the number of collinearity blocks occurring by chance was significantly lower (one-sample t-test, P < 0.01) than the number detected using our criteria.
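The comparison between the simulated and observed block counts can be illustrated with a one-sample t statistic. This is a hedged sketch: the paper does not give the per-replicate counts, so `simulated_counts` is a hypothetical input (one block count per randomization), and the function returns only the t statistic, leaving the P value to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(simulated_counts, observed):
    """t statistic for H0: the mean of `simulated_counts` equals `observed`.

    Requires at least two replicates with non-zero variance; a strongly
    negative value means chance alone yields far fewer blocks than observed.
    """
    n = len(simulated_counts)
    standard_error = stdev(simulated_counts) / sqrt(n)
    return (mean(simulated_counts) - observed) / standard_error
```

With 1000 replicates averaging about 22 blocks against 593 observed, the statistic would be very large in magnitude, which is consistent with the reported P < 0.01.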

Among the collinearity blocks, 131 collinearity blocks (22%) contained 7 paralogous gene pairs, 120 collinearity blocks (20%) contained 8 paralogous gene pairs, 70 collinearity blocks (12%) contained 9 paralogous gene pairs and 272 collinearity blocks (46%) contained more than 9 paralogous gene pairs (Table S1). The largest collinearity block, which contained 90 paralogous gene pairs, corresponded to tomato chromosomes 4 and 12 and showed conserved gene content and order (Table S2). The largest collinearity block took up 9% of the length of chromosome 4 and 3% of the length of chromosome 12.

These collinearity blocks compose an interesting network among chromosomes (Fig. 1). We found that tomato chromosome 2 contained a large-scale intra-chromosomal duplication, which involved 9 collinearity blocks across the whole chromosome. This result suggests that an intra-chromosomal duplication event occurred on chromosome 2. Additionally, we found many inter-chromosomal duplication events. Each chromosome contained collinearity blocks with the other 11 chromosomes. Chromosome 3 contained the most inter-chromosome collinearity blocks. There were 126 collinearity blocks detected between chromosome 3 and 6 (25 blocks), 3 and 2 (18 blocks), 3 and 8 (13 blocks), 3 and 1 (12 blocks), 3 and 5 (11 blocks), 3 and 10 (11 blocks), 3 and 12 (9 blocks), 3 and 7 (8 blocks), 3 and 11 (7 blocks), 3 and 4 (6 blocks), 3 and 9 (6 blocks). Of the 593 collinearity blocks, intra-chromosomal duplications were found in 54 collinearity blocks (9%), which were concentrated in chromosomes 2 (9 blocks) and 6 (8 blocks). However, no intra-chromosomal duplication was found in chromosome 3 based on our current method. Inter-chromosomal duplications were found in 539 block pairs (91%), which covered most of the tomato genome.

Fig. 2. Ks distribution of tomato collinearity blocks.

By plotting the number of collinearity blocks against the median Ks value, a distribution was obtained reflecting the approximate age of the duplication events. Two obvious secondary peaks were found.

3.2. Age distributions of collinearity blocks based on synonymous substitution rates

If duplicated blocks were generated by a single, large fragmental duplication event, then they should have been created at the same time. The number of synonymous substitutions per site increases during the evolutionary process. Thus, the Ks between coding sequences of each paralogous pair may provide a clue to duplication events. A remarkable secondary peak of Ks distribution may suggest a large fragmental duplication event such as whole genome duplication, segmental duplication or aneuploidy (Blanc and Wolfe, 2004). Paralogous proteins with a Ks value less than 3.0 were retained from the original set of collinearity blocks. The age of a collinearity block was determined by calculating the median Ks value of the paralogous proteins contained in the collinearity block. In total, 570 collinearity blocks out of 593 (approximately 96%) were assigned an age by this method. By plotting the number of collinearity blocks against the median Ks value, we obtained a distribution reflecting the approximate age of the duplication events (Fig. 2). Besides the initial peak, we can distinguish two secondary peaks with Ks from 0.8 to 1.0 and from 1.8 to 2.5, respectively. To estimate the age of the WGD, we used a molecular clock rate of 6.1 × 10^-9 synonymous substitutions per site per year, as suggested by Lynch and Conery (2000). Our results suggest that the more recent duplication may have occurred 36-82 million years ago (MYA), and the older duplication occurred between 148 and 205 MYA.
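The conversion from Ks to age can be sketched as follows, assuming the standard relation T = Ks / (2r) for two paralogs diverging from a common ancestor at rate r. The paper states the rate but not the formula, so the relation and the function name are assumptions.

```python
CLOCK_RATE = 6.1e-9  # synonymous substitutions per site per year (Lynch and Conery, 2000)


def ks_to_mya(ks, rate=CLOCK_RATE):
    """Convert a Ks value to an age in million years ago (MYA).

    Assumes T = Ks / (2 * rate): both duplicate copies accumulate
    substitutions, hence the factor of two.
    """
    years = ks / (2.0 * rate)
    return years / 1e6
```

Under this assumption, Ks = 1.0 corresponds to about 82 MYA, and Ks = 1.8 and 2.5 correspond to about 148 and 205 MYA, matching the reported bounds. The 36 MYA lower bound would correspond to a Ks of roughly 0.44 under the same relation.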

3.3. Intra-chromosome duplication within chromosome 2

After analyzing the collinearity blocks, we found that chromosome 2 was a more active location for intra-chromosome duplication events than the other chromosomes (one-sample t-test, P < 0.01). A total of 9 out of the 54 intra-chromosome collinearity blocks (17%) were related to chromosome 2 (Fig. 3). These intra-chromosome collinearity blocks covered approximately 66% of the total length of chromosome 2. Besides the 9 intra-chromosome duplications, chromosome 2 contained 96 inter-chromosome collinearity blocks detected between chromosome 2 and 3 (18 blocks), 2 and 4 (14 blocks), 2 and 1 (10 blocks), 2 and 12 (10 blocks), 2 and 5 (9 blocks), 2 and 7 (9 blocks), 2 and 6 (8 blocks), 2 and 10 (8 blocks), 2 and 8 (4 blocks), 2 and 9 (4 blocks), 2 and 11 (2 blocks) (Fig. 1).

Fig. 3. Details of intra-chromosome duplication within chromosome 2.

Curves link collinearity blocks. Of the intra-chromosomal collinearity blocks of chromosome 2, two belong to the more ancient duplication event and six to the more recent duplication event.
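The block counts reported for chromosomes 2 and 3 can be cross-checked with a short tally. The numbers below are copied from the text of Sections 3.1 and 3.3; the variable and function names are illustrative only.

```python
# Inter-chromosomal block counts by partner chromosome, as reported in the text.
chr3_partner_blocks = {6: 25, 2: 18, 8: 13, 1: 12, 5: 11, 10: 11,
                       12: 9, 7: 8, 11: 7, 4: 6, 9: 6}
chr2_partner_blocks = {3: 18, 4: 14, 1: 10, 12: 10, 5: 9, 7: 9,
                       6: 8, 10: 8, 8: 4, 9: 4, 11: 2}

TOTAL_BLOCKS = 593
INTRA_BLOCKS = 54
INTER_BLOCKS = TOTAL_BLOCKS - INTRA_BLOCKS


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100.0 * part / whole)
```

The per-partner counts sum to the stated totals of 126 (chromosome 3) and 96 (chromosome 2), and the rounded shares reproduce the 9%, 91% and 17% figures quoted in the text.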

These intra-chromosomal collinearity blocks indicate a complex relationship. We further analyzed the intra-chromosomal duplications based on the distribution of median Ks value. The results indicated that the intra-chromosomal duplications in chromosome 2 correspond to two WGD events (Fig. 3). There were two collinearity blocks with Ks distribution from 1.8 to 2.5, which were proposed to occur during the older WGD event. Another six collinearity blocks, with Ks distributions from 0.8 to 1.0, were proposed to have occurred during the more recent WGD event, as mentioned above.

3.4. Age distribution of collinearity blocks based on phylogenetic trees

To determine the timing of the WGD events and map them to a position on the tree of life, phylogenetic trees were constructed using rice, Arabidopsis, petunia, tobacco and potato; moss, pine and spruce were selected as outgroup species. All of the species used in our study belong to different evolutionary clades (Fig. 4). Each phylogenetic tree was constructed with four different proteins: a pair of tomato paralogous proteins, the best matching protein from the plant species of interest and the best protein match from the outgroup species. Moss, pine and spruce were chosen as the outgroup species to reduce chance error. The PHYLIP program “proml” was used with default parameters to construct the phylogenetic trees. Only trees with at least 70% bootstrap confidence (out of 100 replicates) were retained for further analysis. Two possible tree topologies could be produced. The first is an internal tree, in which the homologous protein is more similar to one of the tomato paralogous proteins than that protein is to the other tomato paralog, which would indicate that the taxon divergence is more recent than the gene duplication. The other is an external tree, in which the tomato paralogous proteins are more similar to each other than to the homologous protein, indicating that the gene duplication is more recent than the taxon divergence (Chapman et al., 2004; Kim et al., 2009).
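The internal/external distinction can be illustrated on a four-taxon distance matrix. Note that this sketch swaps in a different technique from the paper: the authors used bootstrap maximum likelihood (“proml”), whereas the code below applies the four-point condition to pairwise distances purely to show how the two topologies are told apart. The taxon labels and function names are hypothetical.

```python
def classify_quartet(d):
    """Classify a quartet as 'internal' or 'external' from pairwise distances.

    `d` maps frozenset({x, y}) -> distance for the taxa 'P1' and 'P2'
    (tomato paralogs), 'H' (homolog from the comparison species) and
    'O' (outgroup). By the four-point condition, the pairing with the
    smallest summed distance gives the two sister pairs.
    """
    def dist(x, y):
        return d[frozenset((x, y))]

    sums = {
        "external": dist("P1", "P2") + dist("H", "O"),     # (P1,P2) | (H,O)
        "internal_P1": dist("P1", "H") + dist("P2", "O"),  # (P1,H) | (P2,O)
        "internal_P2": dist("P2", "H") + dist("P1", "O"),  # (P2,H) | (P1,O)
    }
    best = min(sums, key=sums.get)
    return "external" if best == "external" else "internal"


def internal_fraction(labels):
    """Fraction of retained gene trees that are labelled 'internal'."""
    return sum(1 for label in labels if label == "internal") / len(labels)
```

The value returned by `internal_fraction` is the kind of quantity tabulated per species and outgroup in Table 1; in the paper only trees passing the 70% bootstrap threshold would contribute to it.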

Fig. 4. The phylogeny of species analyzed in this study.

Black circles indicate the placement of WGD events that occurred.


For the first secondary peak of Ks distribution, we selected the collinearity blocks with Ks distribution ranging from 0.8 to 1.0. The frequency of internal trees in asterids was significantly higher than those in both Arabidopsis and rice, based only on the LSD test (P < 0.05) (Table 1). These results indicate that the more recent WGD event may have occurred after the divergence of the rosid and asterid clades and before the major diversification in Solanaceae (Fig. 4).

For the second secondary peak of Ks distribution, we selected the collinearity blocks with Ks distribution from 1.8 to 2.5. The frequency of internal trees in eudicots was significantly higher than in rice based on Tukey’s studentized range test (P < 0.01) (Table 1). This result clearly indicates that the older WGD event occurred before the divergence of the rosid-asterid clade and after the divergence of rice-Arabidopsis (monocot-dicot) (Fig. 4).

4. DISCUSSION

WGD events have been comprehensively studied in angiosperms, especially for Arabidopsis, rice and other species where the sequence of the whole genome is available (Blanc et al., 2000; Yu et al., 2005; Tuskan et al., 2006; Velasco et al., 2007). However, large fragmental duplications in the asterid clade have been poorly studied. Using large numbers of EST data sets, a recent large-scale duplication event in Solanaceae has been proposed (Cui et al., 2006), which may be shared by tomato and potato (Blanc and Wolfe, 2004; Schlueter et al., 2004). The potato genome sequencing consortium (2011) indicated that the recent duplication occurred near the Cretaceous-Tertiary boundary (~65 MYA). However, it is still unclear when this more recent large-scale duplication event occurred, whether members of Solanaceae other than Solanoideae shared it, and whether an older large fragmental duplication event existed.

Table 1

Phylogenetic dating of genome duplication in the tomato lineage

Rooted species     Rice          Arabidopsis   Petunia       Tobacco       Potato
Peak 1 (a) (Ks value from 0.4 to 1.0)
Moss               0.012 (165)   0.053 (150)   0.673 (55)    0.820 (139)   0.824 (255)
Pine               0.018 (226)   0.064 (204)   0.675 (80)    0.824 (193)   0.869 (351)
Spruce             0.018 (226)   0.067 (210)   0.827 (81)    0.827 (173)   0.801 (357)
Total              0.016 (617)   0.061 (564)   0.725 (216)   0.824 (505)   0.831 (963)
Significance (b)   A             A             B             B             B
Peak 2 (c) (Ks value from 1.4 to 2.4)
Moss               0.475 (40)    0.700 (50)    0.860 (43)    0.945 (73)    0.907 (129)
Pine               0.493 (67)    0.767 (73)    0.847 (59)    0.897 (126)   0.918 (208)
Spruce             0.424 (59)    0.802 (81)    0.821 (67)    0.876 (113)   0.869 (214)
Total              0.464 (166)   0.756 (204)   0.843 (169)   0.906 (312)   0.898 (551)
Significance (d)   A             B             B             B             B

Primary data represent the (decimal) ratio of gene trees that are internal. The values in parentheses indicate the number of genes that were informative. (a) LSD test for post-ANOVA comparisons among species; (b) Significantly different at P < 0.05; (c) Tukey’s studentized range test for post-ANOVA comparisons; (d) Significantly different at P < 0.01. Significances with the same letters are not significantly different.

In our study, two WGD events have been detected. Based on the results of phylogenetic analysis, the WGD events have been mapped to a position on the tree of life. The intra- and inter-chromosomal duplications detected in our study suggest that WGD events have occurred during the long evolutionary history of tomato. Ku et al. (2000) found that one of the duplication events between tomato and Arabidopsis is ancient and may predate the divergence of the Arabidopsis and tomato lineages. By cloning and constructing the phylogenetic trees of MIKCc-type MADS-box genes in tomato, it was suggested that multiple gene duplication events have taken place after the diversification of the lineages leading to either tomato or Arabidopsis (Hileman et al., 2006). In our study, 593 collinearity blocks have been found and belong to two different large-scale duplication events. These results suggest genomic reorganization and a divergent evolution of block pairs after ancient duplications (Vandepoele et al., 2003; Paterson et al., 2004). During the long evolutionary process, the plant genome experienced great changes, which involved various small chromosomal rearrangements (insertions, deletions, inversions and translocations) (Bennetzen, 2000), extensive genome replication, chromosome fusions and frequent rearrangements (Lagercrantz, 1998). These changes make it difficult to look for the traces of duplication events.

Schlueter et al. (2004) presumed a WGD event in Solanaceae about 50-52 MYA. Bell et al. (2005) proposed that the age of the stem group of Solanaceae was approximately 49-68 MYA and the crown group was 32-50 MYA. Based on the analysis of Ks distribution, the more recent WGD event was proven to occur after the divergence of the rosid and asterid clades. This duplication event is in approximate agreement with the estimated age of the duplication believed to have occurred in Solanaceae. However, the more recent WGD event contained fewer collinearity blocks than the older WGD event (Fig. 2). Additionally, the older duplication detected in our study, which showed a wide range of Ks values, may have occurred before the divergence of Arabidopsis and other dicots, corresponding to the β event suggested by Bowers et al. (2003). Because of the intrinsic limitation of Ks-based detection, the “saturation of synonymous changes” (Li and Li, 1997), we could not find any more ancient duplication event.

Based on the results of phylogenetic analysis, the older WGD event was mapped on the phylogenetic tree at a position that corresponds to the β event suggested by Bowers et al. (2003). However, our results also suggest that the more recent WGD event occurred after the divergence of the rosid and asterid clades and was shared by Petunieae, Nicotianoideae and Solanoideae. Interestingly, Nicotianoideae and Solanoideae are referred to as the haploid n = 12 clade. Other members of Solanaceae have haploid chromosome numbers from n = 7 to 11. If these three clades shared the same WGD event, members of Solanaceae not having a haploid chromosome number of n = 12 would have experienced comprehensive chromosome fusion and rearrangement after the more recent WGD event (Vandepoele et al., 2003; Paterson et al., 2004). This assumption would need to be investigated once more genomic sequence data are available from other members of Solanaceae.

In our study, we used the tomato whole genome to detect WGD events during the long evolutionary history of Solanaceae. Two WGD events were detected after the divergence of rice-Arabidopsis (monocot-dicot). As more whole genome data become available, more details can be determined about the evolutionary history of tomato and other closely related Solanum genomes.

ACKNOWLEDGEMENTS

We thank Yangzhou Wang and Wenfeng Guo for advice on statistical analysis. This work was supported by the Major State Basic Research Development Program of China (973 Program) (Grant No. 2010CB126603), the National Natural Science Foundation of China (No. 30570172), and the CAS/SAFEA International Partnership Program for Creative Research Teams.

SUPPLEMENTARY DATA

Fig. S1. Ratio of gene number and chromosome length of each tomato chromosome.

Table S1. Length distribution of all the 593 collinearity blocks detected in our study.

Table S2. Detailed information of the largest collinearity block from chromosomes 4 and 12.

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jgg.2012.06.002.

REFERENCES

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.

Bell, C.D., Soltis, D.E., Soltis, P.S., 2005. The age of the angiosperms: a molecular timescale without a clock. Evolution 59, 1245–1258.

Bennetzen, J.L., 2000. Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell 12, 1021–1029.

Birney, E., Clamp, M., Durbin, R., 2004. GeneWise and Genomewise. Genome Res. 14, 988–995.

Blanc, G., Barakat, A., Guyot, R., Cooke, R., Delseny, M., 2000. Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12, 1093–1101.

Blanc, G., Hokamp, K., Wolfe, K.H., 2003. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 13, 137–144.

Blanc, G., Wolfe, K.H., 2004. Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell 16, 1667–1678.

Bowers, J.E., Chapman, B.A., Rong, J.K., Paterson, A.H., 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438.

Bowers, J.E., Arias, M.A., Asher, R., Avise, J.A., Ball, R.T., Brewer, G.A., Buss, R.W., Chen, A.H., Edwards, T.M., Estill, J.C., Exum, H.E., Goff, V.H., Herrick, K.L., Steele, C.L.J., Karunakaran, S., Lafayette, G.K., Lemke, C., Marler, B.S., Masters, S.L., McMillan, J.M., Nelson, L.K., Newsome, G.A., Nwakanma, C.C., Odeh, R.N., Phelps, C.A., Rarick, E.A., Rogers, C.J., Ryan, S.P., Slaughter, K.A., Soderlund, C.A., Tang, H.B., Wing, R.A., Paterson, A.H., 2005. Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc. Natl. Acad. Sci. USA 102, 13206–13211.

Chapman, B.A., Bowers, J.E., Schulze, S.R., Paterson, A.H., 2004. A comparative phylogenetic approach for dating whole genome duplication events. Bioinformatics 20, 180–185.

Cui, L.Y., Wall, P.K., Leebens-Mack, J.H., Lindsay, B.G., Soltis, D.E., Doyle, J.J., Soltis, P.S., Carlson, J.E., Arumuganathan, K., Barakat, A., Albert, V.A., Ma, H., dePamphilis, C.W., 2006. Widespread genome duplications throughout the history of flowering plants. Genome Res. 16, 738–749.

Hileman, L.C., Sundstrom, J.F., Litt, A., Chen, M.Q., Shumba, T., Irish, V.F., 2006. Molecular and phylogenetic analyses of the MADS-box gene family in tomato. Mol. Biol. Evol. 23, 2245–2258.

Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyere, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pe, M.E., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A.F., Weissenbach, J., Quetier, F., Wincker, P., French-Italian, P., 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467.

Jansen, R.K., Kaittanis, C., Saski, C., Lee, S.B., Tomkins, J., Alverson, A.J., Daniell, H., 2006. Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol. Biol. 6, 32–45.

Kim, C., Tang, H., Paterson, A.H., 2009. Duplication and divergence of grass genomes: integrating the chloridoids. Trop. Plant Biol. 2, 51–62.

Lagercrantz, U., 1998. Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics 150, 1217–1228.

Li, C., Zhao, J., Jiang, H., Geng, Y., Dai, Y., Fan, H., Zhang, D., Chen, J., Lu, F., Shi, J., Sun, S., Chen, J., Yang, X., Lu, C., Chen, M., Cheng, Z., Ling, H., Wang, Y., Xue, Y., Li, C., 2008. A snapshot of the Chinese SOL project. J. Genet. Genomics 35, 387–390.

Li, W.-H., 1997. Molecular Evolution. Sinauer Associates, Sunderland, MA.

Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155.

Navajas-Perez, R., Paterson, A.H., 2009. Patterns of tandem repetition in plant whole genome assemblies. Mol. Genet. Genomics 281, 579–590.

Ohno, S., 1970. Evolution by Gene Duplication. Springer, New York.

Paterson, A.H., Bowers, J.E., Burow, M.D., Draye, X., Elsik, C.G., Jiang, C.X., Katsar, C.S., Lan, T.H., Lin, Y.R., Ming, R., Wright, R.J., 2000. Comparative genomics of plant chromosomes. Plant Cell 12, 1523–1540.

Paterson, A.H., Bowers, J.E., Chapman, B.A., 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. USA 101, 9903–9908.

Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood, J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A., Schmutz, J., Spannagl, M., Tang, H.B., Wang, X.Y., Wicker, T., Bharti, A.K., Chapman, J., Feltus, F.A., Gowik, U., Grigoriev, I.V., Lyons, E., Maher, C.A., Martis, M., Narechania, A., Otillar, R.P., Penning, B.W., Salamov, A.A., Wang, Y., Zhang, L.F., Carpita, N.C., Freeling, M., Gingle, A.R., Hash, C.T., Keller, B., Klein, P., Kresovich, S., McCann, M.C., Ming, R., Peterson, D.G., Mehboob ur, R., Ware, D., Westhoff, P., Mayer, K.F.X., Messing, J., Rokhsar, D.S., 2009. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556.

Rost, B., 1999. Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94.

Schlueter, J.A., Dixon, P., Granger, C., Grant, D., Clark, L., Doyle, J.J., Shoemaker, R.C., 2004. Mining EST databases to resolve evolutionary events in major crop species. Genome 47, 868–876.

Simillion, C., Vandepoele, K., Saeys, Y., van de Peer, Y., 2004. Building genomic profiles for uncovering segmental homology in the twilight zone. Genome Res. 14, 1095–1106.

Soltis, D.E., Soltis, P.S., Chase, M.W., Mort, M.E., Albach, D.C., Zanis, M., Savolainen, V., Hahn, W.H., Hoot, S.B., Fay, M.F., Axtell, M., Swensen, S.M., Prince, L.M., Kress, W.J., Nixon, K.C., Farris, J.S., 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linn. Soc. 133, 381–461.

Soltis, D.E., Albert, V.A., Leebens-Mack, J., Bell, C.D., Paterson, A.H., Zheng, C.F., Sankoff, D., dePamphilis, C.W., Wall, P.K., Soltis, P.S., 2009. Polyploidy and angiosperm diversification. Am. J. Bot. 96, 336–348.

The potato genome sequencing consortium, 2011. Genome sequence and analysis of the tuber crop potato. Nature 475, 189–195.

The tomato genome consortium, 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641.

Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Tuskan, G.A., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., Schein, J., Sterck, L., Aerts, A., Bhalerao, R.R., Bhalerao, R.P., Blaudez, D., Boerjan, W., Brun, A., Brunner, A., Busov, V., Campbell, M., Carlson, J., Chalot, M., Chapman, J., Chen, G.L., Cooper, D., Coutinho, P.M., Couturier, J., Covert, S., Cronk, Q., Cunningham, R., Davis, J., Degroeve, S., Dejardin, A., Depamphilis, C., Detter, J., Dirks, B., Dubchak, I., Duplessis, S., Ehlting, J., Ellis, B., Gendler, K., Goodstein, D., Gribskov, M., Grimwood, J., Groover, A., Gunter, L., Hamberger, B., Heinze, B., Helariutta, Y., Henrissat, B., Holligan, D., Holt, R., Huang, W., Islam-Faridi, N., Jones, S., Jones-Rhoades, M., Jorgensen, R., Joshi, C., Kangasjarvi, J., Karlsson, J., Kelleher, C., Kirkpatrick, R., Kirst, M., Kohler, A., Kalluri, U., Larimer, F., Leebens-Mack, J., Leple, J.C., Locascio, P., Lou, Y., Lucas, S., Martin, F., Montanini, B., Napoli, C., Nelson, D.R., Nelson, C., Nieminen, K., Nilsson, O., Pereda, V., Peter, G., Philippe, R., Pilate, G., Poliakov, A., Razumovskaya, J., Richardson, P., Rinaldi, C., Ritland, K., Rouze, P., Ryaboy, D., Schmutz, J., Schrader, J., Segerman, B., Shin, H., Siddiqui, A., Sterky, F., Terry, A., Tsai, C.J., Uberbacher, E., Unneberg, P., Vahala, J., Wall, K., Wessler, S., Yang, G., Yin, T., Douglas, C., Marra, M., Sandberg, G., Van de Peer, Y., Rokhsar, D., 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604.

Vandepoele, K., Simillion, C., Van de Peer, Y., 2003. Evidence that rice and other cereals are ancient aneuploids. Plant Cell 15, 2192–2202.

Velasco, R., Zharkikh, A., Troggio, M., Cartwright, D.A., Cestaro, A., Pruss, D., Pindo, M., Fitzgerald, L.M., Vezzulli, S., Reid, J., Malacarne, G., Iliev, D., Coppola, G., Wardell, B., Micheletti, D., Macalma, T., Facci, M., Mitchell, J.T., Perazzolli, M., Eldredge, G., Gatto, P., Oyzerski, R., Moretto, M., Gutin, N., Stefanini, M., Chen, Y., Segala, C., Davenport, C., Dematte, L., Mraz, A., Battilana, J., Stormo, K., Costa, F., Tao, Q., Si-Ammour, A., Harkins, T., Lackey, A., Perbost, C., Taillon, B., Stella, A., Solovyev, V., Fawcett, J.A., Sterck, L., Vandepoele, K., Grando, S.M., Toppo, S., Moser, C., Lanchbury, J., Bogden, R., Skolnick, M., Sgaramella, V., Bhatnagar, S.K., Fontana, P., Gutin, A., Van de Peer, Y., Salamini, F., Viola, R., 2007. A high quality draft consensus sequence of the genome of a heterozygous grapevine variety. PLoS ONE 2, e1326.

Yang, Z.H., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.

Yu, J., Wang, J., Lin, W., Li, S.G., Li, H., Zhou, J., Ni, P.X., Dong, W., Hu, S.N., Zeng, C.Q., Zhang, J.G., Zhang, Y., Li, R.Q., Xu, Z.Y., Li, S.T., Li, X.R., Zheng, H.K., Cong, L.J., Lin, L., Yin, J.N., Geng, J.N., Li, G.Y., Shi, J.P., Liu, J., Lv, H., Li, J., Wang, J., Deng, Y.J., Ran, L.H., Shi, X.L., Wang, X.Y., Wu, Q.F., Li, C.F., Ren, X.Y., Wang, J.Q., Wang, X.L., Li, D.W., Liu, D.Y., Zhang, X.W., Ji, Z.D., Zhao, W.M., Sun, Y.Q., Zhang, Z.P., Bao, J.Y., Han, Y.J., Dong, L.L., Ji, J., Chen, P., Wu, S.M., Liu, J.S., Xiao, Y., Bu, D.B., Tan, J.L., Yang, L., Ye, C., Zhang, J.F., Xu, J.Y., Zhou, Y., Yu, Y.P., Zhang, B., Zhuang, S.L., Wei, H.B., Liu, B., Lei, M., Yu, H., Li, Y.Z., Xu, H., Wei, S.L., He, X.M., Fang, L.J., Zhang, Z.J., Zhang, Y.Z., Huang, X.G., Su, Z.X., Tong, W., Li, J.H., Tong, Z.Z., Li, S.L., Ye, J., Wang, L.S., Fang, L., Lei, T.T., Chen, C., Chen, H., Xu, Z., Li, H.H., Huang, H.Y., Zhang, F., Xu, H.Y., Li, N., Zhao, C.F., Li, S.T., Dong, L.J., Huang, Y.Q., Li, L., Xi, Y., Qi, Q.H., Li, W.J., Zhang, B., Hu, W., Zhang, Y.L., Tian, X.J., Jiao, Y.Z., Liang, X.H., Jin, J.A., Gao, L., Zheng, W.M., Hao, B.L., Liu, S.Q., Wang, W., Yuan, L.P., Cao, M.L., McDermott, J., Samudrala, R., Wang, J., Wong, G.K.S., Yang, H.M., 2005. The genomes of Oryza sativa: a history of duplications. PLoS Biol. 3, 266–281.