LARGE-SCALE BIOLOGY

Breaking Free: the Genomics of Allopolyploidy-facilitated Niche Expansion in White Clover

Andrew G. Griffiths¹*, Roger Moraga¹, Marni Tausen²,³, Vikas Gupta², Timothy P. Bilton⁴, Matthew A. Campbell⁵, Rachael Ashby⁴, Istvan Nagy⁶, Anar Khan⁴, Anna Larking¹, Craig Anderson¹, Benjamin Franzmayr¹, Kerry Hancock¹, Alicia Scott¹, Nick W. Ellison¹, Murray P. Cox⁵, Torben Asp⁶, Thomas Mailund³, Mikkel H. Schierup³,⁷, and Stig Uggerhøj Andersen²*

* To whom correspondence should be addressed.

Author affiliations:
¹ AgResearch, Grasslands Research Centre, Private Bag 11008, Palmerston North 4442, New Zealand.
² Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds Vej 10, 8000 Aarhus C, Denmark.
³ Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, 8000 Aarhus C, Denmark.
⁴ AgResearch, Invermay Agricultural Centre, Private Bag 50034, Mosgiel 9053, New Zealand.
⁵ Bioinformatics and Statistics Group, Institute of Fundamental Sciences, Massey University, Private Bag 11222, Palmerston North 4410, New Zealand.
⁶ Department of Molecular Biology and Genetics, Aarhus University, Forsøgsvej 1, 4200 Slagelse, Denmark.
⁷ Department of Bioscience, Aarhus University, Ny Munkegade 116, 8000 Aarhus C, Denmark.

Corresponding authors

Andrew Griffiths
Address: AgResearch, Grasslands Research Centre, Private Bag 11008, Palmerston North 4442, New Zealand.
Tel: +6463518183
Email: [email protected]

Stig U. Andersen
Address: Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds Vej 10, 8000 Aarhus C, Denmark.
Tel: +4587154937
Email: [email protected]

Short Title: Allopolyploidy-facilitated niche expansion in White Clover

One-sentence Summary: Allopolyploid white clover arose in the last glaciation, retained progenitor genome integrity, and the few genes switching between homoeologues across tissues were enriched for flavonoid synthesis.

Material Distribution Footnote: The authors responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) are: Andrew G. Griffiths ([email protected]) and Stig Uggerhøj Andersen ([email protected]).

Plant Cell Advance Publication. Published on April 25, 2019, doi:10.1105/tpc.18.00606
©2019 American Society of Plant Biologists. All Rights Reserved

ABSTRACT

The merging of distinct genomes, allopolyploidization, is a widespread phenomenon in plants. It generates adaptive potential through increased genetic diversity, but examples demonstrating its exploitation remain scarce. White clover, Trifolium repens, is a ubiquitous temperate allotetraploid forage crop derived from two European diploid progenitors confined to extreme coastal or alpine habitats. We sequenced and assembled the genomes and transcriptomes of this species complex to gain insight into the genesis of white clover and the consequences of allopolyploidization. Based on these data, we estimate that white clover originated ~15-28,000 years ago during the last glaciation when alpine and coastal progenitors were likely co-located in glacial refugia. We found evidence of progenitor diversity carry-over through multiple hybridization events and show that the progenitor subgenomes have retained integrity and gene expression activity as they travelled within white clover from their original confined habitats to a global presence. At the transcriptional level, we observed remarkably stable subgenome expression ratios across tissues. Among the few genes that show tissue-specific switching between homoeologous gene copies, we found flavonoid biosynthesis genes strongly overrepresented, suggesting an adaptive role of some allopolyploidy-associated transcriptional changes. Our results highlight white clover as an example of allopolyploidy-facilitated niche expansion, where two progenitor genomes, adapted and confined to disparate and highly specialized habitats, expanded to a ubiquitous global presence following glaciation-associated allopolyploidization.

Keywords: polyploidy; Trifolium; genome; progenitor; subgenome; evolution; legume; plant; flavonoid; homoeologous gene expression; diversity; hybridization timing


INTRODUCTION

Polyploidy, where more than two genomes reside in a single nucleus, is widely distributed among eukaryotes and is particularly prevalent in angiosperms (Otto and Whitton, 2000; Mable, 2003; Wood et al., 2009), where it is considered a driver of speciation and biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas interspecific hybridization followed by genome doubling or fusion of unreduced gametes results in allopolyploidy, where divergent homoeologous subgenomes reside within the same nucleus. These duplications can be recent (< 150 years) (Ownbey, 1950; Soltis et al., 2004; Ainouche et al., 2009; Chester et al., 2012) or ancient, extending back to the origin of seed plants 350 million years ago (Jia et al., 2013; Li et al., 2015). In many lineages, such duplications are a recurring phenomenon (Soltis et al., 2016; Soltis and Soltis, 2016). Many neopolyploids likely become extinct shortly after formation, and there is much debate over the implications of polyploidization and whether it is an evolutionary jump-start or dead-end (Otto, 2007; Madlung, 2013). Polyploidy events can confer enhanced fitness, phenotypic plasticity and adaptability, and have been correlated with survival in stressful environments, such as during the Cretaceous-Paleogene mass extinction (Gross et al., 2004; Leitch and Leitch, 2008; Fawcett et al., 2009; Vanneste et al., 2014b; Vanneste et al., 2014a; Selmecki et al., 2015). Consequences of genome duplication, particularly for allopolyploids, can include rapid gene loss and rearrangement (McClintock, 1984; Rapp et al., 2009; Grover et al., 2012; Tayalé and Parisod, 2013; Yoo et al., 2013; Garsmeur et al., 2014; Woodhouse et al., 2014) through a range of mechanisms in which transposable element activity is implicated (Woodhouse et al., 2014). Many studied allopolyploids show evidence of a genomic or transcriptomic response to accommodating divergent genomes within a single nucleus, and these processes can occur within very few generations (Chester et al., 2012; Grover et al., 2012; Tayalé and Parisod, 2013).

White clover (Trifolium repens L.) is an example of a relatively recent allopolyploid that likely arose during the last major glaciations 13,000 to 130,000 years ago (Williams et al., 2012). A successful and highly variable perennial allotetraploid (AABB-type genome; 2n=4x=32) legume exhibiting disomic inheritance (Williams et al., 1998), it has a relatively compact genome (1C=1093 Megabases (Mb)) (Bennett and Leitch, 2011) arranged in small similar-sized chromosomes with few obvious features to distinguish homoeologues (Ansari et al., 1999). Furthermore, its outbreeding nature leads to significant sequence polymorphism and highly heterogeneous populations (Aasmo Finne et al., 2000; Zhang et al., 2010). White clover has a widespread natural range encompassing grasslands of Europe, Western Asia and North Africa across latitudes and altitudes (Daday, 1958) (Figure 1). Due to this ability to occupy a wide range of climatic conditions, white clover is used extensively as a companion forage in moist temperate agriculture, thereby extending its range globally (Daday, 1958; Abberton et al., 2006) (Figure 1). Extant relatives of the paternal and maternal progenitors of white clover have been identified as T. occidentale Coombe (Western Clover) and T. pallescens Schreb. (Pale Clover) (Ellison et al., 2006; Williams et al., 2012). In contrast to white clover, these diploid (2n=2x=16) species have very limited and disparate extreme habitats at the margins of, but not overlapping with, the range of white clover. The creeping T. occidentale (To) is confined to within ~100 m of the shore in a maritime niche on the western coasts of Europe (Coombe, 1961), whereas the non-creeping T. pallescens (Tp) is restricted to European alpine habitats at altitudes between 1800 and 2700 m (Raffl et al., 2008) (Figure 1). The allopolyploidization event giving rise to white clover is thought to be relatively recent due to the lack of significant divergence between To genic sequence and the To-like homoeologues, and is hypothesized to have occurred during the last major glaciations when environmental conditions brought alpine and maritime species into close proximity in glacial refugia (Williams et al., 2012). In contrast to white clover, which occupies markedly different environments to To and Tp, most allopolyploids with extant progenitors share similar habitats with either progenitor (Soltis et al., 2016).

Here, we sequenced and analysed the full genomes and transcriptomes of white clover and extant relatives of its diploid progenitors to gain insight into the timing, process and consequences of the allopolyploidization event.


RESULTS

Assembly of white clover and progenitor genomes

We sequenced inbred individuals of white clover (Tr) and extant relatives of its diploid progenitors, To and Tp, using a range of Illumina sequencing libraries, including white clover TruSeq Synthetic Long-Reads (TSLRs) (Supplemental Table 1). Extant progenitor genome size estimates derived from the sequence data were similar (~530 Mb), together equalling approximately the 1,174 Mb size estimate of the white clover genome (Table 1; Figure 1; Supplemental Figure 1). To and Tp assemblies encompassed, respectively, 437 Mb (N50=192 kilobases (kb)) and 382 Mb (N50=173 kb) of sequence, accounting for 82% and 72% of the predicted genome sizes (Figure 1, Table 1). The white clover assembly spanned 841 Mb (N50=122 kb) and approximated the sum of the assemblies of the extant progenitors, suggesting that conflation of homoeologous sequences had been successfully eliminated (Figure 1, Table 1). The Tp assembly was more fragmented than those of the other species, which was expected due to the lower sequencing data input (Table 1, Supplemental Table 1).
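
The assembly statistics quoted above can be illustrated with a minimal Python sketch. It computes an N50 from a list of scaffold lengths and the fraction of a predicted genome size covered by an assembly. The function names and the toy scaffold lengths are assumptions introduced here for illustration; they are not part of the study's pipeline, and the ~530 Mb genome size is the approximate estimate given in the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def assembled_percent(assembly_mb, genome_mb):
    """Percentage of the predicted genome size captured by the assembly."""
    return 100 * assembly_mb / genome_mb


# Toy scaffold lengths in kb (illustrative values only, not study data).
toy_scaffolds = [300, 250, 200, 150, 100]
print(n50(toy_scaffolds))  # -> 250

# Assembly sizes from the text against the approximate ~530 Mb estimate.
print(round(assembled_percent(437, 530)))  # To -> 82
print(round(assembled_percent(382, 530)))  # Tp -> 72
```

With the rounded ~530 Mb estimate, the sketch reproduces the 82% and 72% figures reported for To and Tp.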

Scaffold ordering, pseudomolecule construction and assignment to subgenomes were guided by pairwise linkage disequilibrium (LD), calculated as r², between ~7,300 Genotyping-by-Sequencing (GBS) Single Nucleotide Polymorphism (SNP) markers on 3,364 scaffolds (Figure 2A; Supplemental Figure 2), and alignment to the model forage legume Medicago truncatula genome (Mt4.0) (Tang et al., 2014). Incorporation of single locus homoeologue-specific (SLHS) simple sequence repeat (SSR) markers (Supplemental Table 2) in the LD analysis enabled alignment of pseudomolecules and homoeologues with a previous SSR-based genetic linkage map (Griffiths et al., 2013) (Supplemental Figure 2C-D).
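
As a sketch of how a pairwise LD value of this kind can be obtained, the following Python function computes r² as the squared Pearson correlation between two markers scored as allele dosages (0, 1 or 2) across the same individuals. This dosage-based estimator, the function name and the toy genotype vectors are assumptions made for illustration; the text states only that LD was calculated as r² and does not specify the estimator used.

```python
import math


def ld_r2(marker_a, marker_b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Returns NaN when either marker is monomorphic, because the
    correlation is undefined without variance.
    """
    n = len(marker_a)
    if n != len(marker_b) or n < 2:
        raise ValueError("markers must be scored on the same >= 2 individuals")
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    ss_a = sum((a - mean_a) ** 2 for a in marker_a)
    ss_b = sum((b - mean_b) ** 2 for b in marker_b)
    if ss_a == 0 or ss_b == 0:
        return math.nan
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    return cross * cross / (ss_a * ss_b)


# Toy dosages (illustrative only): perfectly coupled in repulsion, then unlinked.
print(ld_r2([0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0]))  # -> 1.0
print(ld_r2([0, 0, 2, 2], [0, 2, 0, 2]))              # -> 0.0
```

Markers on the same scaffold or on neighbouring scaffolds would be expected to give values near 1, whereas markers on different linkage groups would give values near 0, which is the signal that makes r² useful for ordering scaffolds.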

The white clover pseudomolecules were used to order the assemblies for To and Tp, and the resulting pseudomolecules were of similar size for the progenitors and their corresponding white clover subgenomes (Figure 2B; Supplemental Table 3). As an assessment of genome coverage in the assemblies, whole genome shotgun Illumina sequence reads (180 bp insert size; Supplemental Table 1) representing 70×, 80× and 35× depth for To, Tp and white clover, respectively, were mapped back to the corresponding assemblies, yielding a mapping rate of ~95% (Supplemental Table 4). This indicates the assemblies encompass a high proportion of their genomes. Alignment of the white clover To and Tp-derived subgenomes (TrTo and TrTp, respectively) with the Mt4.0 genome (Tang et al., 2014) identified general macrosynteny with some major rearrangements relative to M. truncatula (Figure 2C), consistent with previous results (Griffiths et al., 2013).

Gene annotation for all three species was enhanced by use of RNA-sequence assemblies derived from paired-end reads of transcripts expressed in floral, leaf, root and stolon/shoot tissue (Pooled dataset) from each species, with an average of 77 million reads per sample (Supplemental Tables 4 and 5). Of the raw reads, ~88%, 96%, and 95% mapped to the To, Tp and white clover genome sequences, respectively, indicating these assemblies encompass most of the gene space. The high read coverage facilitated gene annotation, with most gene models based directly on transcriptional evidence (Supplemental Table 5). The number of protein-coding genes was similar for the progenitors and the number of protein-coding genes in white clover was close to the sum of the contributing progenitors (Supplemental Table 6).

To and Tp are the progenitors of white clover

Identification of the extant relatives of white clover’s diploid progenitors (Figure 1; Figure 3A) had been based in part on similarities with discrete portions of the chloroplast genome and with Internal Transcribed Spacer (ITS) sequences of Tp and To, respectively (Ellison et al., 2006). When we expanded this comparison to the complete chloroplast genomes of white clover and its extant progenitors, the alignment with Tp revealed 99.5% identical bases and a mean coverage of 100% (Supplemental Table 7), confirming Tp as the likely chloroplast donor (Supplemental Figure 3). To extend this analysis, we surveyed the data generated from whole-genome resequencing (~49-fold coverage) of four outbred white clover individuals from different populations (Supplemental Table 8). Mapping white clover reads from each individual against the chloroplast genomes of white clover, To, and Tp identified high similarity with Tp (mean = 54 SNP variants), whereas there were many differences relative to the To chloroplast (mean = 563 SNP variants) (Supplemental Table 9). In summary, sequencing of five white clover individuals (the S9 and four resequenced individuals) confirms Tp as the likely source of the white clover chloroplast and, therefore, the maternal progenitor of white clover.

The 18S-ITS1-5.8S-ITS2-26S rDNA region of the nuclear genome harbouring the ITS sequences forms a gene cluster comprising many hundreds of tandem repeat arrays located in the nucleolar organising region (NOR) (Álvarez and Wendel, 2003). Previous analysis of PCR-amplified white clover ITS sequences had shown them to be either identical to those of To or to differ from them by a single nucleotide polymorphism (SNP), and only the To-NOR was detected (Ellison et al., 2006; Williams et al., 2012). In this study, we surveyed the 668,571 white clover TruSeq Synthetic Long-Reads (TSLRs) for ITS sequences and identified 774 full matches spanning the 623 base pair (bp) ITS1-5.8S rRNA-ITS2 region. Alignment with extant progenitor ITS data identified 770 TSLRs containing SNPs diagnostic of To-derived ITS and, additionally, four TSLRs diagnostic of Tp-derived ITS, supporting confirmation of To and Tp as progenitors (Supplemental Figure 4). The prevalence of the To-ITS sequence supports observations that the active NOR in white clover is To-derived (Williams et al., 2012), whereas detection of Tp-derived ITS is the first discovery of a remnant Tp-NOR. This is further evidence that Tp contributed to the white clover nuclear genome in addition to the chloroplast genome. The reduction in the Tp-NOR is consistent with nucleolar dominance observed in allopolyploids, whereby the rDNA locus of one progenitor is often silenced and subsequently reduced or eliminated from the genome (Ansari et al., 2008; Kovarik et al., 2008).

Expanding comparisons beyond chloroplast and ITS evidence, we also assessed whole genome similarities between extant progenitors and their corresponding white clover subgenomes. We identified SNPs specific either to subgenome/progenitor pairs (TrTo = To ≠ TrTp = Tp) or to each progenitor genome (To ≠ TrTo = TrTp = Tp; Tp ≠ TrTo = TrTp = To) using HyLiTE (Duchemin et al., 2015), which determined genic SNP subgenome and genome identity based on transcriptomic and genomic data from the three species mapped to To or Tp reference gene models. We found ~8000 SNPs specific to each progenitor in transcribed regions, whereas we identified ~50-fold more shared between each progenitor and its corresponding subgenome (Supplemental Table 10). This indicates substantial conservation of SNPs between the progenitor/subgenome pairs, providing further evidence that To and Tp are the likely progenitors. In conclusion, the combined ITS, chloroplast and broader genomic evidence confirmed that white clover is derived from To and Tp progenitors with Tp the likely maternal progenitor.
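
The three allele-sharing patterns described above can be expressed as a small Python classifier. This is only a sketch of the logical patterns given in the text, applied to one site at a time; it is not HyLiTE, and the function name, category labels and toy alleles are assumptions introduced here for illustration.

```python
def classify_site(to, tr_to, tr_tp, tp):
    """Classify one site from the alleles observed in the To genome, the
    TrTo and TrTp subgenomes of white clover, and the Tp genome."""
    # TrTo = To != TrTp = Tp: shared within each subgenome/progenitor pair.
    if tr_to == to and tr_tp == tp and to != tp:
        return "subgenome_pair"
    # To != TrTo = TrTp = Tp: variant private to the extant To genome.
    if tr_to == tr_tp == tp and to != tp:
        return "To_specific"
    # Tp != TrTo = TrTp = To: variant private to the extant Tp genome.
    if tr_to == tr_tp == to and tp != to:
        return "Tp_specific"
    return "other"


# Toy alleles (illustrative only).
print(classify_site("A", "A", "G", "G"))  # -> subgenome_pair
print(classify_site("A", "G", "G", "G"))  # -> To_specific
print(classify_site("A", "A", "A", "G"))  # -> Tp_specific
```

Under this scheme, a large excess of "subgenome_pair" sites over progenitor-specific sites is the pattern reported in the text as evidence that To and Tp are the likely progenitors.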


White clover originated ~15-28,000 years ago during the last glaciation

The analysis of genic SNPs specific to To and Tp using HyLiTE (Duchemin et al., 2015) detected relatively few and similar numbers of genic SNPs (~8000) specific to each extant progenitor, suggesting similar and possibly recent divergence times between each progenitor and its corresponding white clover subgenome (Supplemental Table 10). We investigated this further by estimating the timing of the white clover allopolyploidization event using an isolation model (Mailund et al., 2012) that analysed pairwise alignments of progenitor genomes (To and Tp) and white clover subgenomes (TrTo and TrTp) (Figure 3A). Whole pseudomolecule alignment of three genome pairs (To vs. Tp; To vs. TrTo; and Tp vs. TrTp) produced 10-32k alignments with average lengths ranging from 7.2-8.0 kb for each comparison (Supplemental Figures 5-7; Supplemental Table 11). As the model assumes neutral polymorphic sites, we masked genes in the alignments to remove regions likely to be under selection. The mutation rate for white clover is unknown, but a base substitution rate of 6.5×10⁻⁹ per generation has been determined in Arabidopsis thaliana (Ossowski et al., 2010), and mutation rate appears to increase with genome size (Lynch, 2010). Based on these observations, we estimated the mutation rates of white clover and its progenitors to range between 1.1×10⁻⁸ and 1.8×10⁻⁸ per base per generation (Supplemental Figure 8). This aligns with the allotetraploid Arachis hypogaea (peanut) estimated genome-wide mutation rate of 1.6×10⁻⁸ (Bertioli et al., 2016). As the divergence time scales with the mutation rate used, the mutation rate directly influences divergence time estimates. Based on a single generation per year, the Arabidopsis 6.5×10⁻⁹ mutation rate placed the divergence of the white clover progenitors ~530k years ago, with the progenitor-subgenome divergence 43-48k years ago (Supplemental Table 11). However, when using the proxy clover mutation rates, the estimated progenitor divergence was ~191-313k years ago, and the divergence between the progenitors and their corresponding subgenomes was 15-28k years ago (Figure 3B-C; Supplemental Table 11). Assuming the timing of the allopolyploidization event equated to that of progenitor genome divergence from the white clover subgenomes (Figure 3A), the assessed mutation rates placed the allopolyploidization within the glaciation period that culminated in the last European Glacial Maximum ~20k years ago (Yokoyama et al., 2000; Mangerud et al., 2004), whereas the proxy clover rates coincided with the Glacial Maximum itself (Figure 3C). This was a period when the alpine and coastal progenitors were likely to be in close proximity in glacial refugia.
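
The dependence of the divergence estimates on the assumed mutation rate can be checked with a short Python sketch. For a fixed amount of sequence divergence, the estimated time is inversely proportional to the mutation rate, so a time obtained with one rate can be rescaled to another. The function name is an assumption introduced here, and this simple rescaling is an arithmetic illustration using the values quoted in the text with one generation per year; it is not the isolation model itself.

```python
ARABIDOPSIS_RATE = 6.5e-9   # per base per generation (Ossowski et al., 2010)
CLOVER_RATE_LOW = 1.1e-8    # lower proxy clover rate from the text
CLOVER_RATE_HIGH = 1.8e-8   # upper proxy clover rate from the text


def rescale_time(time_years, rate_used, new_rate):
    """Rescale a divergence time estimated with rate_used to new_rate.

    For a fixed sequence divergence, time is proportional to 1 / rate.
    """
    return time_years * rate_used / new_rate


# Progenitor-subgenome divergence: 43-48k years at the Arabidopsis rate.
youngest = rescale_time(43_000, ARABIDOPSIS_RATE, CLOVER_RATE_HIGH)
oldest = rescale_time(48_000, ARABIDOPSIS_RATE, CLOVER_RATE_LOW)
print(f"{youngest:,.0f} to {oldest:,.0f} years")

# Progenitor divergence: ~530k years at the Arabidopsis rate.
prog_young = rescale_time(530_000, ARABIDOPSIS_RATE, CLOVER_RATE_HIGH)
prog_old = rescale_time(530_000, ARABIDOPSIS_RATE, CLOVER_RATE_LOW)
print(f"{prog_young:,.0f} to {prog_old:,.0f} years")
```

The rescaled values fall at roughly 15.5k to 28.4k years and 191k to 313k years, consistent with the 15-28k and ~191-313k ranges reported for the proxy clover mutation rates.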


White clover arose from multiple hybridization events

There is a range of allopolyploidization scenarios (Ramsey and Schemske, 1998; Mason and Pires, 2015) that may have given rise to white clover, resulting in different genomic diversity signatures. Common examples rely on unreduced gamete production and may include scenarios such as a single hybridization and chromosome doubling event (A×B=AB→AABB), or an interspecific hybridization of unreduced gametes (AA×BB=AABB). If the genesis of white clover resulted from a single hybridization event, this would be associated with a severe genetic bottleneck. We assessed this by analysing whole-genome resequencing data (~49-fold coverage) from four outbred white clover individuals from different populations (Supplemental Table 8). These genomes were then evaluated with Pairwise Sequentially Markovian Coalescent (PSMC) models (Li and Durbin, 2011) using the mutation rates described above ranging from 6.5×10⁻⁹ to 1.8×10⁻⁸ per site per generation. The PSMC model estimated current and past effective population sizes (Ne) based on mutation rates of the four genomes and their subgenomes, from which the Ne of the pre-allopolyploidization progenitor populations can also be inferred. For three of the four individuals, the model suggested only a mild genetic bottleneck 2k to 15k generations ago when the Ne dropped to ~20k from a pre-bottleneck level of ~48k (Figure 3D; Supplemental Table 12). This was followed by a rapid recent expansion that lagged the warming period after the last Glacial Maximum (Mangerud et al., 2004) (Figure 3C-D). The fourth individual (27) had a larger number of polymorphic sites than the other three, which could explain the difference in estimated population history (Supplemental Table 8). In addition to the mutation rate, the recombination rate also influences PSMC analysis, but we found the results stable over a wide range of recombination settings (Supplemental Figure 9). The 20-fold increase in the white clover Ne relative to the progenitor populations appears high and could be inflated, as the PSMC model has limited power to accurately determine Ne in the recent past (<20k years ago) (Li and Durbin, 2011). The PSMC analysis did not identify a severe population bottleneck coinciding with the hybridization event, as would be expected if white clover arose from two or a few parent plants, suggesting that diversity may have been carried over to white clover from its progenitors through multiple allopolyploidization events. These results were corroborated using Multiple Sequentially Markovian Coalescent (MSMC) analysis (Schiffels and Durbin, 2014), an extended version of PSMC where several individuals can be surveyed simultaneously. Incorporating more individuals (or haplotypes) improves Ne estimation in recent time periods (2k-30k years ago) (Schiffels and Durbin, 2014), complementing the PSMC model. MSMC analysis identified a weak bottleneck, similar to the PSMC model, that became non-existent, leaving only a recent expansion in population size, as resolution improved with incremental addition of haplotypes (Figure 3E). As with the PSMC, the MSMC results were consistent over a wide range of recombination and mutation settings (Supplemental Figure 10).

These models, however, cannot reliably differentiate between a very severe bottleneck of short duration and a less pronounced reduction of Ne over a longer period (Li and Durbin, 2011; Schiffels and Durbin, 2014). To distinguish between these possibilities, we characterized diversity in extant white clover populations using Genotyping-by-Sequencing (GBS)-derived SNPs identified in 200 individuals from 20 different clover populations (Supplemental Dataset 1). We found a high level of diversity, with polymorphism detected at 7.0% of all reliably sequenced sites and an average heterozygosity of 0.21, yielding an overall genome-wide nucleotide diversity of 0.015. This is similar to other outbreeding species such as teosinte (0.012) (Chen et al., 2017), Arabidopsis lyrata and A. halleri (0.024 and 0.020) (Novikova et al., 2016), and higher than selfing species such as A. thaliana (0.0060) (Novikova et al., 2016) and Medicago truncatula (0.0043) (Branca et al., 2011). This dataset allowed us to generate a detailed allele, or site, frequency spectrum (SFS) of the current white clover population sample against which we could compare simulated SFS produced under different demographic scenarios.
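
Two of the quantities in this paragraph lend themselves to a short Python sketch. The first lines check that the reported nucleotide diversity is consistent with the product of the fraction of polymorphic sites and the average heterozygosity at those sites. The function then tabulates a folded SFS from per-site alternate allele counts. The function name, the use of a folded spectrum and the toy counts are assumptions introduced here for illustration; the text does not describe how the observed SFS was computed.

```python
from collections import Counter

# Values quoted in the text.
polymorphic_fraction = 0.07   # 7.0% of reliably sequenced sites
mean_heterozygosity = 0.21    # average over polymorphic sites
nucleotide_diversity = polymorphic_fraction * mean_heterozygosity
print(round(nucleotide_diversity, 3))  # -> 0.015


def folded_sfs(alt_counts, n_chromosomes):
    """Folded site frequency spectrum.

    alt_counts holds, for each site, the number of sampled chromosomes
    carrying the alternate allele. Entry k-1 of the result is the number
    of sites whose minor allele is present on k chromosomes. Monomorphic
    sites are ignored.
    """
    spectrum = Counter()
    for count in alt_counts:
        if not 0 <= count <= n_chromosomes:
            raise ValueError("allele count outside sample size")
        minor = min(count, n_chromosomes - count)
        if minor > 0:
            spectrum[minor] += 1
    return [spectrum.get(k, 0) for k in range(1, n_chromosomes // 2 + 1)]


# Toy data: 7 sites scored on 8 chromosomes (illustrative only).
print(folded_sfs([1, 1, 2, 7, 4, 0, 8], 8))  # -> [3, 1, 0, 1]
```

A spectrum dominated by its first entries indicates an excess of rare alleles, which is the kind of skew the demographic simulations are compared against.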

The most extreme bottleneck would result from a single hybridization and chromosome doubling event (A×B=AB→AABB), producing a single homozygous white clover founder individual with no polymorphic sites, from which all diversity would accumulate through mutations. This scenario exhibited a very strong bias against common alleles coupled with a low rate of polymorphism, and was inconsistent with the observed data (Figure 3F). A second possibility is that two unreduced gametes, one from each of the heterozygous diploid progenitors, fused to produce a single white clover hybrid (AA×BB=AABB), thereby carrying variation over to white clover from the two founder progenitors. This second scenario produced an SFS similar to that of the chromosome doubling scenario, albeit with a less severe skew compared to the observed data (Figure 3F). These results suggested that a population bottleneck alone could not explain the observed data. Instead, we modelled combinations of various types of population bottlenecks with recent expansion of the population size up to twenty-fold the size of the pre-allopolyploidization effective population size (Supplemental Figure 11). Of all scenarios tested, a recent three-fold expansion of the effective population size in the absence of a population bottleneck came closest to matching the observed data (Figure 3F; Supplemental Figure 11; Supplemental Table 12). Simulations also matched the observed mutation density when the progenitor population size was increased, assuming the PSMC analysis had underestimated the progenitor Ne, which at ~48k was lower than the usual 100k-1000k range for nuclear genomes of multicellular species (Lynch, 2010) (Figure 3F; Supplemental Table 12).

White clover is thus unlikely to have experienced a severe genetic bottleneck at the time of allopolyploidization during the most recent glaciation ~15-28k years ago, and must have arisen from multiple independently generated hybrids, which resulted in carryover of progenitor diversity into the new species. In addition, our PSMC and MSMC analyses and mutation accumulation simulation both indicated a rapid recent population expansion, which is consistent with white clover quickly filling available niches as the climate warmed to generate a larger effective population size and occupy a broader habitat range than its progenitors had prior to allopolyploidization.


Progenitor genome integrity has been retained in white clover

The extant progenitors of white clover occupy restricted ranges in highly specialized niches (Figure 1). If the progenitor genomes have remained intact, their coexistence within white clover following allopolyploidization has enabled them to undergo global allopolyploidy-facilitated niche-expansion. To investigate subgenome integrity, we made further comparisons between the progenitor genomes and their corresponding white clover subgenome derivatives. At a whole-genome level, the sequence-derived estimated genome size of white clover approximated the combined estimates of the contributing progenitors, indicating that major chromosomal fragments had not been lost in white clover compared to the extant progenitors (Table 1; Supplemental Figure 1).

Inter-homoeologous recombination could lead to rapid subgenome alterations, but white clover is considered a functional diploid (Williams et al., 1998), and cytological evidence suggests that homoeologous recombination events are very rare in white clover. To determine if the genomic data supported these observations of subgenome independence, we assessed the incidence of white clover Illumina TruSeq Synthetic Long Reads (TSLRs) containing sequence from both progenitors. These long reads (total=668,571; average length=4,113 bp; Supplemental Table 1) were unlikely to contain chimaeric artefacts as they were assembled from single DNA fragments. We identified only five high-confidence recombination events in contigs derived from 38 TSLRs (0.006% of the TSLR pool), where TSLRs contained well-supported blocks arising from both progenitors (Supplemental Figure 12). The very few signatures of homoeologous recombination in white clover agree with the lack of observations of quadrivalent meiotic configurations in cytological studies (Ansari H. (AgResearch), pers. comm.). In conclusion, there appears to have been very limited inter-homoeologous recombination in white clover.

Gene loss following polyploidization is common (Grover et al., 2012), and could also compromise the integrity of the white clover subgenomes. To assess the level of gene loss, we mapped sequencing reads from white clover genomic DNA to the combined reference sequences of the extant progenitor genomes, requiring a unique match for each read. Of the ~39k and ~36k protein-coding genes found in both progenitor genomes (Supplemental Table 6), ~36k and ~35k were covered by uniquely mapping white clover reads in To and Tp, respectively, indicating no more than 5% gene loss.

Gene silencing is also commonly associated with genome hybridization or duplication, and often precedes gene loss. We identified 68,475 protein-coding genes in white clover, >99% of which were based on direct transcriptional evidence (Supplemental Table 5), a total that approximated the sum of the To and Tp protein-coding gene counts (Supplemental Table 6). This indicated that the subgenomes had retained a similar transcriptionally active gene complement to each other and to the corresponding extant progenitors, suggesting a limited level of allopolyploidy-associated gene silencing in white clover.

In summary, the white clover subgenome sizes and transcriptionally active gene counts reflect those of the contributing progenitors, indicating that the subgenomes retained in white clover have maintained independence and integrity since allopolyploidization.

Stable subgenome expression ratios across tissues

Transcriptional consequences of allopolyploidization encompass homoeologue expression bias and expression level dominance (genomic dominance), and have been well-documented in many polyploids (Grover et al., 2012). To better understand these consequences in white clover through characterization of subgenome transcription patterns, we first identified progenitor-specific SNPs by mapping genomic and Pooled transcriptomic data (Supplemental Table 13) from To and Tp onto To gene models using HyLiTE (Duchemin et al., 2015). We then assigned subgenome identity to white clover RNA-sequence (RNA-seq) reads based on these SNPs. A total of 720 million white clover RNA-seq reads derived from flowers, leaves, stolons/stems, and roots were processed, and 42% were assigned to a specific subgenome. Most of the remaining reads could not be assigned as they did not overlap subgenome-specific SNPs (Supplemental Table 14; Supplemental Dataset 2). The reads were assigned to a similar number of genes on each subgenome (TrTo: 36,181 genes; TrTp: 34,942 genes), indicating near equivalence in active gene count per subgenome under glasshouse conditions (Supplemental Table 14).
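
The principle of SNP-based read assignment described above can be illustrated with a small sketch. This is not the HyLiTE algorithm; the function `assign_subgenome`, the data layout and the rule that conflicting reads stay unassigned are assumptions made purely for illustration.

```python
def assign_subgenome(read_alleles, diagnostic_snps):
    """Assign a read to 'To', 'Tp', or None (unassigned).

    read_alleles:    dict {position: base} for positions covered by the read.
    diagnostic_snps: dict {position: (to_base, tp_base)} of progenitor-specific
                     SNPs (hypothetical structure).
    A read is assigned only if it overlaps at least one diagnostic SNP and all
    informative overlapping SNPs point to the same progenitor.
    """
    votes = set()
    for pos, base in read_alleles.items():
        if pos not in diagnostic_snps:
            continue  # position carries no subgenome information
        to_base, tp_base = diagnostic_snps[pos]
        if base == to_base:
            votes.add("To")
        elif base == tp_base:
            votes.add("Tp")
    return votes.pop() if len(votes) == 1 else None


# Hypothetical diagnostic SNPs on one gene model.
snps = {10: ("A", "G"), 25: ("C", "T")}
read_to = assign_subgenome({10: "A", 12: "T"}, snps)       # overlaps a To allele
read_tp = assign_subgenome({25: "T"}, snps)                # overlaps a Tp allele
read_none = assign_subgenome({12: "T"}, snps)              # no diagnostic SNP
read_conflict = assign_subgenome({10: "A", 25: "T"}, snps) # conflicting alleles
```

Under this simplified rule, reads that do not overlap a diagnostic SNP remain unassigned, which mirrors the main reason given above for the unassigned fraction of reads.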

Previous studies in other allopolyploids have examined differential expression of homoeologues across tissues, but as only a few genes were examined or average tissue biases were calculated, it remains unclear how often the direction of bias changes between tissues in allopolyploids (Adams et al., 2003; Liu and Adams, 2007; Leach et al., 2014; Zhang et al., 2015b; Yang et al., 2016). For white clover, we found that 69% of the 19,954 genes for which expression ratios could be reliably determined exhibited the same direction of homoeologue expression bias across the tissues assessed (Figure 4A). An example of consistent homoeologue expression ratio (TrTo/TrTp) between tissues is shown (Figure 4B), and such homoeologue ratio stabilities persisted even when the correlation between total gene expression levels (TrTo + TrTp) was low (Figure 4C).
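
As a rough illustration of what "same direction of homoeologue expression bias across tissues" means, the sketch below flags a gene as stable when one homoeologue is the more highly expressed copy in every tissue. The function name, the expression values and the strict inequality rule are hypothetical; the criteria actually used to call expression ratios reliable are not reproduced here.

```python
def same_bias_direction(trto, trtp):
    """True if one homoeologue is the more highly expressed copy in every tissue."""
    to_higher = all(a > b for a, b in zip(trto, trtp))
    tp_higher = all(a < b for a, b in zip(trto, trtp))
    return to_higher or tp_higher


# Hypothetical per-tissue expression values (flower, leaf, stolon, root)
# given as (TrTo values, TrTp values).
genes = {
    "geneA": ([30, 40, 25, 50], [10, 15, 12, 20]),  # TrTo higher everywhere
    "geneB": ([5, 6, 4, 7], [9, 12, 8, 10]),        # TrTp higher everywhere
    "geneC": ([30, 5, 20, 2], [10, 25, 15, 40]),    # direction switches
}
stable = [g for g, (to, tp) in genes.items() if same_bias_direction(to, tp)]
fraction_stable = len(stable) / len(genes)
```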


Expression ratio outliers were enriched for flavonoid biosynthesis genes

The high frequency of white clover genes showing stable cross-tissue homoeologue expression ratios suggested that this may represent the default state following allopolyploidization. Genes deviating strongly from this pattern could, therefore, represent instances of adaptive gene expression control in white clover driven by selection. To investigate this possibility, we selected genes with unusually large log(TrTo/TrTp) expression ratio variance across tissues, identifying 1263 transcripts as expression ratio outliers (Supplemental Dataset 3).
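
The outlier criterion above rests on the variance of log(TrTo/TrTp) across tissues. A minimal sketch of that statistic follows; the base-2 logarithm, the use of the sample variance and the toy expression values are assumptions, and no variance cut-off is implied.

```python
import math
from statistics import variance


def log_ratio_variance(trto, trtp):
    """Sample variance across tissues of log2(TrTo/TrTp) for one gene."""
    ratios = [math.log2(a / b) for a, b in zip(trto, trtp)]
    return variance(ratios)


# Hypothetical expression values in four tissues.
# Stable gene: TrTo is always twice TrTp, so the log ratio never changes.
stable_var = log_ratio_variance([10, 20, 40, 80], [5, 10, 20, 40])
# Switching gene: TrTo dominates in the first tissue, TrTp in the last.
switch_var = log_ratio_variance([80, 10, 10, 10], [10, 10, 10, 80])
```

A gene whose total expression changes strongly between tissues can still have a log-ratio variance of zero, as in the first toy gene, which is why the ratio rather than the expression level is examined.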

Based on gene annotations, the outliers appeared rich in enzyme-encoding genes, leading us to focus on investigating biosynthetic pathways. We assigned enzyme categories (ECs) using the MetaCyc database (Caspi et al., 2016) and found a striking enrichment for components of the phenylpropanoid derivatives biosynthesis pathway (P=4.6×10⁻²⁴ after Benjamini-Hochberg correction) and its subset, the flavonoid biosynthesis pathway (P=2.2×10⁻¹⁰ after Benjamini-Hochberg correction), as well as an enrichment for biosynthesis of secondary metabolites (P=1.9×10⁻¹⁵) and plant cell structural components (P=1.9×10⁻¹⁰) (Supplemental Table 15).

The initial experiment had been carried out using biological replicates pooled prior to sequencing. To examine if the expression ratio outliers could be reproducibly detected, we carried out two additional experiments with three replicates of four tissues. These experiments were conducted using material from clones of the white clover S9 inbred individual used for sequencing and its selfed progeny (S10), respectively, and we refer to these three separate expression outlier detection experiments as “Pooled”, “S9” and “S10” (Supplemental Tables 13 and 14). For the outliers in the S9 and S10 experiments, we found enrichment of the same pathways (Supplemental Table 15). As indicated by the pathway enrichment analysis, the 2,262 genes identified as expression ratio outliers showed a pronounced overlap among the three experiments (Figure 4D). Repeating the pathway enrichment analysis with genes identified as expression ratio outliers in at least two out of the three experiments again highlighted the same pathways (Supplemental Table 15). To increase stringency and take advantage of the replicate information, we performed ANOVA using all samples from the three experiments. In addition to a high mean expression variance across tissues, we required that the means be significantly different in the ANOVA after Bonferroni correction. The ANOVA-filtered results again pointed to the same enriched pathways (Supplemental Table 15), and the identified genes showed consistent expression ratios within tissues across experiments and replicates, with particularly pronounced contrasts between expression ratios in roots and mature leaves (Figure 4E-G). The clustered heat map of the outlier expression ratios (Figure 4F) highlighted not only the contrast between root and leaves, but also a temporal and transgenerational similarity within tissues among the Pooled, S9 and S10 samples. The outlier genes included putative transcription factors, receptor-like kinases and other enzymes, but the genes associated with the above-mentioned pathways made up the only clearly overrepresented sets (Supplemental Dataset 3).
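
The pathway P-values above are reported after Benjamini-Hochberg correction. As a reminder of what that adjustment does, here is a minimal, generic sketch of the procedure; the function name and the input P-values are illustrative and do not correspond to values from Supplemental Table 15.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

By comparison, the Bonferroni correction used for the ANOVA step simply multiplies each P-value by the number of tests (capped at 1), which is more conservative.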

To assess whether the pathway enrichment could occur by chance, we generated random gene sets meeting the same subgenome read assignment cut-off as the expression ratio outliers. We found that the white clover genome was not generally enriched for genes involved in the pathways associated with expression outliers, as only a single pathway (PWY-7214 baicalein degradation, P < 1×10⁻³) was enriched in one out of ten random gene sets (Supplemental Table 15). Many of the identified flavonoid pathway enzymes appeared to cluster relatively closely in the biosynthetic pathway downstream of the metabolite naringenin, which is a key branch point for production of isoflavonoids, flavonols, anthocyanins and condensed tannins (Franzmayr et al., 2012) (Figure 4H).

Of 909 outlier genes detected in the ANOVA analysis, 904 contained SNPs between the differentially expressed TrTo and TrTp homoeologues that resulted in amino acid substitutions. The outliers did not, however, show above-average divergence at the protein level (Supplemental Figure 13).


Expression ratio outliers diverged mainly post-allopolyploidization

The deviations from stable subgenome expression ratios across tissues could be allopolyploidy-associated or result from parental inheritance of expression patterns that had already diverged between the progenitors. To distinguish between these possibilities, we compared the subgenome-specific white clover expression data to progenitor expression data. For the Pooled and S10 experiments, we also conducted RNA-seq for the progenitors (Supplemental Table 13), and analysed data from both these experiments. After normalization of the expression data using TMM (Robinson et al., 2010), PCA analysis clustered the four tissues and showed tight grouping of the white clover subgenomes for all tissues, whereas the progenitors were more dispersed (Supplemental Figure 14A and E). If the subgenome expression ratio deviations were derived from parental inheritance, the outliers should also show extreme divergence in the inter-progenitor comparison. We first tested this hypothesis by comparing Pearson correlation coefficients for logged expression values across the four tissues in pairwise genome comparisons. As expected, we found the outlier genes greatly skewed (p<10⁻²⁷, Kolmogorov-Smirnov test) towards lower correlation coefficients for the inter-subgenome TrTo vs. TrTp comparison. We did not, however, observe a similar skew for the inter-parental comparison To vs. Tp, arguing against parental divergence and inheritance being causal for the white clover expression ratio outliers (Supplemental Figure 14B and F; Supplemental Dataset 4).
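
The per-gene statistic used in this comparison is a Pearson correlation of logged expression values across the four tissues. A minimal sketch is given below; the base-2 logarithm, the absence of a pseudocount and the expression values are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_expr(values):
    """Log2-transform strictly positive expression values."""
    return [math.log2(v) for v in values]


# Hypothetical expression of one gene in four tissues for two genomes.
# Same tissue pattern at a two-fold different level: correlation close to 1.
r_same = pearson(log_expr([10, 20, 40, 80]), log_expr([20, 40, 80, 160]))
# Opposite tissue pattern: negative correlation.
r_opposite = pearson(log_expr([100, 50, 20, 5]), log_expr([5, 20, 50, 100]))
```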

The correlation coefficient analysis captures the directions of change in expression levels between tissues, but does not consider the magnitude of the expression differences between the compared genomes. To examine these, we analysed deviance defined as the sum of the squared differences between logged expression values in the pairwise genome comparisons. Again, we found the largest difference between the outliers and all genes in the inter-subgenome comparison, with the outliers skewed towards higher deviance. We also observed significant skews for all other comparisons, indicating that the outliers showed above average expression deviance in the progenitors and between progenitors and subgenomes (p<10⁻⁷⁶, Kolmogorov-Smirnov test) (Supplemental Figure 14C and G; Supplemental Dataset 4).
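
The deviance defined above can be written out in a few lines. In this sketch the base-2 logarithm and the toy values are assumptions; only the definition (sum of squared differences between logged expression values) comes from the text.

```python
import math


def deviance(expr_a, expr_b):
    """Sum over tissues of squared differences between log2 expression values."""
    return sum(
        (math.log2(a) - math.log2(b)) ** 2 for a, b in zip(expr_a, expr_b)
    )


# Hypothetical expression of one gene in four tissues.
# Same tissue pattern, two-fold level difference: perfectly correlated,
# yet the deviance is non-zero because magnitudes differ.
dev_shifted = deviance([10, 20, 40, 80], [20, 40, 80, 160])
# Opposite tissue pattern: large deviance.
dev_opposite = deviance([8, 4, 2, 1], [1, 2, 4, 8])
```

The first toy gene shows why the two statistics are complementary: its correlation across tissues would be 1, while its deviance still records the two-fold difference in every tissue.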

The expression outliers were selected based on variance in log(TrTo/TrTp) ratios, making them clearly distinct from the overall gene average (Supplemental Figure 14D and H). However, we also detected a less pronounced, but statistically significant (p<10⁻⁹³, Kolmogorov-Smirnov test), skew towards higher variance for the inter-progenitor comparison (Supplemental Figure 14D, Supplemental Dataset 4). The correlation, deviance and variance analyses produced very similar results for the Pooled and S10 experiments (Supplemental Figure 14).

Overall, the To vs. Tp deviations appeared much less pronounced than the TrTo vs. TrTp deviations for the outlier gene set, suggesting a limited contribution of parental inheritance to the outlier expression patterns. To quantify the effect, we classified genes as showing parental expression inheritance when they displayed higher correlation coefficients and lower deviance for the subgenome-progenitor than for the inter-progenitor comparison(s) and showed subgenome-progenitor correlation coefficients >0.8. Only 104 and 239 genes met these criteria in the Pooled and S10 experiments, respectively, 22 and 43 of which were expression ratio outliers (Supplemental Dataset 4). A total of 19,954 genes and 909 outliers were included in the analysis, making the outliers significantly enriched in genes showing parental expression inheritance (p<10⁻⁴, Chi-squared test with Yates correction). However, parental expression inheritance only explained a few percent of the outliers detected, suggesting that outlier expression divergence occurred mainly post-allopolyploidization.
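
The enrichment test reported above can be sketched from the counts given in the text for the Pooled experiment. The layout of the 2×2 table is an assumption (it treats the 909 outliers as a subset of the 19,954 genes), so this is an illustration of a Chi-squared test with Yates correction rather than a reproduction of the original calculation.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-squared statistic with Yates continuity correction and its P-value
    (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum(
        max(0.0, abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected)
    )
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed table for the Pooled experiment, built from the reported counts:
# rows = outlier / non-outlier, columns = inheritance criteria met / not met.
outlier_met = 22
outlier_not = 909 - 22
other_met = 104 - 22
other_not = 19954 - 909 - other_met
chi2, p_value = chi2_yates_2x2(outlier_met, outlier_not, other_met, other_not)
```

Under this assumed table, about 4.7 outliers would be expected to meet the inheritance criteria by chance, compared with the 22 observed, while 22 of 909 still corresponds to only a few percent of the outliers.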

DISCUSSION

White clover (Trifolium repens L.) is a successful allotetraploid naturalized globally in moist temperate grasslands and is an integral component of temperate pastoral agriculture. By contrast, its diploid extant European progenitor relatives remain in extreme specialized habitats with T. occidentale (To) restricted to high salinity coastal niches and T. pallescens (Tp) to alpine screes. Based on whole-genome data for both progenitors and white clover, our work has provided a clear example of allopolyploidization-enabled niche-expansion, which has facilitated global radiation of the previously confined specialist progenitor genomes. This occurred without compromising progenitor genome integrity in the past ~20,000 years since the allopolyploidization event, which took place in the glaciation that culminated in the last European Glacial Maximum. This was a period when alpine and coastal species were likely in proximity in glacial refugia. Such extreme conditions are conducive to allopolyploid formation, as the Arctic is one of the most polyploid-rich regions with post-deglaciation colonization dominated by polyploids (Brochmann et al., 2004). Furthermore, temperature extremes can increase production of unreduced male gametes (De Storme and Geelen, 2014), a pathway to repeated allopolyploid formation (Mason and Pires, 2015). This process also facilitates carry-over of diversity from the progenitors to the allopolyploid, as we observed for white clover.

Genome rearrangements and loss of duplicate genes are common features of polyploids as they progress towards a diploid-like genome via fractionation. This process can occur at markedly different speeds depending on the polyploid system, and is preceded by changes in homoeologue expression (Soltis et al., 2015; Soltis et al., 2016). In white clover, this has occurred to a very limited degree, as the genomes and functional gene complement of its progenitors have been retained with little evidence of large-scale inter-homoeologous recombination or unbalanced homoeologue expression bias. As more polyploids are sequenced, it is clear there is a wide range of genomic responses to polyploidization events. Some very recent polyploids, and polyploids with similar genesis times to white clover, exhibit gene retention with little genome reduction and no evidence of biased fractionation or major genetic changes: Spartina anglica, 130 years ago (Chelaifa et al., 2010; Ferreira de Carvalho et al., 2013); Arachis hypogaea (peanut), 9.4 thousand years ago (kya) (Bertioli et al., 2016); Triticum aestivum (hexaploid wheat), 9 kya (IWGSC, 2014); Coffea arabica, 10-50 kya (Cenci et al., 2012; Combes et al., 2015); and Capsella bursa-pastoris, 100-300 kya (Douglas et al., 2015). Of these species, Coffea arabica (Bardil et al., 2011) and Spartina anglica (Chelaifa et al., 2010) exhibit expression level or genomic dominance implicated in the process of diploidization. The subgenomes of the allotetraploid cotton (Gossypium hirsutum, 1-2 Mya) have little gene loss but show evidence of asymmetric evolution (Zhang et al., 2015b), as well as homoeologue expression bias in certain tissues or conditions (Yoo et al., 2013; Zhang et al., 2015b). By contrast, some very recent polyploids (Tragopogon spp., 80 ya (Chester et al., 2012); synthetic wheat (Feldman et al., 1997); synthetic allotetraploid Arabidopsis (Wang et al., 2004)) and ancient polyploids (Brassica rapa, 13-17 Mya (Wang et al., 2011; Cheng et al., 2012); maize, 5-12 Mya (Schnable et al., 2011)) have undergone significant genome and homoeologue expression changes on the path to diploidization. It may be that factors including progenitor relatedness and genome size similarity influence the fate of the post-polyploid genomes.

With limited post-hybridization alterations to the subgenomes, white clover appears to have resulted from a fortuitous combination of compatible progenitors, which allowed multiple allopolyploidization events to occur in the contact zones of the progenitors, quickly generating very diverse white clover populations. Furthermore, the repeated allopolyploidization events suggest high compatibility between progenitor genomes. This may explain why we found a stable background of subgenome expression ratios across tissues on a per-homoeologous gene pair basis, which in turn allowed us to identify genes with strongly aberrant expression patterns. Differential homoeologue expression across tissues, during development and in response to stresses has been described in a number of species, including cotton, wheat, and Tragopogon (Adams et al., 2003; Bottley et al., 2006; Liu and Adams, 2007; Hovav et al., 2008; Buggs et al., 2010). It has also been investigated at the whole-transcriptome level, where differential regulation was confirmed in coffee, synthetic Brassica allotetraploids, cotton, wheat, and Brassica juncea (Combes et al., 2013; Liu et al., 2015; Zhang et al., 2015a; Zhang et al., 2015b; Yang et al., 2016). Our analysis strategy distinguishes itself from previous studies by first establishing a testable null hypothesis, in this case stable homoeologue expression ratios across tissues, which allows quantitative identification of expression ratio outliers. This phenomenon appears to be a temporal and transgenerational feature, as evidenced by the observed overlap in these outlier genes among the different experiments, which comprised either the same individuals or progeny sampled at different times. Coupled with pathway enrichment analysis of the expression ratio outliers, we identified the phenylpropanoid and flavonoid biosynthesis pathways as putative targets of selection in white clover. This overrepresentation of the biosynthetic pathways among genes with deviating expression patterns argues against the outliers resulting from rare stochastic events, and instead suggests that selection has driven tissue-specific differential use of homoeologue gene copies. The selective pressure in question could be internal, driven by incompatibilities between diverged homoeologous proteins. However, we did not observe increased divergence at the protein level for the outliers, arguing against this possibility. Taken together with our results indicating that expression divergence occurred mainly post-allopolyploidization, the expression ratio outlier patterns could represent examples of adaptive, allopolyploidy-associated transcriptional changes. Such changes may be influenced by features such as the development of specialised tissue- or homoeologue-specific transcription factors, or localised epigenetic characteristics including differential tissue-specific methylation among homoeologues.

As more polyploids are explored in greater depth, other examples of allopolyploids exhibiting stable homoeologue expression ratios are appearing. Arabidopsis kamchatica, an allotetraploid, arose at a similar time (~20 kya) (Akama et al., 2014) to white clover. Similarly, it has greater environmental plasticity than its extant progenitors, although with a significant overlap in distribution (Hoffmann, 2005). Interestingly, it exhibits stable homoeologous expression ratios for the majority of homoeologous pairs, with a very small proportion (~1%) altering expression ratios in response to abiotic stress (Akama et al., 2014; Paape et al., 2016). In contrast to white clover, however, it appears that A. kamchatica has combined the transcriptional patterns of both parental species (Paape et al., 2016). Another recent polyploid also exhibits homoeologue expression features comparable to white clover. In allohexaploid wheat, ~72% of homoeologous gene triads showed stable expression ratios across 15 tissues (Ramirez-Gonzalez et al., 2018), remarkably similar to the ~70% we observed in white clover. Furthermore, of the homoeologue triads with variable expression ratios across tissues, the majority did not reflect expression ratios of the wheat progenitor species, indicating possible allopolyploidy-associated transcriptional change (Ramirez-Gonzalez et al., 2018). These genes were enriched for defence and external stimuli responses involved in fitness (Ramirez-Gonzalez et al., 2018), with a possible role in adaptation for this domesticated species.

In white clover, the overrepresentation of the phenylpropanoid, flavonoid and secondary metabolite pathways suggests that secondary metabolites may be under particularly strong selective pressure. Indeed, flavonoids are widely distributed in plants and 9,000 different chemical structures have been characterized (Williams and Grayer, 2004). The pervasive nature and extreme diversity of flavonoids, along with variation at the population level (Cao et al., 2017), argues strongly for their role in plant adaptation, where they have been shown to facilitate tolerance to a wide range of biotic and abiotic stresses including protection against reactive oxygen species, pathogenic microbes and insect predation, as well as in signalling and development (Mierziak et al., 2014; Mouradov and Spangenberg, 2014; Davies et al., 2018). It may be that white clover exploits homoeologous isozymes, an option available only to allopolyploids, to fine-tune complex mixes of flavonoids and other secondary metabolites across tissues, for instance to balance responses to symbionts and pathogens in roots or to optimize herbivory deterrence and antifungal defence in leaves through differential regulation of naringenin and its derivatives (Mierziak et al., 2014; Davies et al., 2018) (Figure 4H).

It will require further investigation to determine if allopolyploidy-associated transcriptional reprogramming of flavonoid biosynthesis genes contributed to the evolutionary success of white clover. The white clover species complex described here, with well-defined progenitors, limited inter-homoeologue recombination, limited subgenome gene loss, and clearly defined natural ranges, will facilitate such studies, and provides a powerful platform for understanding how allopolyploidy can underpin adaptation and range expansion. As a first step, the current study has highlighted white clover as an example of allopolyploidy-facilitated niche-expansion, where two diverged genomes were reunited by a changing climate and colonized the globe.


MATERIALS AND METHODS

Plant Material. Trifolium repens L. (white clover): An inbred allopolyploid S9 individual was selected for sequencing from the ninth sequential selfed generation derived by single seed descent from a self-fertile white clover cv. ‘Crau’ derivative (Cousins and Woodfield, 2006). T. occidentale Coombe (western clover): an individual derived by controlled selfing from an individual from a wild population (OCD16, Margot Forde Germplasm Centre, New Zealand). T. pallescens Schreb. (pale clover): an individual from a wild population (AZ1895, Margot Forde Germplasm Centre, New Zealand). For linkage disequilibrium mapping, a population (PGen3×PGen9) of 93 F1 full-sib progeny was developed from a hand-pollinated reciprocal pair cross of heterozygous individuals, PGen3 and PGen9. All plants were grown under glasshouse conditions prior to DNA extraction.

DNA extraction and sequencing: For the three Trifolium species in this study, nuclear DNA was extracted from young leaves as described (Anderson et al., 2018) and sequenced by Macrogen (Seoul, South Korea). Using a range of platforms (Roche 454-GLX; Illumina HiSeq 2000), libraries (shotgun, paired-end; mate-paired) and insert sizes (180-8000 bp), sequence data were generated to an approximate coverage of 492×, 114×, and 230× for T. occidentale, T. pallescens, and T. repens, respectively (Supplemental Table 1). Furthermore, high molecular weight DNA was extracted (Anderson et al., 2018) to create six white clover libraries of Illumina TruSeq Synthetic Long-Reads (TSLRs) generated by Illumina FastTrack Sequencing Services Long Reads pipeline to provide a 3× coverage (Supplemental Table 1). Original sequence data are deposited in NCBI under Bioprojects PRJNA523044, PRJNA521254, and PRJNA523043 for white clover, T. occidentale, and T. pallescens, respectively, as detailed in Accession Numbers (below).

RNA extraction and sequencing: Transcriptome analyses of the S9 T. repens individual and extant progenitors T. occidentale and T. pallescens were derived from Illumina RNA-seq data generated from four different tissue samples: mature fully expanded leaf (third node from stolon tip); tap root (T. pallescens) or nodal root (T. repens and T. occidentale); young opened flowers from the middle whorls of the inflorescence; internodal shoot (T. pallescens) or stolon (T. repens and T. occidentale). The tissues were sampled in 2013 as three biological replicates harvested at a single time point from plants grown in glasshouse conditions, and frozen immediately in liquid nitrogen. Total RNA was isolated from non-floral tissue using an Isolate II RNA Plant Kit (Bioline, London, UK). Due to the high phenolic content of white clover flowers, floral RNA was extracted as described (Jamalnasir et al., 2013) with modifications including replacement of chloroform:isoamyl alcohol (24:1, v/v) with pure chloroform, followed by an additional purification step. The final precipitated RNA was resuspended in Isolate II RNA Plant Kit (Bioline, London, UK) extraction buffer prior to binding, washing and eluting from an Isolate II RNA Plant Kit column (Bioline, London, UK). Prior to sequencing, RNA from the biological replicates for each tissue was pooled, transported on dry ice and sequenced as 500 bp paired-end libraries at Macrogen (Seoul, South Korea) using the Illumina HiSeq 2000 2×100 bp paired-end chemistry (Supplemental Table 13). Reads were filtered using Kraken (Wood and Salzberg, 2014) to eliminate viral RNA, and filtered for low quality using SolexaQA (Cox et al., 2010). These data were used for gene annotation and homoeologue expression analysis and are referred to as the “Pooled” data.

To further explore stable homoeologue expression across tissues of the T. repens S9 individual, additional RNA-seq data were generated at Novogene (Hong Kong) from each of three cloned plants, providing unpooled biological replicates per tissue sample (mature fully expanded leaf; root (nodal/tap); stolon/shoot; and young open flowers from the middle whorls of the inflorescence). Tissues were harvested simultaneously in 2017 from each of three cloned plants grown in glasshouse conditions, snap frozen in liquid nitrogen, and extracted as described above. RNA was transported frozen on dry ice to Novogene (Hong Kong) and 250-300 bp insert libraries were sequenced using Illumina NovaSeq 6000 2×150 bp paired-end chemistry (Supplemental Table 13). Reads were filtered for quality and to remove viral RNA as described above. This RNA-seq dataset was used for replicated homoeologue expression analysis and is referred to as “S9”.

To further investigate conservation of homoeologue expression among extant progenitors and white clover plants of the same age, RNA-seq data were generated from tissue (emerging/young leaf; mature fully expanded leaf; stolon/shoot; root (nodal/tap)) from each of three cloned plants (biological replicates) of an individual selfed progeny each of the sequenced T. occidentale (selfed OCD16), T. pallescens (selfed AZ1895), and T. repens (S10). These plants, co-located in the same glasshouse, were harvested simultaneously in 2018 at the same time of day (early afternoon) as previous tissue collections to minimise effects of diurnal variation on gene expression. RNA was extracted, sequenced and filtered as described above, and the RNA-seq dataset is referred to as “S10”. Original sequence data are deposited in NCBI under Bioprojects PRJNA523044, PRJNA521254, and PRJNA523043 for white clover, T. occidentale, and T. pallescens, respectively, as detailed in Accession Numbers (below).


K-mer analysis for estimating genome size. The total number and frequency of k-mers (k = 17) were counted in unassembled Illumina 180 bp insert paired-end sequence data for white clover and its progenitors using Jellyfish v2.2.0 (Marçais and Kingsford, 2011) with default parameters. The data were plotted to determine the peak (most frequent) depth of the 17-mers, and genome size was estimated as (total 17-mer number)/(peak depth).
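The genome size formula above can be illustrated with a minimal sketch. It assumes the k-mer counts are already summarised as a histogram (depth mapped to the number of distinct k-mers at that depth); the function name and the `min_depth` cutoff used to skip the low-depth error peak are illustrative assumptions, not part of the published method.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mer number) / (peak depth).

    histogram: dict mapping k-mer depth -> number of distinct k-mers
               observed at that depth.
    min_depth: hypothetical cutoff so the peak is not taken from the
               low-depth sequencing-error k-mers (an assumption).
    """
    # Total k-mer number counts every occurrence of every k-mer.
    total_kmers = sum(depth * count for depth, count in histogram.items())

    # Peak depth: the depth with the most distinct k-mers above the cutoff.
    candidates = {d: c for d, c in histogram.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no histogram entries at or above min_depth")
    peak_depth = max(candidates, key=candidates.get)

    return total_kmers / peak_depth


# Toy histogram: an error peak at depth 1 and a main peak at depth 20.
toy_histogram = {1: 1000, 2: 100, 19: 40, 20: 50, 21: 40}
toy_size = estimate_genome_size(toy_histogram, min_depth=5)
```

In practice the histogram would come from the k-mer counter's own output, and the peak would normally be confirmed by eye on the plotted distribution, as the text describes.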

Genome Assembly: Raw sequence data for all three Trifolium species underwent quality control with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and were then trimmed using the DynamicTrim program from the SolexaQA package (Cox et al., 2010). In the case of mate-pair libraries, the high level of adapter contamination was reduced using Trimmomatic (Bolger et al., 2014).

Assembly of T. pallescens was accomplished exclusively with ALLPATHS-LG (Gnerre et al., 2011) using default parameters. A low level of bacterial contamination was found in the DNA, and the contaminating reads were filtered out with a simple in-house script using Bowtie 2 (Langmead and Salzberg, 2012) for raw read mapping. The dataset underwent a further error correction step by re-mapping paired-end data and searching for non-repeat regions with uneven read distribution.

Assembly of T. occidentale was carried out in two stages. In the first stage, an assembly was produced using ALLPATHS-LG (Gnerre et al., 2011) with default parameters. All mate-pair libraries were used, but only the single 180 bp paired-end library was incorporated into the ALLPATHS-LG assembly. The other paired-end libraries were assembled separately using Velvet (Zerbino and Birney, 2008). The raw reads were first subjected to digital normalization using the khmer package (http://ged.msu.edu/papers/2012-diginorm/), with a k-mer size of 31, a maximum coverage of 50, and a minimum coverage of 3. The filtered data were assembled with Velvet using the following parameters: k-mer size of 55; expected coverage of 25; coverage cutoff of 7.5; maximum coverage of 45; maximum divergence of 0.05; minimum scaffolding pair count of 7.0. The resulting contigs were compared to the ALLPATHS-LG assembly, and used to extend and resolve gaps where long, high-quality overlaps were found. A final round of gap filling using all paired-end data was done using an in-house script (https://github.com/Lanilen/SemHelpers). This resulted in a longer assembly with fewer ambiguities when compared to that of the similar-sized genome of T. pallescens.

Assembly of T. repens using ALLPATHS-LG yielded unsatisfactory results, with a reduced assembly of 600 Mb displaying a high degree of homoeologue conflation. Neither ALLPATHS-LG nor MaSuRCA (Zimin et al., 2013) seemed capable of utilising the Illumina TruSeq Synthetic Long Reads (TSLRs), both running out of memory while executing on a 64-core, 1 TB RAM server; the final assembly was therefore done using Velvet (Zerbino and Birney, 2008).

All paired-end libraries were subjected, independently of each other, to khmer digital normalization to a maximum coverage of 50× and a minimum coverage of 3×, using a k-mer size of 31. The resulting filtered libraries were assembled together with all Illumina TSLRs with Velvet using the following parameters: k-mer size of 47; automatic expected coverage; minimum coverage of 10; fixed insert length of 350 for longer paired-end libraries and 0 for the 180 bp paired-end library; maximum branch length of 500; maximum divergence of 0.05; minimum contig length of 1,000 (so as to match the filter used by ALLPATHS-LG and make assembly comparisons more meaningful); and the conserveLong flag turned on.

The resulting assembly was scaffolded using the mate-pair libraries with SSPACE-Basic 2.0 (Boetzer et al., 2011). Illumina TSLRs were split into 1,000 bp fragments, with a generous 750 bp overlap, and added to the mate-pair library collection to take advantage of the gap-filling option of SSPACE.
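The fragmentation of long reads described above amounts to a sliding window of 1,000 bp advanced in 250 bp steps. A minimal sketch follows; the function name is hypothetical, and the handling of reads shorter than one fragment and of the trailing remainder (kept whole and dropped, respectively) is an assumption, since the text does not specify it.

```python
def split_into_fragments(sequence, fragment_length=1000, overlap=750):
    """Split a long read into fixed-length, overlapping fragments.

    Consecutive fragments start (fragment_length - overlap) bases apart,
    so with the defaults each fragment shares 750 bp with the next one.
    Assumption: a trailing piece shorter than fragment_length is dropped,
    and a read shorter than fragment_length is returned as one fragment.
    """
    step = fragment_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than fragment_length")

    last_start = max(len(sequence) - fragment_length, 0)
    return [
        sequence[start:start + fragment_length]
        for start in range(0, last_start + 1, step)
    ]


# Toy 2,000 bp read with a repeating pattern.
toy_read = "".join("ACGT"[i % 4] for i in range(2000))
toy_fragments = split_into_fragments(toy_read)
```

A 2,000 bp read yields windows starting at 0, 250, 500, 750 and 1,000 bp under these assumptions.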

Once the assembly was completed, the Illumina TSLRs were used one last time for an error-correction step using in-house scripts (https://github.com/Lanilen/SemHelpers), by aligning them against the assembly and looking for small indels in the alignment. When a gap or an insertion was well supported by TSLRs, the reference genome was corrected with the consensus from the TSLRs.

Pseudomolecule construction: White clover scaffolds were assigned to subgenomes and ordered into pseudomolecules using pairwise linkage disequilibrium (LD), calculated as r², between microsatellite (SSR) markers and Genotyping-by-Sequencing (GBS)-derived biallelic single nucleotide polymorphisms (SNPs) segregating in F1 progeny (n = 93) of a white clover biparental cross (PGen3×PGen9). To enable linkage group identification and distinguish homoeologues in this population, and to align the sequence assembly pseudomolecules with previously published white clover linkage groups, DNA extracted from the parent and progeny plants (Anderson et al., 2018) was screened with 16 single-locus homoeologue-specific (SLHS) SSR markers (Supplemental Table 2) as described (Griffiths et al., 2013).

A denser marker set for LD mapping was then generated by preparing GBS libraries from each parent and 93 progeny as described (Elshire et al., 2011). Briefly, 100 ng of sample DNA was digested with ApeKI, followed by ligation of 99 ng barcoded and reverse adapters, then pooled and PCR amplified. The pooled libraries were sequenced on two lanes of an Illumina HiSeq 2500, and samples were demultiplexed using GBSX (Herten et al., 2015) with one mismatch allowed in the enzyme and barcode. Each sample was mapped back to the reference genome individually using Geneious 8 (Kearse et al., 2012) with default ‘Fast’ parameters and discarding non-unique hits. SNPs were called from the resulting BAM files using ‘ref_map’ from STACKS (Catchen et al., 2013) with a minimum depth of three reads required to form a locus, and with one mismatch allowed per locus. The Populations pipeline from STACKS was used to consolidate individuals and generate SNPs across all the samples.

The resulting SNPs were filtered as follows. Individual heterozygous genotype calls where the ratio of reads of one allele relative to the other exceeded 9:1 were reset to homozygous for the most numerous allele read. Overall, approximately 1% of heterozygous genotypes were reset to homozygous. SNPs were discarded that had a minor allele frequency (MAF) less than 10%, or at least 50% missing genotype calls, or more than 75% heterozygous genotypes. The remaining SNPs were grouped according to parental genotype, provided that the read depth for both parents was 10 or more. Specifically, a SNP was classified as a PGen3Het SNP if parent PGen3 was heterozygous and parent PGen9 was homozygous; or a PGen9Het SNP if PGen3 was homozygous and PGen9 was heterozygous. These SNPs were discarded if the frequency of the GBS homozygous major genotype was either less than 0.25, or less than twice the frequency of the GBS homozygous minor genotype, as they may have been misclassified. After filtering, the number of SNPs was 22,663, comprising 10,824 PGen3Het and 11,839 PGen9Het SNPs.
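The first two filtering rules above (resetting skewed heterozygous calls, then discarding SNPs on MAF, missingness and heterozygosity) can be sketched as follows. This is an illustration only: the genotype encoding ("AA", "AB", "BB", `None` for missing), the function names, and the choice to compute MAF and heterozygosity over called genotypes are assumptions rather than details taken from the published pipeline.

```python
def reset_skewed_heterozygote(reads_a, reads_b, max_ratio=9.0):
    """Re-call a heterozygous genotype from its allele read counts.

    If one allele outnumbers the other by more than max_ratio:1, the
    call is reset to homozygous for the more numerous allele.
    """
    if reads_a > max_ratio * reads_b:
        return "AA"
    if reads_b > max_ratio * reads_a:
        return "BB"
    return "AB"


def keep_snp(genotypes, min_maf=0.10, max_missing=0.50, max_het=0.75):
    """Return True if a SNP passes the MAF, missingness and het filters.

    genotypes: list of "AA", "AB", "BB" or None (missing call).
    Assumption: MAF and heterozygosity are computed over called genotypes.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False

    # Discard SNPs with at least 50% missing genotype calls.
    missing_fraction = 1.0 - len(called) / len(genotypes)
    if missing_fraction >= max_missing:
        return False

    # Discard SNPs with a minor allele frequency below 10%.
    freq_b = sum(g.count("B") for g in called) / (2 * len(called))
    if min(freq_b, 1.0 - freq_b) < min_maf:
        return False

    # Discard SNPs with more than 75% heterozygous genotypes.
    het_fraction = sum(g == "AB" for g in called) / len(called)
    return het_fraction <= max_het
```

The parental-configuration grouping (PGen3Het and PGen9Het) and the segregation check on homozygous genotype frequencies would follow these steps and are not shown here.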

LD analysis was performed using the PGen3Het and PGen9Het SNPs based on a pseudo-backcross approach where LD was only considered between SNPs with the same parental genotype configuration and along haplotypes inherited from the heterozygous parent. Estimates of pairwise r² values were computed using GUS-LD (Bilton et al., 2018), which accounts for uncertainty in GBS genotypes associated with low read depth, where the major allele frequencies were fixed at 0.5 and ε = 0. These LD estimates can be expressed in terms of recombination fraction, therefore equating the LD analysis to a two-point linkage analysis that accounts for read depth.

Linkage groups (LGs) representing parental chromosomes comprising either the PGen3Het or PGen9Het parental SNP data were assembled using the following procedure. For SNPs from GBS sequence tags mapped to the white clover assembly, an initial LG was formed by selecting a set of SNPs from the largest group of SNPs in LD based on a visual inspection of the matrix of r² values. These LGs were assigned to the assembly pseudomolecules according to where the SNP GBS tags mapped. The remaining SNPs aligned to a pseudomolecule were added to the corresponding LG if the mean of the largest thirty r² values between the unmapped SNP and the LG SNPs was greater than 0.4. Each of the remaining unmapped SNPs was then assigned to an LG based on the greatest mean of the ten largest r² values between the unmapped SNP and LG SNPs, provided the mean was greater than 0.6. To ensure SNP placement in the LG was unequivocal, SNPs remained unassigned if the second largest r² mean was to another LG and was greater than 66% of the highest r² mean. After assignment, only seven mapped SNPs were removed due to high LD estimates with SNPs in other LGs.
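The assignment rule for the remaining unmapped SNPs (mean of the ten largest r² values above 0.6, with the runner-up LG no higher than 66% of the best) can be sketched as below. The function name and the input layout (a dict from LG name to the list of r² values between the unmapped SNP and that LG's SNPs) are illustrative assumptions.

```python
def assign_to_linkage_group(r2_by_lg, top_n=10, min_mean=0.6,
                            max_runner_up_fraction=0.66):
    """Assign an unmapped SNP to a linkage group, or return None.

    r2_by_lg: dict mapping LG name -> list of r^2 values between the
              unmapped SNP and the SNPs already placed in that LG.
    """
    # Mean of the top_n largest r^2 values for each LG.
    means = {}
    for lg, values in r2_by_lg.items():
        top = sorted(values, reverse=True)[:top_n]
        if top:
            means[lg] = sum(top) / len(top)
    if not means:
        return None

    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    best_lg, best_mean = ranked[0]

    # The best mean must exceed the threshold.
    if best_mean <= min_mean:
        return None

    # Placement must be unequivocal: reject if the second-best LG reaches
    # more than 66% of the best mean.
    if len(ranked) > 1 and ranked[1][1] > max_runner_up_fraction * best_mean:
        return None

    return best_lg
```

For example, a SNP with r² values near 0.9 against one LG and near 0.1 against all others would be assigned, whereas one with 0.9 against one LG and 0.7 against another would remain unassigned.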

Two additional filtering steps were performed on the SNPs mapped to an LG. Firstly, to ensure only a single SNP was selected to represent a GBS sequence tag, SNPs were mapped to the reference genome and placed into bins if separated by less than 180 base pairs. From each bin, only the SNP with the highest mean r² value across the LG was retained for further analysis. The second filtering step removed SNPs that mapped to multiple loci in the genome. Briefly, SNPs were mapped to a version of the reference genome in which positions that had sufficient similarity that could result in tags being mapped to multiple locations were masked. SNPs were only retained that mapped to the same place in both the masked and unmasked genomes. After filtering, parental LGs comprised 3,492 and 3,785 SNPs for parents PGen3 and PGen9, respectively.
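The first of these two filters, keeping one SNP per GBS tag, can be sketched as a pass over position-sorted SNPs. The tuple layout, the function name, and the use of chaining (a SNP joins the current bin if it lies less than 180 bp from the previous SNP on the same scaffold) are assumptions about how "bins" were formed.

```python
def thin_snps(snps, max_gap=180):
    """Keep one SNP per bin of closely spaced SNPs.

    snps: list of (scaffold, position, mean_r2) tuples.
    SNPs on the same scaffold separated by less than max_gap bp are
    binned together; the SNP with the highest mean r^2 in each bin is kept.
    """
    kept = []
    current_bin = []
    for snp in sorted(snps, key=lambda s: (s[0], s[1])):
        if current_bin:
            previous = current_bin[-1]
            new_scaffold = snp[0] != previous[0]
            far_apart = snp[1] - previous[1] >= max_gap
            if new_scaffold or far_apart:
                # Close the current bin and keep its best SNP.
                kept.append(max(current_bin, key=lambda s: s[2]))
                current_bin = []
        current_bin.append(snp)
    if current_bin:
        kept.append(max(current_bin, key=lambda s: s[2]))
    return kept


toy_snps = [("s1", 100, 0.5), ("s1", 150, 0.8), ("s1", 400, 0.3), ("s2", 120, 0.9)]
toy_kept = thin_snps(toy_snps)
```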

The single-locus homoeologue-specific SSR data described above were also incorporated to align LGs with a previous genetic linkage map (Griffiths et al., 2013). Pairwise r² values between the SSRs and the mapped SNPs were estimated with the read depth for the SSR genotype calls set to infinity. Each SSR was then assigned to the LG where there were pairwise r² values greater than 0.5. Depending on the parental configuration, some of the SSRs were mapped in both parental LGs.

Ordering the SNPs and SSRs in each LG was achieved using the Sorting Points into Neighbourhoods (SPIN) algorithm (Tsafrir et al., 2005) within the R package “seriation” (Hahsler et al., 2016). Adjustments were made to the implementation of the neighbourhood algorithm within R, which consisted of solving the linear assignment problem using the Hungarian algorithm and running the algorithm using multiple starting orders to ensure the global maximum was obtained. The standard deviation parameter for the weighting matrix was set to 15 for all LGs except two, where the parameter was set at 10. Orientation of the ordered LGs was determined by SNP alignment to the reference genome and was confirmed by SSR position relative to the genetic linkage map (Griffiths et al., 2013). The order of the SSR alleles among the 3,492 and 3,785 SNPs in the PGen3 and PGen9 parental maps, respectively, shows conservation of position within the LGs relative to previous linkage maps (Supplemental Figure 2).

The GBS tags harbouring the ordered SNPs from the biparental population were mapped to the white clover S9 sequence data and used to group and order an anchor set of 3,364 of the 22,100 scaffolds. This anchor set of pseudomolecules was assigned to the appropriate TrTo or TrTp subgenome by BLAST alignment with the progenitor assemblies. To incorporate remaining scaffolds not linked to the anchor set by LD, the anchor scaffolds were BLAST-aligned to the model forage legume Medicago truncatula genome (Mt4.0) (Tang et al., 2014), which has general macrosynteny with some major rearrangements relative to white clover (Griffiths et al., 2013). The non-linked scaffolds were assigned a pseudomolecule and order relative to the anchor scaffolds based on proximity on the Mt4.0 genome. A small number (4%; Supplemental Table 3) of white clover scaffolds had no strong alignment to Mt4.0, and were assigned to Chromosome 0. In summary, the white clover scaffolds were ordered based on LD of mapped GBS SNPs, assigned to subgenomes, and then supplemented with non-LD-linked scaffolds by alignment to the M. truncatula genome. The progenitor pseudomolecules were assembled based on synteny to white clover.

Gene annotation: RNA-seq reads were mapped to the three respective genomes using Bowtie v2.1.0 (Langmead and Salzberg, 2012) / TopHat v2.0.9 (Kim et al., 2013) with a maximum intron size of 30 kb. Genome-guided transcripts were then reconstructed using Cufflinks v2.1.1 (Trapnell et al., 2012) and those transcripts were used as the primary source for the final set of gene models (Sanggaard et al., 2014) (Supplemental Table 5). Gene annotation was performed using a custom pipeline, which combined gene model evidence from experimental data and ab initio methods. Gene model evidence was obtained using five parallel approaches: 1) Cufflinks, as described above; 2) mapping of the de novo assembled transcripts using the GMAP v20/07/12 algorithm (Wu and Watanabe, 2005); 3) AUGUSTUS v2.6.1 (Stanke et al., 2006); 4) GeneMark-E (Lomsadze et al., 2005); and 5) GlimmerHMM v3.0.1 (Majoros et al., 2004) (Supplemental Table 5). The five layers of evidence were merged to create a non-redundant gene model set using a hierarchical filtering approach as described (Sanggaard et al., 2014). Cufflinks gene models were given highest priority, followed by de novo assembled transcripts (GMAP). The three ab initio predictors had lower priority in the following order: AUGUSTUS, GeneMark and Glimmer. Glimmer appears to have been superseded by other prediction programs and contributed only very few gene models. AUGUSTUS was trained using protein-coding transcripts (>1000 nt) from Cufflinks. GeneMark uses ~10 Mbp of genome to fine-tune gene model prediction parameters, while GlimmerHMM was trained with an Arabidopsis gene set.

The final set of gene models was then divided into “Protein coding” and “Unclassified”. Gene models were labeled as “Protein coding” if the gene model had a coding sequence (CDS) length larger than 300 or had a functional annotation based on homology against the non-redundant (nr) plant database. The “Unclassified” category is the remainder of transcripts which did not fit the “Protein coding” category.

The annotation was further improved using the BRAKER2 pipeline tool (Hoff et al., 2017), which is based on AUGUSTUS. The RNA transcription data were mapped using Spliced Transcripts Alignment to a Reference (STAR) (Dobin et al., 2013). The references for the white clover and the progenitors were soft-masked using RepeatMasker (Smit et al., 2015). The BRAKER2 pipeline produced an abundance of genes, due to the AUGUSTUS predictions. Many of these additional genes occurred in regions where we had no RNA-seq mapping. Using the gene models from the BRAKER2 annotation, we calculated confidence scores (0 to 1) based on the amount of overlap in exons found in the previous annotation. The final curated gene set with confidence scores greater than 0.5 did not reduce Benchmarking Universal Single-Copy Orthologs (BUSCO v3.0.2) (Simão et al., 2015) scores after filtering. BUSCO was run against the embryophyta odb9 database downloaded on 28 August 2017. Any genes not supported by the previous annotation were removed. Functional annotation was done by BLAST searches against all plant protein sequences from the NCBI database. The most likely match that was not labelled as an uncharacterised, hypothetical or predicted protein was chosen.

ITS and chloroplast analysis: ITS sequences (ITS 1 and 2 together with the 5.8S rRNA) from T. pallescens (GenBank DQ312111) and T. occidentale (GenBank AF053168) were used to search the collection of white clover Illumina TruSeq Synthetic Long Reads (TSLRs) to identify diagnostic SNPs. The T. occidentale chloroplast sequence (GenBank KJ788289), along with T. pallescens and T. repens chloroplast sequences (current work), were aligned using MAUVE (Darling et al., 2004), using the ProgressiveMauve algorithm from Geneious 7 (Kearse et al., 2012), and LASTZ (Harris, 2007), using default parameters with inclusion of the --chain parameter and a hit window of 10,000. A fourth chloroplast genome from T. hybridum (GenBank KJ788286.1) was added to serve as an outgroup for comparison.


To expand the chloroplast analysis to encompass more white clover individuals, reads from four resequenced clover individuals (described below) were mapped to the chloroplast sequences using BWA (Li and Durbin, 2011). Variants were called from the mapped reads using the Genome Analysis Tool Kit (GATK) (DePristo et al., 2011) for all four individuals combined and separately. Fewer SNP variants in a given chloroplast comparison indicated greater similarity, and hence the chloroplast sequence that the individual white clover had most likely inherited.

Recombination analysis: The assembled, unscaffolded white clover contigs were aligned to the assembled genomes of the progenitors using BLAST to identify contigs where the top high-scoring segment pairs (HSPs) alternated between To and Tp along the white clover contig. Subsequently, all mapping TSLRs were compared against the reference genomes of T. occidentale and T. pallescens using LASTZ (Harris, 2007), and the resulting alignments were filtered to find individual TSLRs having high-scoring matches against alternate ancestors on each end of the read.

Whole-genome divergence estimation: Whole-chromosome alignments were generated using LASTZ v1.02.00 (Harris, 2007) with the following settings: --hspthresh=12500 --strand=both --format=maf --ambiguous=iupac --chain. The resulting MAF alignments were then masked at genic positions, to focus on neutral polymorphic sites, and used as input for the IMCoalHMM isolation model software (Mailund et al., 2012) run with the multichain Monte-Carlo mode on sets of 100 alignments, for each of which the median split time was recorded.

White clover resequencing, PSMC and MSMC analysis: Whole-genome resequencing was performed using full Illumina sequencing (150 bp paired-end libraries) to ~49× coverage for four divergent individuals selected from among 20 cultivars (Supplemental Table 8). Each individual was mapped using BWA (Li and Durbin, 2011); the numbers of total reads and mapped reads can be found in Supplemental Table 8. Reads not properly paired were removed. Diploid FASTA files were generated using samtools mpileup, bcftools and vcftools. Pairwise sequentially Markovian coalescent (PSMC) analysis (Li and Durbin, 2011) was conducted using the diploid FASTA files for each individual, and the results of the separate PSMCs were combined and then plotted. Additional PSMC analyses were performed to assess the influence of different recombination rates. With a mutation rate of 1.8×10⁻⁸, the following recombination rates were used: (4+25*2+4+6), in which the first parameter spans 4 intervals, the next 25 parameters span 2 time intervals, the second to last parameter spans 4 intervals and the last parameter spans 6 intervals (Li and Durbin, 2011); or (4+30*2+4+6+10), equivalent to a higher recombination rate, in which the first parameter spans 4 intervals, the next 30 parameters span 2 time intervals, the third to last parameter spans 4 intervals, the second to last spans 6 intervals, and the last spans 10 intervals (Nadachowska-Brzyska et al., 2016). All scripts and settings can be found in the PSMC folder at (https://github.com/MarniTausen/CloverAnalysisPipeline). Multiple sequentially Markovian coalescent (MSMC) analysis (Schiffels and Durbin, 2014) was performed on the same individuals as PSMC. MSMC, in comparison to PSMC, allows the use of multiple individuals (haplotypes), which increases resolution of population estimation in more recent timeframes (2k - 30k years ago). MSMC was run with 8 haplotypes (4 individuals), 6 haplotypes, 4 haplotypes and 2 haplotypes. In the case of two haplotypes MSMC is similar to PSMC, and is referred to as PSMC’. For each individual all of the pre-processing was run per chromosome, as described in the MSMC guide (https://github.com/stschiff/msmc/blob/master/guide.md). The final results of the MSMC analysis contained unscaled times and effective population sizes; these were then scaled using a generation time of 1 and the mutation rate of 1.8×10⁻⁸. To compare MSMC analyses across recombination rate:mutation rate ratios relative to the default ratio (0.25), we assessed MSMC using increased and reduced ratios (0.35 and 0.15, respectively).
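The scaling of MSMC output mentioned above can be sketched as follows, following the conversion given in the MSMC guide: scaled time divided by the mutation rate gives generations, and the inverse of the scaled coalescence rate divided by twice the mutation rate gives effective population size. The function name is hypothetical; the argument names mirror the `left_time_boundary` and `lambda_00` columns of the MSMC result file.

```python
def scale_msmc(left_time_boundary, lambda_00,
               mutation_rate=1.8e-8, generation_time=1.0):
    """Convert one row of unscaled MSMC output to years and Ne.

    left_time_boundary: scaled time at the start of the interval.
    lambda_00: scaled coalescence rate within the interval.
    Defaults match the values stated in the text (mutation rate of
    1.8e-8 and a generation time of 1 year).
    """
    generations = left_time_boundary / mutation_rate
    years = generations * generation_time
    effective_size = (1.0 / lambda_00) / (2.0 * mutation_rate)
    return years, effective_size


# Toy values, not taken from the study.
toy_years, toy_ne = scale_msmc(1.8e-4, 1000.0)
```

Because the generation time is 1, years and generations coincide here; a different generation time would stretch the time axis but leave the population size estimates unchanged.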

GBS sequencing and diversity profiling: A panel of 200 white clover genotypes was sampled from 20 populations (Supplemental Dataset 1) and genotyped using genotyping by sequencing (GBS) (Elshire et al., 2011). Briefly, 100 ng of DNA isolated using a CTAB method was digested with ApeKI, ligated to modified Illumina adaptors, pooled to create libraries, each consisting of up to 88 individually bar-coded DNA samples, then PCR-amplified as described (Elshire et al., 2011). Libraries were assessed for quality on an Agilent DNA 1000 Assay (Agilent Technologies, Deutschland GmbH) prior to sequencing each library on multiple lanes of an Illumina HiSeq flow cell to generate single-end sequences of 101 bp. Post-sequencing QC included removal of adaptor contamination using Scythe (https://github.com/vsbuffalo/scythe) with a prior contamination rate set to 0.10. Sickle (https://github.com/najoshi/sickle) was used to trim reads when the average quality score in a sliding window (of 20 bp) fell below a phred (Ewing and Green, 1998) score of 20. At this point reads shorter than 40 bp were also discarded. The reads were demultiplexed using sabre (https://github.com/najoshi/sabre), and all reads originating from the same sample were concatenated. BWA (Li and Durbin, 2011) was used to align the reads against the white clover reference and generate an alignment file in SAM format. Alignments were sorted by coordinate with picard tools (http://broadinstitute.github.io/picard), and the Genome Analysis Tool Kit (GATK) (DePristo et al., 2011) was used to generate a list of putative INDELs (RealignerTargetCreator), perform local realignment around these putative INDELs (IndelRealigner), and identify variants and call genotypes (UnifiedGenotyper). Only biallelic variant sites with a mean mapping quality of 30 were retained. The SNP density was calculated as the number of variants divided by the number of callable sites in bins of 100 kb. All scripts and commands used in the analysis are available on GitHub in the GBS and GBS visualization folders (https://github.com/MarniTausen/CloverAnalysisPipeline).
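The SNP density calculation described above (variants divided by callable sites, per 100 kb bin) can be sketched as below. This is an illustration under stated assumptions: positions are taken to be 0-based coordinates on a single chromosome, and the function name and input layout are hypothetical; the scripts actually used are in the repository cited in the text.

```python
from collections import Counter


def snp_density(variant_positions, callable_positions, bin_size=100_000):
    """Return {bin_index: variants / callable sites} for each bin.

    Bins with no callable sites are omitted, which avoids division by
    zero. Positions are assumed to be 0-based on one chromosome.
    """
    variants_per_bin = Counter(pos // bin_size for pos in variant_positions)
    callable_per_bin = Counter(pos // bin_size for pos in callable_positions)
    return {
        bin_index: variants_per_bin[bin_index] / n_callable
        for bin_index, n_callable in sorted(callable_per_bin.items())
    }


# Toy data: two bins, with 100 and 20 callable sites respectively.
toy_callable = list(range(0, 100)) + list(range(100_000, 100_020))
toy_density = snp_density([10, 50, 100_010], toy_callable)
```

With GBS data only a small fraction of each bin is callable, which is why the denominator is the number of callable sites rather than the bin width.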

Simulation of mutation accumulation: Simulations were performed using msprime (Kelleher et al., 2016) using mutation rates based on Arabidopsis (6.5×10⁻⁹ (Ossowski et al., 2010)) or genome size (1.1×10⁻⁸ (progenitors), 1.8×10⁻⁸ (white clover) (Lynch, 2010); Supplemental Figure 8), a sample size of 400 (alleles), and a recombination rate of 1×10⁻⁷. Multiple demographic models were run testing different hypotheses. The general structure was a fast bottleneck, modeling the start of the population with a rapid increase to full population size. For the Hybridizing and Doubling hypothesis, the population initiated with a single hybrid individual; assuming it attains the same effective population size (Ne) as its progenitors (120,000), the population achieves this in 20 generations of exponential growth. The hypotheses of unreduced gametes and mild bottlenecks used the same model, assuming a constant population size and introducing a bottleneck followed by 20 generations of rapid growth. Each of the simulations produced a VCF file, from which the SNP density was estimated in the same manner as for the GBS data. Site frequency spectra were also produced and used to compare the simulated to the observed data. The scripts used for the simulations and the precise settings can be found at (https://github.com/MarniTausen/CloverAnalysisPipeline).
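As a worked illustration of the Hybridizing and Doubling scenario, the per-generation exponential growth rate needed to go from one founder to an Ne of 120,000 in 20 generations follows from N(t) = N(0)·exp(r·t). The sketch below only performs this arithmetic; the function names are hypothetical and it does not reproduce the authors' msprime configuration, which is available in the repository cited above.

```python
import math


def exponential_growth_rate(initial_size, final_size, generations):
    """Per-generation rate r such that final = initial * exp(r * generations)."""
    return math.log(final_size / initial_size) / generations


def size_after(initial_size, rate, generations):
    """Population size after a number of generations of exponential growth."""
    return initial_size * math.exp(rate * generations)


# One hybrid founder growing to Ne = 120,000 over 20 generations.
founder_rate = exponential_growth_rate(1, 120_000, 20)
```

A rate of this kind is what an exponential-growth epoch in a coalescent simulator would typically take as input, bearing in mind that such simulators run backwards in time.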


Homoeologue-specific transcriptional profiling: Raw genomic DNA and RNA-seq reads for all tissues were assessed for quality with SolexaQA++ version 3.14 (Cox et al., 2010). The largest contiguous sections of reads with phred (Ewing and Green, 1998) quality scores >13 were retained, and reads with fewer than 25 bp were discarded.
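The trimming rule above (keep the longest contiguous run of bases with phred quality above 13, then discard reads shorter than 25 bp) can be sketched in a few lines. This is not the SolexaQA++ implementation; the function names are hypothetical and qualities are assumed to be already decoded to integer phred scores.

```python
def longest_high_quality_segment(qualities, min_quality=13):
    """Return (start, end) of the longest run with quality > min_quality.

    The interval is half-open, so sequence[start:end] is the segment.
    Ties are resolved in favour of the first such run.
    """
    best_start, best_length = 0, 0
    run_start = None
    for index, quality in enumerate(qualities):
        if quality > min_quality:
            if run_start is None:
                run_start = index
            run_length = index - run_start + 1
            if run_length > best_length:
                best_start, best_length = run_start, run_length
        else:
            run_start = None
    return best_start, best_start + best_length


def trim_read(sequence, qualities, min_quality=13, min_length=25):
    """Trim a read to its best segment; return None if it is too short."""
    start, end = longest_high_quality_segment(qualities, min_quality)
    trimmed = sequence[start:end]
    return trimmed if len(trimmed) >= min_length else None


# Toy read: 10 good bases, one poor base, then 30 good bases.
toy_qualities = [30] * 10 + [5] + [30] * 30
toy_trimmed = trim_read("A" * 10 + "N" + "C" * 30, toy_qualities)
```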

We applied the Hybrid Lineage Transcriptome Explorer (HyLiTE v1.6.2) (Duchemin et al., 2015) to obtain read count matrices for differential expression analysis. As input to HyLiTE, we provided Sequence Alignment/Map (SAM) (Li et al., 2009) files. These SAM files were generated by mapping mRNA reads, both paired and unpaired, with Bowtie 2 (Langmead et al., 2009; Langmead and Salzberg, 2012) against a set of 36,638 T. occidentale reference gene models. For each tissue type a separate set of SAM files was created for each species of clover. To create a combined analysis, these separate SAM files were merged across all tissues. To increase coverage for SNP calling, genomic DNA sequence data were mapped under the same conditions to the reference gene models. From a pileup file generated with SAMtools (Li et al., 2009), HyLiTE then identified polymorphisms indicative of parental origin for RNA-seq reads and classified reads to parental origin in the hybrid species. The resulting output from HyLiTE comprises tables with read counts for parent species to gene models (counts for orthologues) and read counts of unambiguously assignable reads for the hybrid for each homoeologue. The scripts and the workflow for running HyLiTE are available at (https://github.com/MarniTausen/CloverAnalysisPipeline#hylite-pipeline).

Analysis of subgenome expression ratio outliers: Expression ratios were first normalized across all progenitor and white clover tissue samples using TMM (Robinson et al., 2010), which we have previously shown is well suited for correcting for compositional biases in cross-tissue comparisons (Munch et al., 2018). For white clover, the sum of the reads assigned to the two subgenomes was normalized, and the normalized sum was then redistributed between the white clover subgenomes, maintaining the original subgenome ratio, and multiplied by two to maintain normalization with respect to the progenitor samples. PCA was carried out using the prcomp R function and results were plotted using ggplot2. Pearson correlation coefficients, deviance and variance were calculated using standard R (version 3.4.3) functions. To identify outliers, we used a subset of 19,954 genes that showed an average of more than 1 TPM per sample after TMM normalization and for which at least 40% of the reads were informative for subgenome assignment in the HyLiTE analysis. The variance in log_e(TrTo/TrTp) subgenome expression across tissues was calculated. Expression ratio outliers were identified for each of the three experiments (Pooled, S9, S10) as the genes with a log_e(TrTo/TrTp) variance across tissues larger than 1.5 standard deviations above the mean. ANOVA comparing groups of root, leaf, and stolon samples from all experiments, flower samples from the Pooled and S9 experiments, and young leaf samples from the S10 experiment was carried out using R version 3.4.3. Ten control gene sets were generated by random sampling from the set of 19,954 genes that fulfilled all read count and assignment requirements. All scripts used for the analysis can be found in Supplemental Dataset 3. Enzyme Commission (EC) numbers were assigned using Pathway Tools (http://bioinformatics.ai.sri.com/ptools/) with the MetaCyc database (v.2.2.6) by searching with the acquired MetaCyc gene IDs. Pathway enrichment was evaluated for matches to the MetaCyc database with e-value less than 10e-10 using the “Genes enriched for pathways” function of the MetaCyc website (http://metacyc.org/), with the enrichment setting, Fisher’s exact test and Benjamini-Hochberg correction for multiple testing.
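The redistribution step and the variance-based outlier rule described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' R scripts; the function names are hypothetical, TMM normalization itself is not implemented (the normalized sum is taken as given), and the pseudocount and the use of the sample variance and sample standard deviation are assumptions not stated in the text.

```python
import math
import statistics


def redistribute(to_count, tp_count, normalized_sum):
    """Split a normalized per-gene total between subgenomes.

    The original TrTo:TrTp ratio is kept and the result is multiplied by two,
    as described in the text.
    """
    total = to_count + tp_count
    if total == 0:
        return 0.0, 0.0
    to_frac = to_count / total
    return 2 * normalized_sum * to_frac, 2 * normalized_sum * (1 - to_frac)


def log_ratio_variance(to_values, tp_values, pseudocount=1.0):
    """Variance of log_e(TrTo/TrTp) across tissues for one gene.

    The pseudocount (an assumption) avoids division by zero and log(0).
    """
    ratios = [
        math.log((a + pseudocount) / (b + pseudocount))
        for a, b in zip(to_values, tp_values)
    ]
    return statistics.variance(ratios)  # sample variance


def find_outliers(variances, n_sd=1.5):
    """Return genes whose variance exceeds mean + n_sd standard deviations."""
    values = list(variances.values())
    threshold = statistics.mean(values) + n_sd * statistics.stdev(values)
    return sorted(gene for gene, v in variances.items() if v > threshold)


if __name__ == "__main__":
    print(redistribute(30, 10, 100))
    variances = {f"g{i}": 0.1 for i in range(1, 10)}
    variances["g10"] = 5.0
    print(find_outliers(variances))
```

A gene whose homoeologue ratio is stable across tissues has a variance near zero, whereas a gene that switches between subgenomes from one tissue to another has a large variance and is flagged.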

The amount of diversity of the outlier genes between subgenomes was estimated using the SNPs called through the HyLiTE pipeline. The gene sequences were “mutated” according to the SNPs and translated into protein; an amino acid change was inferred where the protein sequences before and after mutation did not match.
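The comparison of translated sequences before and after applying SNPs can be sketched as below. This is an illustrative reimplementation, not the authors' script; the function names, the 0-based SNP coordinates, the use of the standard genetic code, and the assumption of uppercase, in-frame coding sequences are all choices made for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame, uppercase coding sequence; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snps(cds, snps):
    """Return the sequence with each SNP applied; snps maps 0-based position -> alternative base."""
    seq = list(cds)
    for pos, alt in snps.items():
        seq[pos] = alt
    return "".join(seq)


def amino_acid_changes(cds, snps):
    """List (residue index, original residue, new residue) for every changed amino acid."""
    before = translate(cds)
    after = translate(apply_snps(cds, snps))
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


if __name__ == "__main__":
    cds = "ATGGCTGAA"  # Met-Ala-Glu
    print(amino_acid_changes(cds, {5: "C"}))  # third-position change in the Ala codon
    print(amino_acid_changes(cds, {4: "A"}))  # second-position change in the Ala codon
```

Synonymous SNPs leave the translated sequence unchanged and therefore produce an empty list, so only non-synonymous differences contribute to the protein-level divergence estimate.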

Accession Numbers: Original sequence data from this article can be found in the EMBL/GenBank data libraries under the following accession numbers, which are also detailed in Supplemental Tables 16-18.

White clover (BioProject PRJNA523044):

Genome sequencing: SRR8670776, SRR8670777, SRR8670778, SRR8670779, SRR8670780, SRR8670781, SRR8670782, SRR8670783, SRR8670784, SRR8670785, SRR8670786, SRR8670787, SRR8670788, SRR8670789, SRR8670790, SRR8670791, SRR8670792, SRR8670793, SRR8670794, SRR8691041.

RNA-seq Pooled: SRR8691037, SRR8691038, SRR8691039, SRR8691040.

RNA-seq S9: SRR8691836, SRR8691837, SRR8691838, SRR8691839, SRR8691840, SRR8691841, SRR8691842, SRR8691843, SRR8691844, SRR8691845, SRR8691846, SRR8691847.

RNA-seq S10: SRR8693957, SRR8693958, SRR8693959, SRR8693960, SRR8693961, SRR8693962, SRR8693963, SRR8693964, SRR8693965, SRR8693966, SRR8693967, SRR8693968.

Trifolium occidentale (BioProject PRJNA521254):

Genome sequence: SRR8593467, SRR8593468, SRR8593469, SRR8593470, SRR8593471, SRR8593472, SRR8593473, SRR8593474, SRR8593475.

RNA-seq Pooled: SRR8691453, SRR8692350, SRR8692351, SRR8692352.

RNA-seq S10: SRR8692338, SRR8692339, SRR8692340, SRR8692341, SRR8692342, SRR8692343, SRR8692344, SRR8692345, SRR8692346, SRR8692347, SRR8692348, SRR8692349.

Trifolium pallescens (BioProject PRJNA523043):

Genome sequence: SRR8617465, SRR8617466, SRR8617467.

RNA-seq Pooled: SRR8675752, SRR8675753, SRR8675754, SRR8675755.

RNA-seq S10: SRR8692353, SRR8692354, SRR8692355, SRR8692356, SRR8692357, SRR8692358, SRR8692359, SRR8692360, SRR8692361, SRR8692362, SRR8692363, SRR8692364.


SUPPLEMENTAL DATA

Supplemental Figure 1: K-mer based genome size estimates.

Supplemental Figure 2: White clover assembly ordering and alignment to a genetic linkage map.

Supplemental Figure 3: Chloroplast genome alignments of white clover and extant progenitors.

Supplemental Figure 4: Internal Transcribed Spacer 2 alignment for Trifolium occidentale, T. pallescens compared with Illumina TruSeq Synthetic Long Read sequences from white clover.

Supplemental Figure 5: Trifolium occidentale (To) vs T. pallescens (Tp) chromosome alignment.

Supplemental Figure 6: Trifolium occidentale (To) vs white clover TrTo subgenome chromosome alignment.

Supplemental Figure 7: Trifolium pallescens (Tp) vs white clover TrTp subgenome chromosome alignment.

Supplemental Figure 8: Estimation of mutation rates.

Supplemental Figure 9: Pairwise Sequentially Markovian Coalescent (PSMC) analysis of four sequenced white clover individuals using different recombination rates.

Supplemental Figure 10: Multiple Sequential Markovian Coalescent (MSMC) analysis of four sequenced white clover individuals using different settings.

Supplemental Figure 11: Site frequency spectra and SNP density simulations of white clover with different effective population (Ne) size expansions.

Supplemental Figure 12: Putative inter-homoeologous recombination breakpoints identified in white clover.

Supplemental Figure 13: Protein level divergence among progenitors and white clover subgenomes for gene expression ratio outliers.

Supplemental Figure 14: Comparison of extant progenitor and white clover expression patterns.

Supplemental Figures 1-14

Supplemental Tables 1-18

Supplemental Table 1: Sequencing library statistics.

Supplemental Table 2: Mapped single locus, homoeologue-specific microsatellite markers.

Supplemental Table 3: Pseudomolecule statistics.

Supplemental Table 4: RNA-seq data used in gene annotation and genomic 180 bp library statistics.

Supplemental Table 5: Gene annotation sources.

Supplemental Table 6: Gene annotation summary.

Supplemental Table 7: Chloroplast alignment summary statistics.

Supplemental Table 8: Read statistics for re-sequenced white clover individuals.

Supplemental Table 9: Chloroplast mapping summary statistics for the four fully sequenced white clover individuals.

Supplemental Table 10: Private progenitor and subgenome diagnostic SNPs.

Supplemental Table 11: Clover genome alignment statistics and estimated divergence times.

Supplemental Table 12: PSMC analysis and mutation accumulation simulation using a range of mutation rates.

Supplemental Table 13: Summary of RNA-seq libraries from the Pooled, S9 and S10 experiments.

Supplemental Table 14: HyLiTE RNA-seq read assignment summary.

Supplemental Table 15: Pathways enriched for outlier genes.

Supplemental Table 16: T. occidentale accession numbers.

Supplemental Table 17: T. pallescens accession numbers.

Supplemental Table 18: T. repens accession numbers.

Supplemental Dataset 1: GBS summary for white clover individuals.

Supplemental Dataset 2: HyLiTE RNA-seq read counts.

Supplemental Dataset 3: Expression ratio outliers.

Supplemental Dataset 4: R scripts used for expression analysis.

ACKNOWLEDGEMENTS

We thank Greig Cousins for the white clover S8 inbred cv. ‘Crau’ derivative, and Cindy Lawley and Karl Sluis of Illumina Inc for early access to the TruSeq Synthetic Long-Read sequencing technology. The research was funded by: Pastoral Genomics, a joint venture co-funded by DairyNZ, Beef+Lamb New Zealand, Dairy Australia, AgResearch Ltd, New Zealand Agriseeds Ltd, Grasslands Innovation Ltd, DEEResearch, and the Ministry of Business, Innovation and Employment (New Zealand); Contract C10X1306 'Genomics for Production & Security in a Biological Economy' from the Ministry of Business, Innovation and Employment (New Zealand); grant no. 4105-00007A from Innovation Fund Denmark (S.U.A.); and grant no. 10-081677 from The Danish Council for Independent Research | Technology and Production Sciences (S.U.A.).

AUTHORS’ CONTRIBUTIONS

A.G.G. initiated the white clover genome sequencing project, co-developed the sequencing strategy, and selected the germplasm for sequencing. R.M. co-developed the sequencing strategy and performed genome assembly, synteny, and progenitor validation analyses. M.T. analysed GBS and resequencing data and carried out PSMC and MSMC analysis, site frequency spectrum simulations, gene annotation and HyLiTE analysis. V.G. performed gene annotation. R.M. and V.G. performed subgenome recombination analysis. T.P.B. carried out LD analysis and prepared the genetic map. M.A.C. performed HyLiTE analysis of subgenome-specific expression. R.A. performed GBS SNP discovery for the white clover biparental population. I.N. prepared GBS libraries for 200 clover accessions, extracted DNA for re-sequencing, and analysed data. A.K. co-developed the sequencing strategy. A.L. optimised RNA extraction methodology and prepared RNA for sequencing. C.A. and B.F. produced the white clover S9 germplasm, extracted DNA for genome sequencing and from the biparental population, and carried out SSR genotyping. K.H. prepared RNA for sequencing. A.S. developed the white clover biparental population. N.W.E. assembled the chloroplast genomes. M.P.C. supervised HyLiTE analysis. T.A. designed the GBS protocol used for the 200 accession diversity panel and analysed data. T.M. prepared scripts for divergence time analysis. M.H.S. conceived and supervised mutation accumulation simulations. S.U.A. performed divergence time and expression ratio outlier analysis. A.G.G. and S.U.A. designed and supervised the study, interpreted results and wrote the manuscript with input from all authors.

References

International Wheat Genome Sequencing Consortium (IWGSC) (2014). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788.

Aasmo Finne, M., Rognli, O.A., and Schjelderup, I. (2000). Genetic variation in a Norwegian germplasm collection of white clover (Trifolium repens L.). Euphytica 112, 45-56.

Abberton, M.T., Fothergill, M., Collins, R.P., and Marshall, A.H. (2006). Breeding forage legumes for sustainable and profitable farming systems. Asp. Appl. Biol. 80, 81-87.

Adams, K.L., Cronn, R., Percifield, R., and Wendel, J.F. (2003). Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. USA 100, 4649-4654.

Ainouche, M.L., Fortune, P.M., Salmon, A., Parisod, C., Grandbastien, M.-A., Fukunaga, K., Ricou, M., and Misset, M.-T. (2009). Hybridization, polyploidy and invasion: lessons from Spartina (Poaceae). Biol. Invasions 11, 1159-1173.

Akama, S., Shimizu-Inatsugi, R., Shimizu, K.K., and Sese, J. (2014). Genome-wide quantification of homeolog expression ratio revealed nonstochastic gene regulation in synthetic allopolyploid Arabidopsis. Nucleic Acids Res. 42, e46.

Álvarez, I., and Wendel, J.F. (2003). Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogen. Evol. 29, 417-434.

Anderson, C.B., Franzmayr, B.K., Hong, S.W., Larking, A.C., van Stijn, T.C., Tan, R., Moraga, R., Faville, M.J., and Griffiths, A.G. (2018). Protocol: a versatile, inexpensive, high-throughput plant genomic DNA extraction method suitable for genotyping-by-sequencing. Plant Methods 14, 75.

Ansari, H.A., Ellison, N.W., and Williams, W.M. (2008). Molecular and cytogenetic evidence for an allotetraploid origin of Trifolium dubium (Leguminosae). Chromosoma 117, 159-167.

Ansari, H.A., Ellison, N.W., Reader, S.M., Badaeva, E.D., Friebe, B., Miller, T.E., and Williams, W.M. (1999). Molecular cytogenetic organization of 5S and 18S-26S rDNA loci in white clover (Trifolium repens L.) and related species. Ann. Bot. 83, 199-206.

Bardil, A., de Almeida, J.D., Combes, M.C., Lashermes, P., and Bertrand, B. (2011). Genomic expression dominance in the natural allopolyploid Coffea arabica is massively affected by growth temperature. New Phytol. 192, 760-774.

Bennett, M.D., and Leitch, I.J. (2011). Nuclear DNA amounts in angiosperms: targets, trends and tomorrow. Ann. Bot. 107, 467-590.

Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., Ren, L., Moretzsohn, M.C., Shirasawa, K., Huang, W., Vidigal, B., Abernathy, B., Chu, Y., Niederhuth, C.E., Umale, P., Araujo, A.C., Kozik, A., Kim, K.D., Burow, M.D., Varshney, R.K., Wang, X., Zhang, X., Barkley, N., Guimaraes, P.M., Isobe, S., Guo, B., Liao, B., Stalker, H.T., Schmitz, R.J., Scheffler, B.E., Leal-Bertioli, S.C., Xun, X., Jackson, S.A., Michelmore, R., and Ozias-Akins, P. (2016). The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat. Genet. 48, 438-446.

Bilton, T.P., McEwan, J.C., Clarke, S.M., Brauning, R., van Stijn, T.C., Rowe, S.J., and Dodds, K.G. (2018). Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data. Genetics 209, 389-400.

Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011). Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578-579.

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Bottley, A., Xia, G.M., and Koebner, R.M.D. (2006). Homoeologous gene silencing in hexaploid wheat. Plant J. 47, 897-906.

Branca, A., Paape, T.D., Zhou, P., Briskine, R., Farmer, A.D., Mudge, J., Bharti, A.K., Woodward, J.E., May, G.D., Gentzbittel, L., Ben, C., Denny, R., Sadowsky, M.J., Ronfort, J., Bataillon, T., Young, N.D., and Tiffin, P. (2011). Whole-genome nucleotide diversity, recombination, and linkage disequilibrium in the model legume Medicago truncatula. Proc. Natl. Acad. Sci. USA 108, E864-870.

Brochmann, C., Brysting, A.K., Alsos, I.G., Borgen, L., Grundt, H.H., Scheen, A.C., and Elven, R. (2004). Polyploidy in arctic plants. Biol. J. Linn. Soc. 82, 521-536.

Buggs, R.J.A., Elliott, N.M., Zhang, L., Koh, J., Viccini, L.F., Soltis, D.E., and Soltis, P.S. (2010). Tissue-specific silencing of homoeologs in natural populations of the recent allopolyploid Tragopogon mirus. New Phytol. 186, 175-183.

Cao, M., Fraser, K., Jones, C., Stewart, A., Lyons, T., Faville, M., and Barrett, B. (2017). Untargeted metabotyping Lolium perenne reveals population-level variation in plant flavonoids and alkaloids. Front. Plant Sci. 8, 133.

Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D.S., and Karp, P.D. (2016). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, 471-480.

Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., and Cresko, W.A. (2013). Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124-3140.

Cenci, A., Combes, M.C., and Lashermes, P. (2012). Genome evolution in diploid and tetraploid Coffea species as revealed by comparative analysis of orthologous genome segments. Plant Mol. Biol. 78, 135-145.

Chelaifa, H., Monnier, A., and Ainouche, M. (2010). Transcriptomic changes following recent natural hybridization and allopolyploidy in the salt marsh species Spartina x townsendii and Spartina anglica (Poaceae). New Phytol. 186, 161-174.

Chen, J., Glémin, S., and Lascoux, M. (2017). Genetic diversity and the efficacy of purifying selection across plant and animal species. Mol. Biol. Evol. 34, 1417-1428.

Cheng, F., Wu, J., Fang, L., Sun, S., Liu, B., Lin, K., Bonnema, G., and Wang, X. (2012). Biased gene fractionation and dominant gene expression among the subgenomes of Brassica rapa. PLoS One 7, e36442.

Chester, M., Gallagher, J.P., Symonds, V.V., Cruz da Silva, A.V., Mavrodiev, E.V., Leitch, A.R., Soltis, P.S., and Soltis, D.E. (2012). Extensive chromosomal variation in a recently formed natural allopolyploid species, Tragopogon miscellus (Asteraceae). Proc. Natl. Acad. Sci. USA 109, 1176-1181.

Combes, M.-C., Dereeper, A., Severac, D., Bertrand, B., and Lashermes, P. (2013). Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytol. 200, 251-260.

Combes, M.C., Hueber, Y., Dereeper, A., Rialle, S., Herrera, J.C., and Lashermes, P. (2015). Regulatory divergence between parental alleles determines gene expression patterns in hybrids. Genome Biol. Evol. 7, 1110-1121.

Coombe, D. (1961). Trifolium occidentale, a new species related to T. repens L. Watsonia 5, 68-87.

Cousins, G., and Woodfield, D.R. (2006). Effect of inbreeding on growth of white clover. In 13th Australasian Plant Breeding Conference, C.F. Mercer, ed (Christchurch, New Zealand: New Zealand Grassland Association), pp. 568-572.

Cox, M.P., Peterson, D.A., and Biggs, P.J. (2010). SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinformatics 11, 485.

Daday, H. (1958). Gene frequencies in wild populations of Trifolium repens L. III. World distribution. Heredity 12, 169-184.

Darling, A.C.E., Mau, B., Blattner, F.R., and Perna, N.T. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394-1403.

Davies, K.M., Albert, N.W., Zhou, Y., and Schwinn, K.E. (2018). Functions of flavonoid and betalain pigments in abiotic stress tolerance in plants. Annual Plant Reviews online 1, 1-41.

De Storme, N., and Geelen, D. (2014). The impact of environmental stress on male reproductive development in plants: biological processes and molecular mechanisms. Plant, Cell Environ. 37, 1-18.

DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M., McKenna, A., Fennell, T.J., Kernytsky, A.M., Sivachenko, A.Y., Cibulskis, K., Gabriel, S.B., Altshuler, D., and Daly, M.J. (2011). A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491-498.

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21.

Douglas, G.M., Gos, G., Steige, K.A., Salcedo, A., Holm, K., Josephs, E.B., Arunkumar, R., Agren, J.A., Hazzouri, K.M., Wang, W., Platts, A.E., Williamson, R.J., Neuffer, B., Lascoux, M., Slotte, T., and Wright, S.I. (2015). Hybrid origins and the earliest stages of diploidization in the highly successful recent polyploid Capsella bursa-pastoris. Proc. Natl. Acad. Sci. USA 112, 2806-2811.

Duchemin, W., Dupont, P.Y., Campbell, M.A., Ganley, A.R., and Cox, M.P. (2015). HyLiTE: accurate and flexible analysis of gene expression in hybrid and allopolyploid species. BMC Bioinformatics 16, 8.

Ellison, N.W., Liston, A., Steiner, J.J., Williams, W.M., and Taylor, N.L. (2006). Molecular phylogenetics of the clover genus (Trifolium - Leguminosae). Mol. Phylogen. Evol. 39, 688-705.

Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., and Mitchell, S.E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6, e19379.

Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194.


Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might have had a better chance to survive the Cretaceous–Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106, 5737-5742.

Feldman, M., Liu, B., Segal, G., Abbo, S., Levy, A.A., and Vega, J.M. (1997). Rapid elimination of low-copy DNA sequences in polyploid wheat: a possible mechanism for differentiation of homoeologous chromosomes. Genetics 147, 1381-1387.

Ferreira de Carvalho, J., Poulain, J., Da Silva, C., Wincker, P., Michon-Coudouel, S., Dheilly, A., Naquin, D., Boutte, J., Salmon, A., and Ainouche, M. (2013). Transcriptome de novo assembly from next-generation sequencing and comparative analyses in the hexaploid salt marsh species Spartina maritima and Spartina alterniflora (Poaceae). Heredity 110, 181-193.

Franzmayr, B.K., Rasmussen, S., Fraser, K.M., and Jameson, P.E. (2012). Expression and functional characterization of a white clover isoflavone synthase in tobacco. Ann. Bot. 110, 1291-1301.

Garsmeur, O., Schnable, J.C., Almeida, A., Jourda, C., D’Hont, A., and Freeling, M. (2014). Two evolutionarily distinct classes of paleopolyploidy. Mol. Biol. Evol. 31, 448-454.

Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J., Sharpe, T., Hall, G., Shea, T.P., Sykes, S., Berlin, A.M., Aird, D., Costello, M., Daza, R., Williams, L., Nicol, R., Gnirke, A., Nusbaum, C., Lander, E.S., and Jaffe, D.B. (2011). High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513-1518.

Griffiths, A.G., Barrett, B.A., Simon, D., Khan, A.K., Bickerstaff, P., Anderson, C.B., Franzmayr, B.K., Hancock, K.R., and Jones, C.S. (2013). An integrated genetic linkage map for white clover (Trifolium repens L.) with alignment to Medicago. BMC Genomics 14, 388.

Gross, B.L., Kane, N.C., Lexer, C., Rosenthal, L., Fulco, D.M., Donovan, L.A., and Rieseberg, L.H. (2004). Reconstructing the Origin of Helianthus deserticola: Survival and Selection on the Desert Floor. Am. Nat. 164, 145-156.

Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E., and Wendel, J.F. (2012). Homoeolog expression bias and expression level dominance in allopolyploids. New Phytol. 196, 966-971.

Hahsler, M., Buchta, C., and Hornik, K. (2016). Infrastructure for seriation. R package version 1.2-1.

Harris, R.S. (2007). Improved pairwise alignment of genomic DNA. In College of Engineering (The Pennsylvania State University), pp. 84.

Herten, K., Hestand, M.S., Vermeesch, J.R., and Van Houdt, J.K. (2015). GBSX: a toolkit for experimental design and demultiplexing genotyping by sequencing experiments. BMC Bioinformatics 16, 73.

Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M., and Stanke, M. (2017). BRAKER2 (http://github.com/Gaius-Augustus/BRAKER Downloaded September 28, 2018).

Hoffmann, M.H. (2005). Evolution of the realized climatic niche in the genus Arabidopsis (Brassicaceae). Evolution 59, 1425-1436.

Hovav, R., Udall, J.A., Chaudhary, B., Rapp, R., Flagel, L., and Wendel, J.F. (2008). Partitioned expression of duplicated genes during development and evolution of a single cell in a polyploid plant. Proc. Natl. Acad. Sci. USA 105, 6191-6195.

Jamalnasir, H., Wagiran, A., Shaharuddin, N.A., and Samad, A.A. (2013). Isolation of high quality RNA from plant rich in flavonoids, Melastoma decemfidum Roxb ex. Jack. Australian Journal of Crop Science 7, 911-916.


Jia, J., Zhao, S., Kong, X., Li, Y., Zhao, G., He, W., Appels, R., Pfeifer, M., Tao, Y., Zhang, X., Jing, R., Zhang, C., Ma, Y., Gao, L., Gao, C., Spannagl, M., Mayer, K.F., Li, D., Pan, S., Zheng, F., Hu, Q., Xia, X., Li, J., Liang, Q., Chen, J., Wicker, T., Gou, C., Kuang, H., He, G., Luo, Y., Keller, B., Xia, Q., Lu, P., Wang, J., Zou, H., Zhang, R., Xu, J., Gao, J., Middleton, C., Quan, Z., Liu, G., Yang, H., Liu, X., He, Z., and Mao, L. (2013). Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91-95.

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., and Drummond, A. (2012). Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647-1649.

Kelleher, J., Etheridge, A.M., and McVean, G. (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Comp. Biol. 12, e1004842.

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36.

Kovarik, A., Dadejova, M., Lim, Y.K., Chase, M.W., Clarkson, J.J., Knapp, S., and Leitch, A.R. (2008). Evolution of rDNA in Nicotiana allopolyploids: a potential link between rDNA homogenization and epigenetics. Ann. Bot. 101, 815-823.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, 1-10.

Leach, L.J., Belfield, E.J., Jiang, C., Brown, C., Mithani, A., and Harberd, N.P. (2014). Patterns of homoeologous gene expression shown by RNA sequencing in hexaploid bread wheat. BMC Genomics 15, 276.

Leitch, A.R., and Leitch, I.J. (2008). Genomic plasticity and the diversity of polyploid plants. Science 320, 481-483.

Li, H., and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature 475, 493-496.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.

Li, Z., Baniaga, A.E., Sessa, E.B., Scascitelli, M., Graham, S.W., Rieseberg, L.H., and Barker, M.S. (2015). Early genome duplications in conifers and other seed plants. Sci. Adv. 1, e1501084.

Liu, Z., and Adams, K.L. (2007). Expression partitioning between genes duplicated by polyploidy under abiotic stress and during organ development. Curr. Biol. 17, 1669-1674.

Liu, Z., Xin, M., Qin, J., Peng, H., Ni, Z., Yao, Y., and Sun, Q. (2015). Temporal transcriptome profiling reveals expression partitioning of homeologous genes contributing to heat and drought acclimation in wheat (Triticum aestivum L.). BMC Plant Biol. 15, 152.

Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y.O., and Borodovsky, M. (2005). Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 33, 6494-6506.

Lynch, M. (2010). Evolution of the mutation rate. Trends Genet. 26, 345-352.


Madlung, A. (2013). Polyploidy and its effect on evolutionary success: old questions revisited with new tools. Heredity 110, 99-104.

Mailund, T., Halager, A.E., Westergaard, M., Dutheil, J.Y., Munch, K., Andersen, L.N., Lunter, G., Prüfer, K., Scally, A., Hobolth, A., and Schierup, M.H. (2012). A new isolation with migration model along complete genomes infers very different divergence processes among closely related great ape species. PLoS Genet. 8, e1003125.

Majoros, W.H., Pertea, M., and Salzberg, S.L. (2004). TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878-2879.

Mangerud, J., Jakobsson, M., Alexanderson, H., Astakhov, V., Clarke, G.K.C., Henriksen, M., Hjort, C., Krinner, G., Lunkka, J.-P., Möller, P., Murray, A., Nikolskaya, O., Saarnisto, M., and Svendsen, J.I. (2004). Ice-dammed lakes and rerouting of the drainage of northern Eurasia during the Last Glaciation. Quatern. Sci. Rev. 23, 1313-1332.

Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764-770.

Mason, A.S., and Pires, J.C. (2015). Unreduced gametes: meiotic mishap or evolutionary mechanism? Trends Genet. 31, 5-10.

McClintock, B. (1984). The significance of responses of the genome to challenge. Science 226, 792-801.

Mierziak, J., Kostyn, K., and Kulma, A. (2014). Flavonoids as important molecules of plant interactions with the environment. Molecules 19, 16240-16265.

Mouradov, A., and Spangenberg, G. (2014). Flavonoids: a metabolic network mediating plants adaptation to their real estate. Front. Plant Sci. 5, 620.

Munch, D., Gupta, V., Bachmann, A., Busch, W., Kelly, S., Mun, T., and Andersen, S.U. (2018). The Brassicaceae family displays divergent, shoot-skewed NLR resistance gene expression. Plant Physiol. 176, 1598-1609.

Nadachowska-Brzyska, K., Burri, R., Smeds, L., and Ellegren, H. (2016). PSMC analysis of effective population sizes in molecular ecology and its application to black-and-white Ficedula flycatchers. Mol. Ecol. 25, 1058-1072.

Novikova, P.Y., Hohmann, N., Nizhynska, V., Tsuchimatsu, T., Ali, J., Muir, G., Guggisberg, A., Paape, T., Schmid, K., Fedorenko, O.M., Holm, S., Sall, T., Schlotterer, C., Marhold, K., Widmer, A., Sese, J., Shimizu, K.K., Weigel, D., Kramer, U., Koch, M.A., and Nordborg, M. (2016). Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specific polymorphism. Nat. Genet. 48, 1077-1082.

Ossowski, S., Schneeberger, K., Lucas-Lledó, J.I., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92-94.

Otto, S.P. (2007). The evolutionary consequences of polyploidy. Cell 131, 452-462.

Ownbey, M. (1950). Natural hybridization and amphiploidy in the genus Tragopogon. Am. J. Bot. 37, 487-499.

Paape, T., Hatakeyama, M., Shimizu-Inatsugi, R., Cereghetti, T., Onda, Y., Kenta, T., Sese, J., and Shimizu, K.K. (2016). Conserved but attenuated parental gene expression in allopolyploids: constitutive zinc hyperaccumulation in the allotetraploid Arabidopsis kamchatica. Mol. Biol. Evol. 33, 2781-2800.

Raffl, C., Holderegger, R., Parson, W., and Erschbamer, B. (2008). Patterns in genetic diversity of Trifolium pallescens populations do not reflect chronosequence on alpine glacier forelands. Heredity 100, 526-532.


Ramirez-Gonzalez, R.H., Borrill, P., Lang, D., Harrington, S.A., Brinton, J., Venturini, L., Davey, M., Jacobs, J., van Ex, F., Pasha, A., Khedikar, Y., Robinson, S.J., Cory, A.T., Florio, T., Concia, L., Juery, C., Schoonbeek, H., Steuernagel, B., Xiang, D., Ridout, C.J., Chalhoub, B., Mayer, K.F.X., Benhamed, M., Latrasse, D., Bendahmane, A., Wulff, B.B.H., Appels, R., Tiwari, V., Datla, R., Choulet, F., Pozniak, C.J., Provart, N.J., Sharpe, A.G., Paux, E., Spannagl, M., Brautigam, A., and Uauy, C. (2018). The transcriptional landscape of polyploid wheat. Science 361.

Ramsey, J., and Schemske, D.W. (1998). Pathways, mechanisms, and rates of polyploid formation in flowering plants. Annu. Rev. Ecol. Syst. 29, 467-501.

Rapp, R.A., Udall, J.A., and Wendel, J.F. (2009). Genomic expression dominance in allopolyploids. BMC Biol. 7, 18.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.

Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., Gupta, V., Jiang, X., Cheng, L., Fan, D., Feng, Y., Han, L., Huang, Z., Wu, Z., Liao, L., Settepani, V., Thøgersen, I.B., Vanthournout, B., Wang, T., Zhu, Y., Funch, P., Enghild, J.J., Schauser, L., Andersen, S.U., Villesen, P., Schierup, M.H., Bilde, T., and Wang, J. (2014). Spider genomes provide insight into composition and evolution of venom and silk. Nature Communications 5, 3765.

Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919-925.

Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108, 4069-4074.

Selmecki, A.M., Maruvka, Y.E., Richmond, P.A., Guillet, M., Shoresh, N., Sorenson, A.L., De, S., Kishony, R., Michor, F., Dowell, R., and Pellman, D. (2015). Polyploidy can drive rapid adaptation in yeast. Nature 519, 349-352.

Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. (2015). BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212.

Smit, A.F.A., Hubley, R., and Green, P. (2015). RepeatMasker Open-4.0 (http://www.repeatmasker.org Downloaded December 2018).

Soltis, D.E., Visger, C.J., Marchant, D.B., and Soltis, P.S. (2016). Polyploidy: Pitfalls and paths to a paradigm. Am. J. Bot. 103, 1146-1166.

Soltis, D.E., Soltis, P.S., Pires, J.C., Kovarik, A., Tate, J.A., and Mavrodiev, E. (2004). Recent and recurrent polyploidy in Tragopogon (Asteraceae): cytogenetic, genomic and genetic comparisons. Biol. J. Linn. Soc. 82, 485-501.

Soltis, P.S., and Soltis, D.E. (2016). Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30, 159-165.

Soltis, P.S., Marchant, D.B., Van de Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr. Opin. Genet. Dev. 35, 119-125.

Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435-439.

Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., Gentzbittel, L., Childs, K.L., Yandell, M., Gundlach, H., Mayer, K.F., Schwartz, D.C., and Town, C.D. (2014). An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15, 312.


Tayalé, A., and Parisod, C. (2013). Natural pathways to polyploidy in plants and consequences for genome reorganization. Cytogenet. Genome Res. 140, 79-96.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Prot. 7, 562-578.

Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A., and Domany, E. (2005). Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices. Bioinformatics 21, 2301-2308.

Vanneste, K., Maere, S., and Van de Peer, Y. (2014a). Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos. Trans. R. Soc. B 369, 20130353.

Vanneste, K., Baele, G., Maere, S., and Van de Peer, Y. (2014b). Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary. Genome Res. 24, 1334-1347.

Wang, J., Tian, L., Madlung, A., Lee, H.S., Chen, M., Lee, J.J., Watson, B., Kagochi, T., Comai, L., and Chen, Z.J. (2004). Stochastic and epigenetic changes of gene expression in Arabidopsis polyploids. Genetics 167, 1961-1973.

Wang, X., Wang, H., Wang, J., Sun, R., Wu, J., Liu, S., Bai, Y., Mun, J.H., Bancroft, I., Cheng, F., Huang, S., Li, X., Hua, W., Freeling, M., Pires, J.C., Paterson, A.H., Chalhoub, B., Wang, B., Hayward, A., Sharpe, A.G., Park, B.S., Weisshaar, B., Liu, B., Li, B., Tong, C., Song, C., Duran, C., Peng, C., Geng, C., Koh, C., Lin, C., Edwards, D., Mu, D., Shen, D., Soumpourou, E., Li, F., Fraser, F., Conant, G., Lassalle, G., King, G.J., Bonnema, G., Tang, H., Belcram, H., Zhou, H., Hirakawa, H., Abe, H., Guo, H., Jin, H., Parkin, I.A., Batley, J., Kim, J.S., Just, J., Li, J., Xu, J., Deng, J., Kim, J.A., Yu, J., Meng, J., Min, J., Poulain, J., Hatakeyama, K., Wu, K., Wang, L., Fang, L., Trick, M., Links, M.G., Zhao, M., Jin, M., Ramchiary, N., Drou, N., Berkman, P.J., Cai, Q., Huang, Q., Li, R., Tabata, S., Cheng, S., Zhang, S., Sato, S., Sun, S., Kwon, S.J., Choi, S.R., Lee, T.H., Fan, W., Zhao, X., Tan, X., Xu, X., Wang, Y., Qiu, Y., Yin, Y., Li, Y., Du, Y., Liao, Y., Lim, Y., Narusaka, Y., Wang, Z., Li, Z., Xiong, Z., and Zhang, Z. (2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035-1039.

Williams, C.A., and Grayer, R.J. (2004). Anthocyanins and other flavonoids. Natural product reports 21, 539-573.

Williams, W.M., Mason, K.M., and Williamson, M.L. (1998). Genetic analysis of shikimate dehydrogenase allozymes in Trifolium repens L. Theor. Appl. Genet. 96, 859-868.

Williams, W.M., Ellison, N.W., Ansari, H.A., Verry, I.M., and Hussain, S.W. (2012). Experimental evidence for the ancestry of allotetraploid Trifolium repens and creation of synthetic forms with value for plant breeding. BMC Plant Biol. 12, 55.

Wood, D.E., and Salzberg, S.L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46.

Woodhouse, M.R., Cheng, F., Pires, J.C., Lisch, D., Freeling, M., and Wang, X. (2014). Origin, inheritance, and gene regulatory consequences of genome dominance in polyploids. Proc. Natl. Acad. Sci. USA 111, 5283-5288.

Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859-1875.

Yang, J., Liu, D., Wang, X., Ji, C., Cheng, F., Liu, B., Hu, Z., Chen, S., Pental, D., Ju, Y., Yao, P., Li, X., Xie, K., Zhang, J., Wang, J., Liu, F., Ma, W., Shopan, J., Zheng, H., Mackenzie, S.A., and Zhang, M. (2016). The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nat. Genet. 48, 1225-1232.

Yokoyama, Y., Lambeck, K., De Deckker, P., Johnston, P., and Fifield, L.K. (2000). Timing of the Last Glacial Maximum from observed sea-level minima. Nature 406, 713-716.

Yoo, M.J., Szadkowski, E., and Wendel, J.F. (2013). Homoeolog expression bias and expression level dominance in allopolyploid cotton. Heredity 110, 171-180.

Zerbino, D.R., and Birney, E. (2008). Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821-829.

Zhang, D., Pan, Q., Cui, C., Tan, C., Ge, X., Shao, Y., and Li, Z. (2015a). Genome-specific differential gene expressions in resynthesized Brassica allotetraploids from pair-wise crosses of three cultivated diploids revealed by RNA-seq. Front. Plant Sci. 6, 957.

Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J., Saski, C.A., Scheffler, B.E., Stelly, D.M., Hulse-Kemp, A.M., Wan, Q., Liu, B., Liu, C., Wang, S., Pan, M., Wang, Y., Wang, D., Ye, W., Chang, L., Zhang, W., Song, Q., Kirkbride, R.C., Chen, X., Dennis, E., Llewellyn, D.J., Peterson, D.G., Thaxton, P., Jones, D.C., Wang, Q., Xu, X., Zhang, H., Wu, H., Zhou, L., Mei, G., Chen, S., Tian, Y., Xiang, D., Li, X., Ding, J., Zuo, Q., Tao, L., Liu, Y., Li, J., Lin, Y., Hui, Y., Cao, Z., Cai, C., Zhu, X., Jiang, Z., Zhou, B., Guo, W., Li, R., and Chen, Z.J. (2015b). Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531-537.

Zhang, X., Zhang, Y., Yan, R., Han, J., Fuzeng, H., Wang, J., and Cao, K. (2010). Genetic variation of white clover (Trifolium repens L.) collections from China detected by morphological traits, RAPD and SSR. Afr. J. Biotechnol. 9, 3033-3041.

Zimin, A.V., Marçais, G., Puiu, D., Roberts, M., Salzberg, S.L., and Yorke, J.A. (2013). The MaSuRCA genome assembler. Bioinformatics 29, 2669-2677.


FIGURE LEGENDS

Figure 1. The range of white clover and extant relatives of its progenitors. The present-day ranges of white clover (Trifolium repens) (Daday, 1958) (green), and extant relatives of its diploid progenitors T. occidentale (blue) and T. pallescens (orange). T. occidentale is found within ~100 m of the seashore whereas T. pallescens grows in alpine regions between 1800 and 2700 m.

Figure 2. White clover genetic map and synteny with extant progenitor and model forage legume genomes. A) Genetic map based on linkage disequilibrium (LD) analysis of ~7,300 scaffold-mapped markers from an F1 biparental mapping population (n=93). Red colour indicates a high level of pairwise linkage disequilibrium (r²) between markers. Black dots indicate single locus homoeologue-specific microsatellite markers from an existing, co-linear genetic linkage map (Griffiths et al., 2013) (Supplemental Figure 2). Progenitor origin of linkage group homoeologues/subgenomes is denoted by O (Trifolium occidentale) and P (T. pallescens). B) Circos diagram showing inter-pseudomolecule relationships between the subgenomes of white clover (TrTo; TrTp) and their progenitors (To; Tp) in these assemblies. The outer ring (green shaded) represents pseudomolecules in megabases (Mb), and the inner ring (blue) depicts gene density as a proportion along the pseudomolecules (%). Coloured lines represent synteny blocks constructed by whole genome alignment using LASTZ (Harris, 2007) comprising matches over 33 kb in length. Blocks within 100 kb windows were merged and represented by a single line. Cross-progenitor links indicate regions of high conservation between the subgenome and both progenitors, not putative recombination events. C) A matrix plot assessment of synteny between white clover subgenomes (O and P) and the reference genome of Medicago truncatula, a model forage legume with eight pseudomolecules (Mt 1 to Mt 8). Synteny was based on alignment of the 3,364 LD-ordered white clover anchor scaffolds with M. truncatula genome Mt4.0 (Tang et al., 2014).

Figure 3. The white clover allopolyploidization event. A) Schematic phylogenetic tree for white clover (Tr) and its progenitors T. occidentale (To) and T. pallescens (Tp). A common ancestor (A) gave rise to To and Tp (grey line), which hybridized (Hyb, blue line) to give rise to the two subgenomes TrTo and TrTp in allopolyploid white clover. B) Blue dots indicate split time estimates based on sets of 100 alignment blocks derived from the pairwise genome comparisons indicated, e.g. To vs Tp. Dots are superimposed on box-and-whiskers plots, where the median is indicated by a black vertical line. Using a mutation rate of 1.8×10⁻⁸, split time estimates suggest To and Tp diverged from a common ancestor ~192k years ago, whereas white clover TrTo and TrTp subgenomes split from their extant progenitors 15k and 17k years ago, respectively, suggesting genesis of white clover ~16k years ago. C) White clover progenitor speciation (grey block) and hybridization giving rise to white clover (blue block) aligned with global temperature variation relative to present-day average temperature based on ice core data (Jouzel et al., 2007). The blocks indicate the extent of possible divergence times using mutation rates in the range 1.1-1.8×10⁻⁸ (Supplemental Table 9). D) Pairwise sequentially Markovian coalescent (PSMC) curves based on whole-genome resequencing data from each of four white clover individuals. Each curve indicates inferred population size history through time. The analysis depicted here was carried out using a mutation rate of 1.8×10⁻⁸. See Supplemental Table 11 for results with lower mutation rates. E) Multiple sequentially Markovian coalescent (MSMC) curves based on the same data as the PSMC analysis. The number of haplotypes reflects the number of individuals included (two haplotypes per individual). Eight haplotypes comprise all four individuals. Six haplotypes comprise individuals 81, 122 and 183, the individuals with the most pronounced bottlenecks. Four haplotypes comprise one of three combinations of two individuals: [1] 81 and 122, [2] 122 and 183, [3] 81 and 183. The results were scaled to a mutation rate of 1.8×10⁻⁸. See Supplemental Figure 10 for MSMC using different mutation rates. F) Site frequency spectra (SFS) of simulations across 20k generations under the different demographic scenarios indicated on the right. EPS: Effective population size. The observed SFS is scaled to match total polymorphism count for the simulations. Dashed line represents the expected SFS under a constant effective population size. Green numbers indicate simulated SNP densities (%) and goodness of fit between observed and simulated data. The goodness of fit was calculated by first dividing the simulated and observed values for each bin by the maximum value of the two and then using the formula: Σ(observed−simulated)²/simulated. Lower values indicate a better fit. The observed SNP density was 7.0%.
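The goodness-of-fit calculation described for panel F can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sfs_goodness_of_fit` and the example spectra are hypothetical, the two spectra are assumed to be equal-length lists of per-bin counts, and the normalization follows one reading of the legend, in which the observed and simulated values of each bin are divided by the larger of the two before the sum is taken.

```python
def sfs_goodness_of_fit(observed, simulated):
    """Sum of (o - s)^2 / s over bins, after scaling each bin pair by max(o, s)."""
    if len(observed) != len(simulated):
        raise ValueError("spectra must have the same number of bins")
    total = 0.0
    for obs, sim in zip(observed, simulated):
        if sim <= 0:
            # Assumption: bins with no simulated sites are skipped to avoid
            # division by zero; the legend does not say how they were handled.
            continue
        scale = max(obs, sim)
        o, s = obs / scale, sim / scale
        total += (o - s) ** 2 / s
    return total


# Hypothetical spectra, for illustration only (not data from the study).
observed = [50, 30, 20]
simulated = [40, 30, 25]
print(sfs_goodness_of_fit(observed, simulated))
```

Identical spectra give a value of zero, and larger values indicate a poorer fit, in line with the legend's statement that lower values are better.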


Figure 4. Homoeologue-specific expression analysis. A) Genes classified based on the number of tissues in which the majority of reads were derived from the TrTo or the TrTp homoeologue. The numbers underneath the graph indicate the number of tissues for which the statement on the left is true. Most of the genes (69%) show the same direction of bias across all four tissues. B) Log(TrTo/TrTp) expression ratios in flowers versus leaves. C) Spearman correlation coefficients for cross-tissue comparisons of the total expression level for both homoeologues (TrTo+TrTp) and for the homoeologue expression ratios (TrTo/TrTp). Fl: flower. Le: leaf. St: stolon/shoot. Ro: root. Error bars indicate 95% confidence intervals calculated using the formula tanh(arctanh(r²)±1.96/sqrt(n−3)), where r² is the Pearson correlation estimate and n=19,954 is the number of observations. D) Venn diagram showing the overlaps between the expression outlier genes detected in the Pooled, S9 and S10 experiments. E) Log(TrTo/TrTp) difference from the mean values from all three experiments (Pooled, S9, S10) plotted for a randomly selected gene and an expression ratio outlier. The grey horizontal line indicates the mean. m-leaf: mature leaf. leaf: emerging young leaf. F) Clustered heatmap showing row-normalized log(TrTo/TrTp) values for all 909 expression outliers detected using the ANOVA method. m-leaf: mature leaf. leaf: emerging young leaf. G) Smoothed scatter plot showing the log(TrTo/TrTp) difference from the mean for all 19,954 genes and the 909 ANOVA outliers, respectively. H) Expression ratio outliers in flavonoid metabolism. Blue text highlights enzymes found as expression ratio outliers. Numbers above enzyme names are enzyme category (EC) identifiers.
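The confidence interval formula given for panel C has the form of a Fisher z-transformation interval, which can be sketched as below. This is an illustrative sketch rather than the authors' implementation: the function name `fisher_ci` and the example coefficient of 0.9 are hypothetical, and only n=19,954 is taken from the legend. The sketch applies the transformation to whatever coefficient is passed in; the Fisher transformation is conventionally applied to the correlation coefficient itself.

```python
import math


def fisher_ci(coefficient, n, z=1.96):
    """Approximate 95% interval: tanh(arctanh(c) +/- z / sqrt(n - 3))."""
    if not -1.0 < coefficient < 1.0:
        raise ValueError("coefficient must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    half_width = z / math.sqrt(n - 3)   # width on the transformed scale
    centre = math.atanh(coefficient)    # Fisher z-transform
    return math.tanh(centre - half_width), math.tanh(centre + half_width)


# Hypothetical coefficient of 0.9 with the n = 19,954 genes cited in the legend.
low, high = fisher_ci(0.9, 19954)
print(f"{low:.4f} to {high:.4f}")
```

With roughly 20,000 observations the half-width on the transformed scale is about 0.014, so intervals of this kind are narrow, which is why small differences between tissue comparisons can still be resolved.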


Tables

Table 1: Genome assembly statistics for white clover (Trifolium repens) and extant relatives of its diploid progenitors, T. occidentale and T. pallescens.

                                        T. occidentale    T. pallescens     T. repens

Estimated Genome Size (bp)
  ALLPATHS-LG estimate                  530,459,686       534,411,997       1,174,038,073
  k-mer (k = 17)                        581,040,135       591,946,893       1,105,102,999
  1C Genome size¹                       ND                ND                1,093,000,000
Total Assembly Length                   436,797,749       382,382,469       841,400,916
Estimated Genome Coverage               82%               72%               72%
Chloroplast Genome                      133,780           130,394           132,086

Assembly Metrics
  Assembled Scaffolds (>1000 bp)        7,229             6,923             22,100
  Longest Scaffold (bp)                 2,576,867         1,323,392         734,507
  Average Length (bp)                   61,288            55,234            38,072
  N10 (bp; number in category)          599,981; 49       571,804; 48       318,557; 201
  N50 (bp; number in category)          191,714; 624      172,944; 586      121,657; 2016
  N90 (bp; number in category)          41,953; 2416      34,989; 2451      19,736; 8100
  Total Undefined bases (Ns)            51,627,539        65,642,595        86,144,561
  Gap Ratio                             11.8%             17.2%             10.2%

BUSCO² Scores of Assembly (based on 1440 reference genes)
  Complete genes - Total                1353 (94%)        1353 (94%)        1321 (92%)
  Complete genes - Single Copy          1203 (84%)        1197 (83%)        494 (34%)
  Complete genes - Duplicated           150 (10%)         156 (11%)         827 (57%)
  Fragmented genes                      37 (3%)           33 (2%)           39 (3%)
  Missing genes                         50 (3%)           54 (4%)           80 (6%)

¹ Flow cytometry (Bennett and Leitch, 2011); ND=Not Determined
² BUSCO=Benchmarked Universal Single-Copy Orthologs (Simão et al., 2015).
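Several of the Table 1 metrics can be reproduced from simpler quantities, as sketched below. The sketch rests on assumptions that the table does not state explicitly: that "number in category" is the count of longest scaffolds needed to reach the given fraction of the assembly, that Gap Ratio is undefined bases divided by total assembly length, and that Estimated Genome Coverage is total assembly length divided by the ALLPATHS-LG estimate. The last two definitions are inferred because they reproduce the T. occidentale values in the table. The function name `n_stat` and the toy scaffold lengths are hypothetical.

```python
def n_stat(lengths, fraction=0.5):
    """Return (NX length, number of scaffolds needed) for a given fraction.

    Scaffolds are taken longest first until their cumulative length reaches
    `fraction` of the total; the length of the last scaffold added is NX.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Hypothetical scaffold lengths (bp), for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n_stat(toy_scaffolds, 0.5))  # (80, 2)

# T. occidentale values from Table 1.
assembly_length = 436_797_749
undefined_bases = 51_627_539
allpaths_estimate = 530_459_686

gap_ratio = undefined_bases / assembly_length    # assumed definition
coverage = assembly_length / allpaths_estimate   # assumed definition
print(f"{gap_ratio:.1%} {coverage:.0%}")         # 11.8% 82%
```

The same two ratios computed for T. pallescens and T. repens also round to the values listed in the table, which supports the assumed definitions without confirming them.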










Parsed Citations

IWGSC (2014). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788.

Aasmo Finne, M., Rognli, O.A., and Schjelderup, I. (2000). Genetic variation in a Norwegian germplasm collection of white clover (Trifolium repens L.). Euphytica 112, 45-56.

Abberton, M.T., Fothergill, M., Collins, R.P., and Marshall, A.H. (2006). Breeding forage legumes for sustainable and profitable farming systems. Asp. Appl. Biol. 80, 81-87.

Adams, K.L., Cronn, R., Percifield, R., and Wendel, J.F. (2003). Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. USA 100, 4649-4654.

Ainouche, M.L., Fortune, P.M., Salmon, A., Parisod, C., Grandbastien, M.-A., Fukunaga, K., Ricou, M., and Misset, M.-T. (2009). Hybridization, polyploidy and invasion: lessons from Spartina (Poaceae). Biol. Invasions 11, 1159-1173.

Akama, S., Shimizu-Inatsugi, R., Shimizu, K.K., and Sese, J. (2014). Genome-wide quantification of homeolog expression ratio revealed nonstochastic gene regulation in synthetic allopolyploid Arabidopsis. Nucleic Acids Res. 42, e46.

Álvarez, I., and Wendel, J.F. (2003). Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogen. Evol. 29, 417-434.

Anderson, C.B., Franzmayr, B.K., Hong, S.W., Larking, A.C., van Stijn, T.C., Tan, R., Moraga, R., Faville, M.J., and Griffiths, A.G. (2018). Protocol: a versatile, inexpensive, high-throughput plant genomic DNA extraction method suitable for genotyping-by-sequencing. Plant Methods 14, 75.

Ansari, H.A., Ellison, N.W., and Williams, W.M. (2008). Molecular and cytogenetic evidence for an allotetraploid origin of Trifolium dubium (Leguminosae). Chromosoma 117, 159-167.

Ansari, H.A., Ellison, N.W., Reader, S.M., Badaeva, E.D., Friebe, B., Miller, T.E., and Williams, W.M. (1999). Molecular cytogenetic organization of 5S and 18S-26S rDNA loci in white clover (Trifolium repens L.) and related species. Ann. Bot. 83, 199-206.

Bardil, A., de Almeida, J.D., Combes, M.C., Lashermes, P., and Bertrand, B. (2011). Genomic expression dominance in the natural allopolyploid Coffea arabica is massively affected by growth temperature. New Phytol. 192, 760-774.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bennett, M.D., and Leitch, I.J. (2011). Nuclear DNA amounts in angiosperms: targets, trends and tomorrow. Ann. Bot. 107, 467-590.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D., Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., Ren, L.,Moretzsohn, M.C., Shirasawa, K., Huang, W., Vidigal, B., Abernathy, B., Chu, Y., Niederhuth, C.E., Umale, P., Araujo, A.C., Kozik, A., Kim,K.D., Burow, M.D., Varshney, R.K., Wang, X., Zhang, X., Barkley, N., Guimaraes, P.M., Isobe, S., Guo, B., Liao, B., Stalker, H.T., Schmitz,R.J., Scheffler, B.E., Leal-Bertioli, S.C., Xun, X., Jackson, S.A., Michelmore, R., and Ozias-Akins, P. (2016). The genome sequences ofArachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat. Genet. 48, 438-446.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bilton, T.P., McEwan, J.C., Clarke, S.M., Brauning, R., van Stijn, T.C., Rowe, S.J., and Dodds, K.G. (2018). Linkage DisequilibriumEstimation in Low Coverage High-Throughput Sequencing Data. Genetics 209, 389-400.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Page 59: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

Boetzer, M., Henkel, C.V., Jansen, H.J., Butler, D., and Pirovano, W. (2011). Scaffolding pre-assembled contigs using SSPACE.Bioinformatics 27, 578-579.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Bottley, A., Xia, G.M., and Koebner, R.M.D. (2006). Homoeologous gene silencing in hexaploid wheat. Plant J. 47, 897-906.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Branca, A., Paape, T.D., Zhou, P., Briskine, R., Farmer, A.D., Mudge, J., Bharti, A.K., Woodward, J.E., May, G.D., Gentzbittel, L., Ben, C.,Denny, R., Sadowsky, M.J., Ronfort, J., Bataillon, T., Young, N.D., and Tiffin, P. (2011). Whole-genome nucleotide diversity,recombination, and linkage disequilibrium in the model legume Medicago truncatula. Proc. Natl. Acad. Sci. USA 108, E864-870.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Brochmann, C., Brysting, A.K., Alsos, I.G., Borgen, L., Grundt, H.H., Scheen, A.C., and Elven, R. (2004). Polyploidy in arctic plants. Biol.J. Linn. Soc. 82, 521-536.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Buggs, R.J.A., Elliott, N.M., Zhang, L., Koh, J., Viccini, L.F., Soltis, D.E., and Soltis, P.S. (2010). Tissue-specific silencing of homoeologsin natural populations of the recent allopolyploid Tragopogon mirus. New Phytol. 186, 175-183.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Cao, M., Fraser, K., Jones, C., Stewart, A., Lyons, T., Faville, M., and Barrett, B. (2017). Untargeted metabotyping Lolium perennereveals population-level variation in plant flavonoids and alkaloids. Front. Plant Sci. 8, 133.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M., Kothari, A., Krummenacker, M., Latendresse, M., Mueller,L.A., Ong, Q., Paley, S., Subhraveti, P., Weaver, D.S., and Karp, P.D. (2016). The MetaCyc database of metabolic pathways and enzymesand the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, 471-480.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Catchen, J., Hohenlohe, P.A., Bassham, S., Amores, A., and Cresko, W.A. (2013). Stacks: an analysis tool set for population genomics.Mol. Ecol. 22, 3124-3140.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Cenci, A., Combes, M.C., and Lashermes, P. (2012). Genome evolution in diploid and tetraploid Coffea species as revealed bycomparative analysis of orthologous genome segments. Plant Mol. Biol. 78, 135-145.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Chelaifa, H., Monnier, A., and Ainouche, M. (2010). Transcriptomic changes following recent natural hybridization and allopolyploidy inthe salt marsh species Spartina x townsendii and Spartina anglica (Poaceae). New Phytol. 186, 161-174.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Chen, J., Glémin, S., and Lascoux, M. (2017). Genetic diversity and the efficacy of purifying selection across plant and animal species.Mol. Biol. Evol. 34, 1417-1428.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Cheng, F., Wu, J., Fang, L., Sun, S., Liu, B., Lin, K., Bonnema, G., and Wang, X. (2012). Biased gene fractionation and dominant geneexpression among the subgenomes of Brassica rapa. PloS one 7, e36442.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Chester, M., Gallagher, J.P., Symonds, V.V., Cruz da Silva, A.V., Mavrodiev, E.V., Leitch, A.R., Soltis, P.S., and Soltis, D.E. (2012).Extensive chromosomal variation in a recently formed natural allopolyploid species, Tragopogon miscellus (Asteraceae). Proc. Natl.Acad. Sci. USA 109, 1176-1181.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Combes, M.-C., Dereeper, A., Severac, D., Bertrand, B., and Lashermes, P. (2013). Contribution of subgenomes to the transcriptomeand their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures. New Phytol. 200, 251-260.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Page 60: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

Google Scholar: Author Only Title Only Author and Title

Combes, M.C., Hueber, Y., Dereeper, A., Rialle, S., Herrera, J.C., and Lashermes, P. (2015). Regulatory divergence between parentalalleles determines gene expression patterns in hybrids. Genome biology and evolution 7, 1110-1121.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Coombe, D. (1961). Trifolium occidentale, a new species related to T. repens L. Watsonia 5, 68-87.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Cousins, G., and Woodfield, D.R. (2006). Effect of inbreeding on growth of white clover. In 13th Australasian Plant BreedingConference, C.F. Mercer, ed (Christchurch, New Zealand: New Zealand Grassland Association), pp. 568-572.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Cox, M.P., Peterson, D.A., and Biggs, P.J. (2010). SolexaQA: At-a-glance quality assessment of Illumina second-generation sequencingdata. BMC Bioinformatics 11, 485.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Daday, H. (1958). Gene frequencies in wild populations of Trifolium repens L. III. World distribution. Heredity 12, 169-184.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Darling, A.C.E., Mau, B., Blattner, F.R., and Perna, N.T. (2004). Mauve: multiple alignment of conserved genomic sequence withrearrangements. Genome Res. 14, 1394-1403.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Davies, K.M., Albert, N.W., Zhou, Y., and Schwinn, K.E. (2018). Functions of flavonoid and betalain pigments in abiotic stress tolerancein plants. Annual Plant Reviews online 1, 1-41.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

De Storme, N., and Geelen, D. (2014). The impact of environmental stress on male reproductive development in plants: biologicalprocesses and molecular mechanisms. Plant, Cell Environ. 37, 1-18.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., del Angel, G., Rivas, M.A., Hanna, M.,McKenna, A., Fennell, T.J., Kernytsky, A.M., Sivachenko, A.Y., Cibulskis, K., Gabriel, S.B., Altshuler, D., and Daly, M.J. (2011). Aframework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491-498.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., and Gingeras, T.R. (2013). STAR:ultrafast universal RNA-seq aligner. Bioinformatics 29, 15-21.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Douglas, G.M., Gos, G., Steige, K.A., Salcedo, A., Holm, K., Josephs, E.B., Arunkumar, R., Agren, J.A., Hazzouri, K.M., Wang, W., Platts,A.E., Williamson, R.J., Neuffer, B., Lascoux, M., Slotte, T., and Wright, S.I. (2015). Hybrid origins and the earliest stages of diploidizationin the highly successful recent polyploid Capsella bursa-pastoris. Proc. Natl. Acad. Sci. USA 112, 2806-2811.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Duchemin, W., Dupont, P.Y., Campbell, M.A., Ganley, A.R., and Cox, M.P. (2015). HyLiTE: accurate and flexible analysis of geneexpression in hybrid and allopolyploid species. BMC Bioinformatics 16, 8.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ellison, N.W., Liston, A., Steiner, J.J., Williams, W.M., and Taylor, N.L. (2006). Molecular phylogenetics of the clover genus (Trifolium -Leguminosae). Mol. Phylogen. Evol. 39, 688-705.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Elshire, R.J., Glaubitz, J.C., Sun, Q., Poland, J.A., Kawamoto, K., Buckler, E.S., and Mitchell, S.E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS one 6, e19379.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ewing, B., and Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Page 61: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

Fawcett, J.A., Maere, S., and Van de Peer, Y. (2009). Plants with double genomes might have had a better chance to survive theCretaceous–Tertiary extinction event. Proc. Natl. Acad. Sci. USA 106, 5737-5742.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Feldman, M., Liu, B., Segal, G., Abbo, S., Levy, A.A., and Vega, J.M. (1997). Rapid elimination of low-copy DNA sequences in polyploidwheat: a possible mechanism for differentiation of homoeologous chromosomes. Genetics 147, 1381-1387.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ferreira de Carvalho, J., Poulain, J., Da Silva, C., Wincker, P., Michon-Coudouel, S., Dheilly, A., Naquin, D., Boutte, J., Salmon, A., andAinouche, M. (2013). Transcriptome de novo assembly from next-generation sequencing and comparative analyses in the hexaploidsalt marsh species Spartina maritima and Spartina alterniflora (Poaceae). Heredity 110, 181-193.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Franzmayr, B.K., Rasmussen, S., Fraser, K.M., and Jameson, P.E. (2012). Expression and functional characterization of a white cloverisoflavone synthase in tobacco. Ann. Bot. 110, 1291-1301.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Garsmeur, O., Schnable, J.C., Almeida, A., Jourda, C., D'Hont, A., and Freeling, M. (2014). Two evolutionarily distinct classes ofpaleopolyploidy. Mol. Biol. Evol. 31, 448-454.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J., Sharpe, T., Hall, G., Shea, T.P., Sykes, S., Berlin, A.M.,Aird, D., Costello, M., Daza, R., Williams, L., Nicol, R., Gnirke, A., Nusbaum, C., Lander, E.S., and Jaffe, D.B. (2011). High-quality draftassemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513-1518.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Griffiths, A.G., Barrett, B.A., Simon, D., Khan, A.K., Bickerstaff, P., Anderson, C.B., Franzmayr, B.K., Hancock, K.R., and Jones, C.S.(2013). An integrated genetic linkage map for white clover (Trifolium repens L.) with alignment to Medicago. BMC Genomics 14, 388.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Gross, B.L., Kane, N.C., Lexer, C., Rosenthal, L., Fulco, D.M., Donovan, L.A., and Rieseberg, L.H. (2004). Reconstructing the Origin ofHelianthus deserticola: Survival and Selection on the Desert Floor. Am. Nat. 164, 145-156.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E., and Wendel, J.F. (2012). Homoeolog expression bias andexpression level dominance in allopolyploids. New Phytol. 196, 966-971.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Hahsler, M., Buchta, C., and Hornik, K. (2016). Infrastructure for seriation. R package version 1.2-1. .Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Harris, R.S. (2007). Improved pairwise alignment of genomic DNA. In College of Engineering (The Pennsylvania State University), pp.84.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Herten, K., Hestand, M.S., Vermeesch, J.R., and Van Houdt, J.K. (2015). GBSX: a toolkit for experimental design and demultiplexinggenotyping by sequencing experiments. BMC Bioinformatics 16, 73.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Hoff, K.J., Lange, S., Lomsadze, A., Borodovsky, M., and Stanke, M. (2017). BRAKER2 (http://github.com/Gaius-Augustus/BRAKERDownloaded September 28, 2018).

Hoffmann, M.H. (2005). Evolution of the realized climatic niche in the genus Arabidopsis (Brassicaceae). Evolution 59, 1425-1436.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Hovav, R., Udall, J.A., Chaudhary, B., Rapp, R., Flagel, L., and Wendel, J.F. (2008). Partitioned expression of duplicated genes duringdevelopment and evolution of a single cell in a polyploid plant. Proc. Natl. Acad. Sci. USA 105, 6191-6195.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Jamalnasir, H., Wagiran, A., Shaharuddin, N.A., and Samad, A.A. (2013). Isolation of high quality RNA from plant rich in flavonoids,

Page 62: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

Melastoma decemfidum Roxb ex. Jack. Australian Journal of Crop Science 7, 911-916.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Jia, J., Zhao, S., Kong, X., Li, Y., Zhao, G., He, W., Appels, R., Pfeifer, M., Tao, Y., Zhang, X., Jing, R., Zhang, C., Ma, Y., Gao, L., Gao, C.,Spannagl, M., Mayer, K.F., Li, D., Pan, S., Zheng, F., Hu, Q., Xia, X., Li, J., Liang, Q., Chen, J., Wicker, T., Gou, C., Kuang, H., He, G., Luo,Y., Keller, B., Xia, Q., Lu, P., Wang, J., Zou, H., Zhang, R., Xu, J., Gao, J., Middleton, C., Quan, Z., Liu, G., Yang, H., Liu, X., He, Z., andMao, L. (2013). Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91-95.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer,T., Ashton, B., Meintjes, P., and Drummond, A. (2012). Geneious Basic: An integrated and extendable desktop software platform for theorganization and analysis of sequence data. Bioinformatics 28, 1647-1649.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kelleher, J., Etheridge, A.M., and McVean, G. (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes.PLoS Comp. Biol. 12, e1004842.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes inthe presence of insertions, deletions and gene fusions. Genome Biol 14, R36.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kovarik, A., Dadejova, M., Lim, Y.K., Chase, M.W., Clarkson, J.J., Knapp, S., and Leitch, A.R. (2008). Evolution of rDNA in Nicotianaallopolyploids: a potential link between rDNA homogenization and epigenetics. Ann. Bot. 101, 815-823.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357-359.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to thehuman genome. Genome Biol. 10, 1-10.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Leach, L.J., Belfield, E.J., Jiang, C., Brown, C., Mithani, A., and Harberd, N.P. (2014). Patterns of homoeologous gene expression shownby RNA sequencing in hexaploid bread wheat. BMC Genomics 15, 276.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Leitch, A.R., and Leitch, I.J. (2008). Genomic plasticity and the diversity of polyploid plants. Science 320, 481-483.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Li, H., and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature 475, 493-496.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Subgroup, G.P.D.P. (2009).The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Li, Z., Baniaga, A.E., Sessa, E.B., Scascitelli, M., Graham, S.W., Rieseberg, L.H., and Barker, M.S. (2015). Early genome duplications inconifers and other seed plants. Sci. Adv. 1, e1501084.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Liu, Z., and Adams, K.L. (2007). Expression partitioning between genes duplicated by polyploidy under abiotic stress and during organdevelopment. Curr. Biol. 17, 1669-1674.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Liu, Z., Xin, M., Qin, J., Peng, H., Ni, Z., Yao, Y., and Sun, Q. (2015). Temporal transcriptome profiling reveals expression partitioning ofhomeologous genes contributing to heat and drought acclimation in wheat (Triticum aestivum L.). BMC Plant Biol. 15, 152.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Lomsadze, A., Ter-Hovhannisyan, V., Chernoff, Y.O., and Borodovsky, M. (2005). Gene identification in novel eukaryotic genomes by

Page 63: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

self-training algorithm. Nucleic Acids Res. 33, 6494-6506.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Lynch, M. (2010). Evolution of the mutation rate. Trends Genet. 26, 345-352.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Madlung, A. (2013). Polyploidy and its effect on evolutionary success: old questions revisited with new tools. Heredity 110, 99-104.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mailund, T., Halager, A.E., Westergaard, M., Dutheil, J.Y., Munch, K., Andersen, L.N., Lunter, G., Prüfer, K., Scally, A., Hobolth, A., andSchierup, M.H. (2012). A new Isolation with migration model along complete genomes infers very different divergence processesamong closely related great ape species. PLoS Genet. 8, e1003125.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Majoros, W.H., Pertea, M., and Salzberg, S.L. (2004). TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.Bioinformatics 20, 2878-2879.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mangerud, J., Jakobsson, M., Alexanderson, H., Astakhov, V., Clarke, G.K.C., Henriksen, M., Hjort, C., Krinner, G., Lunkka, J.-P., Möller,P., Murray, A., Nikolskaya, O., Saarnisto, M., and Svendsen, J.I. (2004). Ice-dammed lakes and rerouting of the drainage of northernEurasia during the Last Glaciation. Quatern. Sci. Rev. 23, 1313-1332.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Marçais, G., and Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics27, 764-770.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mason, A.S., and Pires, J.C. (2015). Unreduced gametes: meiotic mishap or evolutionary mechanism? Trends Genet. 31, 5-10.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

McClintock, B. (1984). The significance of responses of the genome to challenge. Science 226, 792-801.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mierziak, J., Kostyn, K., and Kulma, A. (2014). Flavonoids as important molecules of plant interactions with the environment. Molecules19, 16240-16265.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mouradov, A., and Spangenberg, G. (2014). Flavonoids: a metabolic network mediating plants adaptation to their real estate. Front.Plant Sci. 5, 620.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Munch, D., Gupta, V., Bachmann, A., Busch, W., Kelly, S., Mun, T., and Andersen, S.U. (2018). The Brassicaceae family displaysdivergent, shoot-skewed NLR resistance gene expression. Plant Physiol. 176, 1598-1609.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Nadachowska-Brzyska, K., Burri, R., Smeds, L., and Ellegren, H. (2016). PSMC analysis of effective population sizes in molecularecology and its application to black-and-white Ficedula flycatchers. Mol. Ecol. 25, 1058-1072.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Novikova, P.Y., Hohmann, N., Nizhynska, V., Tsuchimatsu, T., Ali, J., Muir, G., Guggisberg, A., Paape, T., Schmid, K., Fedorenko, O.M.,Holm, S., Sall, T., Schlotterer, C., Marhold, K., Widmer, A., Sese, J., Shimizu, K.K., Weigel, D., Kramer, U., Koch, M.A., and Nordborg, M.(2016). Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specificpolymorphism. Nat. Genet. 48, 1077-1082.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ossowski, S., Schneeberger, K., Lucas-Lledó, J.I., Warthmann, N., Clark, R.M., Shaw, R.G., Weigel, D., and Lynch, M. (2010). The rateand molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92-94.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Otto, S.P. (2007). The evolutionary consequences of polyploidy. Cell 131, 452-462.

Page 64: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ownbey, M. (1950). Natural hybridization and amphiploidy in the genus Tragopogon. Am. J. Bot. 37, 487-499.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Paape, T., Hatakeyama, M., Shimizu-Inatsugi, R., Cereghetti, T., Onda, Y., Kenta, T., Sese, J., and Shimizu, K.K. (2016). Conserved butAttenuated Parental Gene Expression in Allopolyploids: Constitutive Zinc Hyperaccumulation in the Allotetraploid Arabidopsiskamchatica. Mol Biol Evol 33, 2781-2800.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Raffl, C., Holderegger, R., Parson, W., and Erschbamer, B. (2008). Patterns in genetic diversity of Trifolium pallescens populations donot reflect chronosequence on alpine glacier forelands. Heredity 100, 526-532.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ramirez-Gonzalez, R.H., Borrill, P., Lang, D., Harrington, S.A., Brinton, J., Venturini, L., Davey, M., Jacobs, J., van Ex, F., Pasha, A.,Khedikar, Y., Robinson, S.J., Cory, A.T., Florio, T., Concia, L., Juery, C., Schoonbeek, H., Steuernagel, B., Xiang, D., Ridout, C.J.,Chalhoub, B., Mayer, K.F.X., Benhamed, M., Latrasse, D., Bendahmane, A., Wulff, B.B.H., Appels, R., Tiwari, V., Datla, R., Choulet, F.,Pozniak, C.J., Provart, N.J., Sharpe, A.G., Paux, E., Spannagl, M., Brautigam, A., and Uauy, C. (2018). The transcriptional landscape ofpolyploid wheat. Science 361.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ramsey, J., and Schemske, D.W. (1998). Pathways, mechanisms, and rates of polyploid formation in flowering plants. Annu. Rev. Ecol.Syst. 29, 467-501.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Rapp, R.A., Udall, J.A., and Wendel, J.F. (2009). Genomic expression dominance in allopolyploids. BMC Biol. 7, 18.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digitalgene expression data. Bioinformatics 26, 139-140.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sanggaard, K.W., Bechsgaard, J.S., Fang, X., Duan, J., Dyrlund, T.F., Gupta, V., Jiang, X., Cheng, L., Fan, D., Feng, Y., Han, L., Huang, Z.,Wu, Z., Liao, L., Settepani, V., Thøgersen, I.B., Vanthournout, B., Wang, T., Zhu, Y., Funch, P., Enghild, J.J., Schauser, L., Andersen, S.U.,Villesen, P., Schierup, M.H., Bilde, T., and Wang, J. (2014). Spider genomes provide insight into composition and evolution of venomand silk. Nature Communications 5, 3765.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. NatGenet 46, 919-925.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation of the maize subgenomes by genome dominance and bothancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA 108, 4069-4074.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Selmecki, A.M., Maruvka, Y.E., Richmond, P.A., Guillet, M., Shoresh, N., Sorenson, A.L., De, S., Kishony, R., Michor, F., Dowell, R., andPellman, D. (2015). Polyploidy can drive rapid adaptation in yeast. Nature 519, 349-352.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and Zdobnov, E.M. (2015). BUSCO: assessing genome assembly andannotation completeness with single-copy orthologs. Bioinformatics 31, 3210-3212.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Smit, A.F.A., Hubley, R., and Green, P. (2015). RepeatMasker Open-4.0 (http://www.repeatmasker.org Downloaded December 2018).

Soltis, D.E., Visger, C.J., Marchant, D.B., and Soltis, P.S. (2016). Polyploidy: Pitfalls and paths to a paradigm. Am. J. Bot. 103, 1146-1166.Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Soltis, D.E., Soltis, P.S., Pires, J.C., Kovarik, A., Tate, J.A., and Mavrodiev, E. (2004). Recent and recurrent polyploidy in Tragopogon(Asteraceae): cytogenetic, genomic and genetic comparisons. Biol. J. Linn. Soc. 82, 485-501.

Page 65: Breaking Free: the Genomics of Allopolyploidy-facilitated ... · 79 biodiversity (Leitch and Leitch, 2008). Genome duplication leads to autopolyploidy, whereas 80 interspecific hybridization

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Soltis, P.S., and Soltis, D.E. (2016). Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30, 159-165.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Soltis, P.S., Marchant, D.B., Van de Peer, Y., and Soltis, D.E. (2015). Polyploidy and genome evolution in plants. Curr. Opin. Genet. Dev.35, 119-125.

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of alternativetranscripts. Nucleic Acids Res 34, W435-439.

Tang, H., Krishnakumar, V., Bidwell, S., Rosen, B., Chan, A., Zhou, S., Gentzbittel, L., Childs, K.L., Yandell, M., Gundlach, H., Mayer, K.F., Schwartz, D.C., and Town, C.D. (2014). An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15, 312.

Tayalé, A., and Parisod, C. (2013). Natural pathways to polyploidy in plants and consequences for genome reorganization. Cytogenet. Genome Res. 140, 79-96.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Prot. 7, 562-578.

Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D.A., and Domany, E. (2005). Sorting points into neighborhoods (SPIN): data analysis and visualization by ordering distance matrices. Bioinformatics 21, 2301-2308.

Vanneste, K., Maere, S., and Van de Peer, Y. (2014a). Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos. Trans. R. Soc. B 369, 20130353.

Vanneste, K., Baele, G., Maere, S., and Van de Peer, Y. (2014b). Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous–Paleogene boundary. Genome Res. 24, 1334-1347.

Wang, J., Tian, L., Madlung, A., Lee, H.S., Chen, M., Lee, J.J., Watson, B., Kagochi, T., Comai, L., and Chen, Z.J. (2004). Stochastic and epigenetic changes of gene expression in Arabidopsis polyploids. Genetics 167, 1961-1973.

Wang, X., Wang, H., Wang, J., Sun, R., Wu, J., Liu, S., Bai, Y., Mun, J.H., Bancroft, I., Cheng, F., Huang, S., Li, X., Hua, W., Freeling, M., Pires, J.C., Paterson, A.H., Chalhoub, B., Wang, B., Hayward, A., Sharpe, A.G., Park, B.S., Weisshaar, B., Liu, B., Li, B., Tong, C., Song, C., Duran, C., Peng, C., Geng, C., Koh, C., Lin, C., Edwards, D., Mu, D., Shen, D., Soumpourou, E., Li, F., Fraser, F., Conant, G., Lassalle, G., King, G.J., Bonnema, G., Tang, H., Belcram, H., Zhou, H., Hirakawa, H., Abe, H., Guo, H., Jin, H., Parkin, I.A., Batley, J., Kim, J.S., Just, J., Li, J., Xu, J., Deng, J., Kim, J.A., Yu, J., Meng, J., Min, J., Poulain, J., Hatakeyama, K., Wu, K., Wang, L., Fang, L., Trick, M., Links, M.G., Zhao, M., Jin, M., Ramchiary, N., Drou, N., Berkman, P.J., Cai, Q., Huang, Q., Li, R., Tabata, S., Cheng, S., Zhang, S., Sato, S., Sun, S., Kwon, S.J., Choi, S.R., Lee, T.H., Fan, W., Zhao, X., Tan, X., Xu, X., Wang, Y., Qiu, Y., Yin, Y., Li, Y., Du, Y., Liao, Y., Lim, Y., Narusaka, Y., Wang, Z., Li, Z., Xiong, Z., and Zhang, Z. (2011). The genome of the mesopolyploid crop species Brassica rapa. Nat. Genet. 43, 1035-1039.

Williams, C.A., and Grayer, R.J. (2004). Anthocyanins and other flavonoids. Nat. Prod. Rep. 21, 539-573.

Williams, W.M., Mason, K.M., and Williamson, M.L. (1998). Genetic analysis of shikimate dehydrogenase allozymes in Trifolium repens L. Theor. Appl. Genet. 96, 859-868.

Williams, W.M., Ellison, N.W., Ansari, H.A., Verry, I.M., and Hussain, S.W. (2012). Experimental evidence for the ancestry of allotetraploid Trifolium repens and creation of synthetic forms with value for plant breeding. BMC Plant Biol. 12, 55.

Wood, D.E., and Salzberg, S.L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46.

Woodhouse, M.R., Cheng, F., Pires, J.C., Lisch, D., Freeling, M., and Wang, X. (2014). Origin, inheritance, and gene regulatory consequences of genome dominance in polyploids. Proc. Natl. Acad. Sci. USA 111, 5283-5288.

Wu, T.D., and Watanabe, C.K. (2005). GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859-1875.

Yang, J., Liu, D., Wang, X., Ji, C., Cheng, F., Liu, B., Hu, Z., Chen, S., Pental, D., Ju, Y., Yao, P., Li, X., Xie, K., Zhang, J., Wang, J., Liu, F., Ma, W., Shopan, J., Zheng, H., Mackenzie, S.A., and Zhang, M. (2016). The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nat. Genet. 48, 1225-1232.

Yokoyama, Y., Lambeck, K., De Deckker, P., Johnston, P., and Fifield, L.K. (2000). Timing of the Last Glacial Maximum from observed sea-level minima. Nature 406, 713-716.

Yoo, M.J., Szadkowski, E., and Wendel, J.F. (2013). Homoeolog expression bias and expression level dominance in allopolyploid cotton. Heredity 110, 171-180.

Zerbino, D.R., and Birney, E. (2008). Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821-829.

Zhang, D., Pan, Q., Cui, C., Tan, C., Ge, X., Shao, Y., and Li, Z. (2015a). Genome-specific differential gene expressions in resynthesized Brassica allotetraploids from pair-wise crosses of three cultivated diploids revealed by RNA-seq. Front. Plant Sci. 6, 957.

Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J., Saski, C.A., Scheffler, B.E., Stelly, D.M., Hulse-Kemp, A.M., Wan, Q., Liu, B., Liu, C., Wang, S., Pan, M., Wang, Y., Wang, D., Ye, W., Chang, L., Zhang, W., Song, Q., Kirkbride, R.C., Chen, X., Dennis, E., Llewellyn, D.J., Peterson, D.G., Thaxton, P., Jones, D.C., Wang, Q., Xu, X., Zhang, H., Wu, H., Zhou, L., Mei, G., Chen, S., Tian, Y., Xiang, D., Li, X., Ding, J., Zuo, Q., Tao, L., Liu, Y., Li, J., Lin, Y., Hui, Y., Cao, Z., Cai, C., Zhu, X., Jiang, Z., Zhou, B., Guo, W., Li, R., and Chen, Z.J. (2015b). Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531-537.

Zhang, X., Zhang, Y., Yan, R., Han, J., Fuzeng, H., Wang, J., and Cao, K. (2010). Genetic variation of white clover (Trifolium repens L.) collections from China detected by morphological traits, RAPD and SSR. Afr. J. Biotechnol. 9, 3033-3041.

Zimin, A.V., Marçais, G., Puiu, D., Roberts, M., Salzberg, S.L., and Yorke, J.A. (2013). The MaSuRCA genome assembler. Bioinformatics 29, 2669-2677.
