Adaptation in maize: domestication and beyond
Jeffrey Ross-Ibarra @jrossibarra • www.rilab.org
Dept. Plant Sciences • Center for Population Biology • Genome Center • University of California Davis
photo by lady_lbrty
Brandon Gaut
Hochholdinger & Hoecker 2007 TIP Genetics
maize / teosinte
Suketoshi Taba
[Figure: geographical breadth (0 to 0.1) of major crops: maize, potato, wheat, soybean, sorghum, barley, sunflower, rice, groundnut, rapeseed, cassava, millet, rye, sugarcane, oil palm, sugarbeet]
Hake & Ross-Ibarra 2015 eLife
[Figure: diversity around a selected site under different modes of adaptation]
hard sweep
“soft” sweeps: multiple mutations, standing variation
polygenic adaptation
M T G P H R L
GGTCGAC ATG ACT GGT CCA CAT CGA CTG TAG
M T N P H R L
GGTCGAC ATG ACT GAT CCA CAT CGA CTG TAG
M T G P H R L
GGTAAAC ATG ACT GGT CCA CAT CGA CTG TAG
GG—-AC ATG ACT GGT CCA CAT CGA CTG TAG
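The toy sequences above pair each codon of a short open reading frame with its one-letter amino acid. To check how a single base change alters the protein, here is a minimal Python sketch that translates the coding part (from ATG onward) with the standard genetic code. The function and variable names are illustrative. The upstream bases (GGTCGAC and its variants, which the slide uses for regulatory changes) are deliberately left out.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon ('*')."""
    seq = orf.replace(" ", "").upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


reference = translate("ATG ACT GGT CCA CAT CGA CTG TAG")
coding_variant = translate("ATG ACT GAT CCA CAT CGA CTG TAG")
```

Under the standard code, GAT encodes aspartate (D). An asparagine (N), as labeled in the coding-change row, would correspond to AAT or AAC.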
Meyerowitz 1994 Current Biology; Duvick et al. 1999 US 6639132 B1
maize origins
[Figure: hypotheses for maize origins, with labels Tripsacum, extinct maize, teosinte (Z. mays ssp. parviglumis), maize (Z. mays ssp. mays) and F1]
Beadle 1979 Field Museum of Nat. Hist. Bull.
Briggs et al. 2007 Genetics
[Figure: maize chromosomes 1 to 10]
Wang et al. 2005 Nature
Figure 1. Phenotypes. a. Maize ear showing the cob (cb) exposed at top. b. Teosinte ear with the rachis internode (in) and glume (gl) labeled. c. Teosinte ear from a plant with a maize allele of tga1 introgressed into it. d. Close-up of a single teosinte fruitcase. e. Close-up of a fruitcase from a teosinte plant with a maize allele of tga1 introgressed into it. f. Ear of maize inbred W22 (Tga1-maize allele) with the cob exposed showing the small white glumes at the base. g. Ear of maize inbred W22:tga1, which carries the teosinte allele, showing enlarged (white) glumes. h. Ear of maize inbred W22 carrying the tga1-ems1 allele, showing enlarged glumes. For higher magnification copies of f–h see Supplementary Information.
Wang et al. Page 10
Nature. Author manuscript; available in PMC 2006 May 23.
NIH-PA Author Manuscript
tga1
[Figure: maize chromosomes 1 to 10, with gt1 and tga1 labeled]
Wills et al. 2013 PLoS Genetics
teosinte / maize
Clint Whipple, BYU
5’ control region 3’ UTR
tb1
Studer et al. 2011 Nat. Gen.; Vann et al. 2015 PeerJ
© 2011 Nature America, Inc. All rights reserved.
NATURE GENETICS ADVANCE ONLINE PUBLICATION 3
LETTERS
mutation rate (ref. 21), strongly suggesting that the Hopscotch insertion (and thus, the older Tourist as well) existed as standing genetic variation in the teosinte ancestor of maize. Thus, we conclude that the Hopscotch insertion likely predated domestication by more than 10,000 years and the Tourist insertion by an even greater amount of time.
We identified four fixed differences in the portion of the proximal and distal components of the control region that show evidence of selection. We used transient assays in maize leaf protoplasts to test all four differences for effects on gene expression. Maize and teosinte chromosomal segments for the portions of the proximal and distal components with these four differences were cloned into reporter constructs upstream of the minimal promoter of the cauliflower mosaic virus (mpCaMV), the firefly luciferase ORF and the nopaline synthase (NOS) terminator (Fig. 4). Each construct was assayed for luminescence after transformation by electroporation into maize protoplasts. The constructs for the distal component contrast the effects of the Tourist insertion plus the single fixed nucleotide substitution that distinguish maize and teosinte. Both the maize and teosinte constructs for the distal component repressed luciferase expression relative to the minimal promoter alone. The maize construct with Tourist excised gave luciferase expression equivalent to the native maize and teosinte constructs and less expression than the minimal promoter alone. These results indicate that this segment is functionally important, acting as a repressor of luciferase expression and, by inference, of tb1 expression in vivo. However, we did not observe the anticipated difference between the maize and teosinte constructs. One possible cause for the lack of differences in expression between the maize and teosinte constructs might be that additional proteins required to cause these differences are not present in maize leaf protoplasts. Another possibility is that the factor affecting phenotype in the distal component lies in the unselected region between −64.8 and −69.5 kb, which is not included in the construct. Nevertheless, the results do indicate that the distal component has a functional element that acts as a repressor. The functional importance of this segment is supported by its low level of nucleotide diversity (Fig. 3a), suggesting a history of purifying selection.
The constructs for the proximal component of the control region contrast the effects of the Hopscotch insertion plus a single fixed nucleotide substitution that distinguish maize and teosinte. The construct with the maize sequence including Hopscotch increased expression of the luciferase reporter twofold relative to the teosinte construct for the proximal control region and the minimal promoter alone (Fig. 4). Luciferase expression was returned to the level of the teosinte construct and the minimal promoter construct by deleting the Hopscotch element from the full maize construct. These results indicate that the Hopscotch element enhances luciferase expression and, by
[Figure 3 panels: (a) nucleotide diversity (0 to 0.06) in maize (M) and teosinte (T) from −67 kb to −58 kb, with HKA neutrality tests for regions A to D (P = 0.95, P = 0.41, P = 0.04, P < 0.0001) and the distal (Tourist, 408 bp) and proximal (Hopscotch, 4,885 bp) components; (b) maize and teosinte cluster haplotypes]
Figure 3 Sequence diversity in maize and teosinte across the control region. (a) Nucleotide diversity across the tb1 upstream control region. Base-pair positions are relative to AGPv2 position 265,745,977 of the maize reference genome sequence. P values correspond to HKA neutrality tests for regions A–D, as defined by the dotted lines. Green shading signifies evidence of neutrality, and pink shading signifies regions of non-neutral evolution. Nucleotide diversity (π) for maize (yellow line) and teosinte (green line) was calculated using a 500-bp sliding window with a 25-bp step. The distal and proximal components of the control region, with four fixed sequence differences between the most common maize haplotype and teosinte haplotype, are shown below. (b) A minimum spanning tree for the control region with 16 diverse maize and 17 diverse teosinte sequences. Size of the circles for each haplotype group (yellow, maize; green, teosinte) is proportional to the number of individuals within that haplotype.
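The caption above describes nucleotide diversity (π) computed in 500-bp windows with a 25-bp step. Here is a minimal sketch of that calculation. It assumes gap-free aligned sequences of equal length and defines π as the mean fraction of differing sequence pairs per site. Studer et al.'s handling of missing data and window edges is not described here, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def per_site_diversity(seqs):
    """Fraction of sequence pairs that differ at each aligned site."""
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    total_pairs = comb(len(seqs), 2)
    values = []
    for column in zip(*seqs):
        # Pairs that carry the same base at this site.
        identical = sum(comb(k, 2) for k in Counter(column).values())
        values.append((total_pairs - identical) / total_pairs)
    return values


def sliding_window_pi(seqs, window=500, step=25):
    """(window start, pi) for each full window; pi is the mean per-site diversity in it."""
    site_pi = per_site_diversity(seqs)
    return [
        (start, sum(site_pi[start:start + window]) / window)
        for start in range(0, len(site_pi) - window + 1, step)
    ]
```

Only full windows are reported, so a trailing stretch shorter than `window` is ignored. That is one of several reasonable edge conventions.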
[Figure 4: transient assay constructs mpCaMV, T-dist, M-dist and ∆M-dist (distal control region, Tourist) and T-prox, M-prox and ∆M-prox (proximal control region, Hopscotch), each driving luc; x-axis: relative expression, 0 to 2.0]
Figure 4 Constructs and corresponding normalized luciferase expression levels. Transient assays were performed in maize leaf protoplasts. Each construct is drawn to scale. The construct backbone consists of the minimal promoter from the cauliflower mosaic virus (mpCaMV, gray box), luciferase ORF (luc, white box) and the nopaline synthase terminator (black box). Portions of the proximal and distal components of the control region (hatched boxes) from maize and teosinte were cloned into restriction sites upstream of the minimal promoter. “∆” denotes the excision of either the Tourist or Hopscotch element from the maize construct. Horizontal green bars show the normalized mean with s.e.m. for each construct.
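Figure 4 reports a normalized mean with s.e.m. for each construct. The excerpt does not state what the values are normalized to. This sketch therefore assumes normalization to the mean of a reference construct, here the minimal-promoter-only mpCaMV construct. The replicate luminescence values in the usage line are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_mean_sem(replicates, reference):
    """Scale each construct's replicate readings by the reference construct's mean,
    then return {construct: (mean, standard error of the mean)}.
    Each construct needs at least two replicates for the standard deviation."""
    ref_mean = mean(replicates[reference])
    summary = {}
    for construct, values in replicates.items():
        scaled = [v / ref_mean for v in values]
        sem = stdev(scaled) / sqrt(len(scaled))
        summary[construct] = (mean(scaled), sem)
    return summary


# Hypothetical replicate luminescence readings, for illustration only.
example = normalized_mean_sem(
    {"mpCaMV": [10.0, 10.0, 10.0], "M-prox": [18.0, 20.0, 22.0]},
    reference="mpCaMV",
)
```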
tb1
Figure 2 Map of parviglumis populations and Hopscotch allele frequency. Map showing the frequency of the Hopscotch allele in populations of parviglumis where we sampled more than 6 individuals. Size of circles reflects number of individuals sampled. The Balsas River is shown, as the Balsas River Basin is believed to be the center of domestication of maize.
as our independent trait for phenotyping analyses. SAS code used for analysis is available at http://dx.doi.org/10.6084/m9.figshare.1166630.
RESULTS
Genotyping for the Hopscotch insertion
The genotype at the Hopscotch insertion was confirmed with two PCRs for 837 individuals of the 1,100 screened (Table S1 and Table S2). Among the 247 maize landrace accessions genotyped, all but eight were homozygous for the presence of the insertion. Within our parviglumis and mexicana samples we found the Hopscotch insertion segregating in 37 (n = 86) and four (n = 17) populations, respectively, and at highest frequency within populations in the states of Jalisco, Colima, and Michoacan in central-western Mexico (Fig. 2). Using our Hopscotch genotyping, we calculated differentiation between populations (FST) and subspecies (FCT) for populations in which we sampled sixteen or more chromosomes. We found that FCT = 0, and levels of FST among populations within each subspecies (0.22) and among all populations (0.23) (Table 1) are similar to genome-wide estimates from previous studies (Pyhäjärvi, Hufford & Ross-Ibarra, 2013). Although we found large variation in Hopscotch allele frequency among our populations, BayEnv analysis did not indicate a correlation between the Hopscotch insertion and environmental variables (all Bayes Factors < 1).
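The Hopscotch genotype is a single biallelic marker, so differentiation among populations can be summarized for that one site. Vann et al. do not specify their estimator in this excerpt. As a minimal sketch, the function below uses Wright's formulation F_ST = (H_T − H_S)/H_T, with populations weighted equally and no sample-size correction. That choice is an assumption, not the published method.

```python
def wright_fst(freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic site.

    freqs: frequency of one allele (e.g. Hopscotch presence) in each population,
    with populations weighted equally and no sample-size correction."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled frequency
    if h_t == 0:
        raise ValueError("site is monomorphic across populations")
    return (h_t - h_s) / h_t
```

Populations fixed for opposite alleles give F_ST = 1. Populations with identical frequencies give 0.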
Vann et al. (2015), PeerJ, DOI 10.7717/peerj.900 8/21
hard sweep
M T N P H R L
GGTCGA ATG ACT GAT CCA CAT CGA CTG TAG
tga1
tga1 gt1 tb1
Multiple Mutations
Standing Variation
M T G P H R L
GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG
Vann et al. 2015 PeerJ
polygenic adaptation
30% phenotypic variance
0% phenotypic variance
Hufford et al. 2012 Nat. Gen.; Chia et al. 2012 Nat. Gen.
genomes: 13 teosinte, 23 maize
11M shared SNPs; 3,000 fixed
~500 genes (2%)
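The slide's counts (11M shared SNPs, 3,000 fixed) summarize how sites segregate across the teosinte and maize genomes. This sketch reads "fixed" as fixed differences between the two samples, which is an assumption. It classifies biallelic sites from per-sample allele frequencies, and the category names are illustrative.

```python
from collections import Counter


def classify_site(p_teosinte, p_maize):
    """Classify a biallelic site from the frequency of one allele in each sample."""
    poly_t = 0 < p_teosinte < 1
    poly_m = 0 < p_maize < 1
    if poly_t and poly_m:
        return "shared"            # segregating in both samples
    if poly_t:
        return "private_teosinte"  # segregating only in teosinte
    if poly_m:
        return "private_maize"     # segregating only in maize
    if p_teosinte != p_maize:
        return "fixed"             # fixed for different alleles
    return "monomorphic"


def tally(sites):
    """Count categories over (p_teosinte, p_maize) pairs."""
    return Counter(classify_site(pt, pm) for pt, pm in sites)
```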
Swanson-Wagner et al. 2012 PNAS
whereas others are lost after domestication (Fig. 3B). It should be noted that many of these genes have unique coexpression edges in maize that are not observed in teosinte (Fig. S4B).
Expression data provide an opportunity to investigate further functional alterations to genes located within genomic regions that population genomic analyses identify as targets of selective
[Fig. 3 panels: (A) Venn diagram of DE (n = 612), AEC (n = 1,115) and Dom/Imp (n = 1,761) genes; (B) teosinte and maize network edges for GRMZM2G068436, GRMZM2G137947 and GRMZM2G375302; (C, D) XP-CLR plots along chromosome position (Mb); (E) haplotype tree]
Fig. 3. Analysis of genes with altered expression or conservation and targets of selection during improvement and/or domestication. (A) Venn diagram showing the overlap between DE genes, AEC genes, and the genes that occur in genomic regions that have evidence for selective sweeps during maize domestication or improvement (Dom/Imp genes). (B) Teosinte coexpression networks for three genes (GRMZM2G068436, GRMZM2G137947, and GRMZM2G375302). (Right) Edges that are maintained in maize coexpression networks are shown. Although the differentially expressed gene (red node) is highly connected in teosinte, most of these connections are lost in maize. However, some parts of the teosinte network are still conserved in maize. (C) Cross-population composite likelihood ratio test (XP-CLR) plot shows the evidence for a selective sweep that occurs on chromosome 9. The tick marks along the x axis represent genes, and the red tick mark indicates the gene (GRMZM2G448355) that was chosen as the candidate target of selection and is differentially expressed in maize and teosinte. The bar plot underneath the graph shows the expression levels of all maize (blue) and teosinte (red) samples. (D) XP-CLR plot for a large region on chromosome 5. The candidate target of selection is indicated in green and shows similar expression in maize and teosinte. Two other genes (red) exhibit DE. (E) Neighbor-joining tree shows the relationships among the haplotypes at GRMZM2G141858. (Right) Bar plot shows expression levels for each genotype; red bars indicate teosinte genotypes, and blue bars represent maize genotypes. At least one teosinte genotype (TIL15) contains the haplotype that has been selected in maize and has expression levels similar to maize genotypes.
Table 2. Genes in selected regions with evidence for DE or AEC
Gene list | No. genes selected during dom/imp | % up-regulated in maize | Significance | % higher connected in maize | % candidates
AEC and DE (n = 276) | 46 | 76 | 0.0002 | 41.3 | 39.1
DE only (n = 336) | 44 | 61 | 0.0230 | 40.9 | 22.7
AEC only (n = 839) | 89 | 54 | 0.1837 | 57.3 | 32.6
dom, domestication; imp, improvement.
4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1201961109 Swanson-Wagner et al.
Beissinger et al. In Prep
[Figure: nucleotide diversity vs. distance to nearest substitution (cM)]
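The Beissinger et al. plot relates nucleotide diversity to genetic distance from the nearest substitution. A common expectation is that if adaptive substitutions arose through hard sweeps, diversity should be depressed close to them. This sketch bins per-site diversity by distance (cM) to the nearest substitution. The input format and bin width are assumptions, not the authors' pipeline.

```python
from bisect import bisect_left
from collections import defaultdict


def distance_to_nearest(position, substitutions):
    """Distance (cM) from position to the closest substitution; substitutions must be sorted."""
    if not substitutions:
        raise ValueError("no substitutions given")
    i = bisect_left(substitutions, position)
    candidates = []
    if i < len(substitutions):
        candidates.append(substitutions[i] - position)  # nearest substitution to the right
    if i > 0:
        candidates.append(position - substitutions[i - 1])  # nearest substitution to the left
    return min(candidates)


def diversity_by_distance(sites, substitutions, bin_width=0.01):
    """Mean diversity per distance bin, keyed by the bin's lower edge (cM).

    sites: iterable of (position_cM, pi) pairs for the sites being summarized."""
    subs = sorted(substitutions)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for position, pi in sites:
        b = int(distance_to_nearest(position, subs) // bin_width)
        sums[b] += pi
        counts[b] += 1
    return {b * bin_width: sums[b] / counts[b] for b in sorted(sums)}
```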
how to adapt: domestication
standing variation
M T G P H R L
GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG
polygenic adaptation
regulatory variation
teosinte
maize
Mexico lowland, 9,000 BP
Mexico highland, 6,000 BP
S. America lowland, 6,000 BP
S. America highland, 4,000 BP
Matsuoka et al. 2002; Piperno 2006; Perry et al. 2006; Piperno et al. 2009
Mexico (photo by Monthon Wachirasettakul)
Andes (photo by Matt Hufford)
[Figure: South America (SA) vs. Mexico (MEX), lowland and highland maize: ear height, plant height, tassel branch number, TW, days to anthesis]
Results
Patterns of Genetic Structure and Differentiation. Principal components analysis (PCA) (17) of the maize SNP data identifies 58 significant principal components (PCs) (explaining 37.6% of total variance), probably reflecting isolation by distance (18) and linkage effects (19). We use the first nine PCs, which present the strongest spatial autocorrelation (Fig. S2) and explain a large portion of the total variance (18.7%), to cluster the accessions into 10 geographically distinct groups (Fig. 1A). Meso-American maize falls into three groups: the Meso-American Lowland group, which includes predominantly lowland accessions from southeast Mexico and the Caribbean; the West Mexico group, representing both lowlands and highlands; and the Mexican Highland group, encompassing most of Matsuoka et al.'s highland Mexican accessions (5) as well as accessions from highland Guatemala. These clusters also confirm the presence of US-derived varieties in South America (20); we excluded these accessions from further analysis.
In the joint PCA analysis of the three subspecies, the first PC (10.8% of variance) separates maize from its wild relatives and confirms the similarity between maize from the Mexican Highland group and parviglumis (Fig. 1B). The second PC (4.8% of variance) mainly separates the genetic groups of maize along a north–south axis, with the Northern United States and Andean Highlands at the extremes. The third PC (2.7% of variance) predominantly reflects the difference between parviglumis and mexicana. The Mexican Highland cluster extends toward mexicana along both PC 1 and 3, suggesting that the similarity of highland maize to parviglumis may reflect admixture with mexicana.
Admixture Analysis. Simulation of gene flow of mexicana into the Meso-American Lowland maize group suggests that 13% cumulative historical introgression is sufficient to explain observed differences between lowland and highland maize in terms of heterozygosity and differentiation from parviglumis (Fig. S3). Structure analysis (21) of all Mexican accessions lends support for this magnitude of introgression (Fig. 2). The three subspecies form clearly separated clusters, but admixture is evident in all three groups, and the two wild relatives show clear signs of bidirectional introgression at altitudes where their ranges overlap (Fig. 2). Highland maize shows strong signs of mexicana introgression, with 20% admixture observed in the Mexican Highland cluster, but below 1,500 m mexicana introgression drops to less than 1%. Introgression from parviglumis into maize is much lower overall, reaching its highest average value (3%) in the lowland West Mexico group.
Drift Analysis. Because introgression from mexicana may affect ancestry inference based on genetic distance from parviglumis, we took an approach that does not require reference to the wild relatives. Under models of historical range expansion, genetic differentiation increases away from the population of origin (22, 23), and estimates of drift from ancestral frequencies have been applied successfully to identify ancestral populations (24). We therefore applied the method of Nicholson et al. (25) to estimate simultaneously ancestral frequencies and F, a measure of genetic drift away from these frequencies, for sets of predefined populations. To illustrate the potential impact of mexicana introgression, we first performed a standard analysis that includes each maize population in turn in conjunction with the two wild relatives. Average drift away from the inferred common ancestor of maize, parviglumis, and mexicana is higher for maize (F = 0.24) than for mexicana (F = 0.15) or parviglumis (F = 0.07), probably due to changes in allele frequency following the domestication bottleneck. Because the inferred ancestral frequencies are closer to those of the wild relatives than to present-day maize, comparison with this ancestor is sensitive to introgression from these subspecies. It therefore is not surprising that estimates of F between individual maize populations and the common ancestor of all three taxa identify the Mexican Highland group as being most similar (Fig. 3A). This pattern is maintained in an analysis excluding mexicana, in which Mexican Highland maize is tied with the West Mexico group as the most ancestral population (Fig. 3B). To mitigate the impact of introgression, we used a slightly modified approach that excludes both parviglumis and mexicana and calculates genetic drift with respect to ancestral frequencies inferred from domesticated maize alone. Because the genetic
Fig. 1. (A) Map of sampled maize accessions colored by genetic group. (B) First three genetic PCs of all sampled accessions.
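van Heerwaarden et al. summarize SNP variation with principal components. As an illustration only, this sketch computes scores on the first PC for a genotype matrix coded as 0/1/2 allele counts. It scales each SNP by sqrt(p(1 − p)), a common convention assumed here, and uses power iteration instead of a full eigendecomposition. Real analyses would typically use a dedicated linear-algebra library and retain many more PCs.

```python
import math
import random


def standardize(genotypes):
    """Center each SNP (0/1/2 allele counts) by 2p and scale by sqrt(p(1 - p)).
    Monomorphic SNPs are dropped. Returns an individuals x SNPs matrix."""
    n = len(genotypes)
    columns = []
    for snp in zip(*genotypes):
        p = sum(snp) / (2 * n)
        if 0 < p < 1:
            scale = math.sqrt(p * (1 - p))
            columns.append([(g - 2 * p) / scale for g in snp])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    return [list(row) for row in zip(*columns)]


def first_pc(genotypes, iterations=200, seed=0):
    """Unit-length PC1 scores (sign is arbitrary), by power iteration on the
    individual-by-individual covariance matrix of standardized genotypes."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    cov = [[sum(a * b for a, b in zip(x[i], x[j])) / m for j in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v
```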
van Heerwaarden et al. PNAS | January 18, 2011 | vol. 108 | no. 3 | 1089
EVOLUTION
van Heerwaarden et al. 2011 PNAS
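The drift analysis uses the method of Nicholson et al., which jointly estimates ancestral allele frequencies and a drift parameter F for each population. The sketch below is a much simpler moment-based stand-in, not the authors' Bayesian method. It treats the ancestral frequency c at each locus as the unweighted mean across populations, which is an assumption. It then averages (p − c)² / (c(1 − c)) over loci, reflecting the model's premise that drift spreads a population's frequency around c with variance of roughly F c(1 − c).

```python
def drift_estimates(pop_freqs):
    """Moment estimate of a Nicholson-style drift parameter F per population.

    pop_freqs maps population name -> allele frequencies at the same loci.
    The ancestral frequency c at each locus is approximated by the unweighted
    mean across populations (an assumption); loci with c of 0 or 1 are skipped."""
    pops = list(pop_freqs)
    n_loci = len(pop_freqs[pops[0]])
    ancestral = [sum(pop_freqs[p][i] for p in pops) / len(pops) for i in range(n_loci)]
    usable = [i for i, c in enumerate(ancestral) if 0 < c < 1]
    if not usable:
        raise ValueError("no loci polymorphic in the inferred ancestor")
    estimates = {}
    for p in pops:
        ratios = [
            (pop_freqs[p][i] - ancestral[i]) ** 2 / (ancestral[i] * (1 - ancestral[i]))
            for i in usable
        ]
        estimates[p] = sum(ratios) / len(ratios)
    return estimates
```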
95 samples, ~100K SNPs
Takuno et al. 2015 Genetics
[Figure: -log p-value of FST in S. America vs. in Mexico; shared SNPs, SNPs unique to S. America, SNPs unique to Mexico]
39% 61%
Intergenic Genic
19% 81%
Standing Variation New mutation
Takuno et al. 2015 Genetics
Beissinger et al. In Prep; Berg & Coop 2014 PLoS Genetics
$Z = \sum_{i=1}^{L} \alpha_i p_i$, where $Z$ is the population breeding value, $\alpha_i$ the effect size and $p_i$ the allele frequency at locus $i$
$Q_X = \frac{\vec{Z}'^{T} F^{-1} \vec{Z}'}{2 V_A}$, where $F$ captures relatedness among populations, $V_A$ is the additive genetic variance, and $Q_X$ measures the dispersion of breeding values among populations
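The Q_X statistic of Berg & Coop compares the spread of population breeding values with what drift alone would produce, given the relatedness matrix F. This Python sketch follows the slide's notation. It computes Z for each population and then Q_X = Z'^T F^{-1} Z' / (2 V_A), solving F x = Z' rather than inverting F explicitly. It assumes Z' (the mean-centered breeding values), F and V_A are already available, and the helper names are hypothetical.

```python
def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def breeding_values(alpha, freqs):
    """Z = sum_i alpha_i * p_i for each population (freqs: population -> allele frequencies)."""
    return {pop: sum(a * p for a, p in zip(alpha, ps)) for pop, ps in freqs.items()}


def q_x(z_prime, F, v_a):
    """Q_X = Z'^T F^{-1} Z' / (2 V_A), with F^{-1} Z' obtained from a linear solve."""
    f_inv_z = solve(F, z_prime)
    return sum(z * y for z, y in zip(z_prime, f_inv_z)) / (2 * v_a)
```

Under Berg & Coop's neutral model, Q_X is compared with a χ² distribution with M − 1 degrees of freedom for M populations.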
[Figure panels: Warm, Cold]
how to adapt: domestication
standing variation
M T G P H R L
GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG
polygenic adaptation
regulatory variation
local adaptation
hard sweep multiple mutations
standing variation
polygenic adaptation
new mutation
gene flow new mutation
Piperno 2006; Perry et al. 2006; Piperno et al. 2009
Mexico highland, 6,000 BP
domestication in Mexico lowland, 9,000 BP
Photo by Pesach Lubinsky
mexicana / maize
mexicana parviglumis
Lauter et al. (2004) Genetics
Lowland
Photos: Ruairidh Sawers, LANGEBIO
Highland
maize / teosinte
[Figure: ancestry proportions for K = 2 to 10 in mexicana, maize and reference samples]
Hufford et al. 2013 PLoS Genetics
Figure: composite log-likelihood vs. generations (0 to 5,000) for San Pedro, Nabogame, Santa Clara, El Porvenir, Tenango del Aire, Puruandiro, Ixtlan, Xochimilco and Opopeo; panels A and B for mexicana into maize and for maize into mexicana; time axis in years BP
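These panels profile a composite log-likelihood over the number of generations since gene flow. One generic way to read such a profile is a grid search for its maximum plus a likelihood-drop interval. The sketch below does that on toy numbers (not values from Hufford et al. 2013) and assumes one generation per year, as for an annual plant, when converting to years BP. Because composite likelihoods ignore correlation between linked sites, intervals built this way are usually too narrow.

```python
from typing import List, Tuple

YEARS_PER_GENERATION = 1  # assumption: annual plants, one generation per year


def profile_peak(generations: List[int], log_liks: List[float],
                 drop: float = 2.0) -> Tuple[int, Tuple[int, int]]:
    """Return the grid point with the highest composite log-likelihood and the
    smallest and largest grid points whose log-likelihood lies within `drop`
    units of that maximum."""
    best = max(range(len(log_liks)), key=lambda i: log_liks[i])
    cutoff = log_liks[best] - drop
    inside = [g for g, ll in zip(generations, log_liks) if ll >= cutoff]
    return generations[best], (min(inside), max(inside))


gens = [0, 1000, 2000, 3000, 4000, 5000]
toy_ll = [-410500.0, -409800.0, -409700.5, -409701.0, -410100.0, -410400.0]  # toy values

peak, (low, high) = profile_peak(gens, toy_ll)
print(f"peak at ~{peak * YEARS_PER_GENERATION} years BP; "
      f"within 2 log-likelihood units: {low} to {high} generations")
```

A real profile would use a much finer grid; with only six points the interval can do no better than the grid spacing.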
Populations: El Porvenir, Opopeo, Xochimilco, Puruandiro, Tenango del Aire, Ixtlan, Nabogame, Santa Clara, San Pedro, Allopatric; Inv4n
Hufford et al. 2013 PLoS Genetics
Introgression
No introgression
Hufford et al. 2013 PLoS Genetics; Lauter et al. 2004 Genetics; Moose et al. 2004 Genetics
Loci: Inv4n, b1, mhl1
Photo by Ed Coe
Figure S6, panel B: proportion of populations (0.0 to 0.6) along chromosomes 1 to 10 (position in Mb); loci marked: gt1, tb1, bif2, zfl2, pbf1, zag2, ba1, su1, tga1, bt2, ae1, zag1, ra1, ga1, tcb1, ga2
resistance
Sattath et al. 2011 PLoS Genetics; Williamson et al. 2014 PLoS Genetics; Hernandez et al. 2011 Science
Figure: scaled diversity vs. distance to nearest substitution (cM)
• Ne: effective population size, in diploid individuals
• µ: beneficial mutation rate per haploid genome
• selection from standing variation when 2Neµ < 1
Messer and Petrov 2013 TIG
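The bullets define the regime through 2Neµ, and a later slide treats µ as proportional to genome size (µ ∝ bp). A small sketch of that arithmetic, assuming a hypothetical per-base-pair beneficial mutation rate (the slides give no value) and an illustrative Ne; the two genome sizes are taken from the slide's µ ∝ bp labels, but the pairing with Ne is not meant to match any particular species.

```python
def scaled_beneficial_rate(ne: float, rate_per_bp: float, target_bp: float) -> float:
    """Return 2*Ne*mu for Ne diploids, with mu the beneficial mutation rate per
    haploid genome, modelled here as rate_per_bp * target_bp (an assumption)."""
    mu = rate_per_bp * target_bp
    return 2 * ne * mu


def regime(theta: float) -> str:
    """Label which side of the 2Neµ = 1 threshold a value falls on."""
    return "2Neµ > 1" if theta > 1 else "2Neµ < 1"


RATE_PER_BP = 1e-13  # hypothetical beneficial mutation rate per bp per generation
NE = 10_000          # hypothetical effective population size

for label, target_bp in [("130 Mb target", 130e6), ("3,100 Mb target", 3_100e6)]:
    theta = scaled_beneficial_rate(NE, RATE_PER_BP, target_bp)
    print(f"{label}: 2Neµ = {theta:.3g} ({regime(theta)})")
```

Holding Ne fixed isolates the effect of the mutational target: with these made-up rates, the larger genome alone moves 2Neµ from below 1 to above it, which is the intuition behind the concluding point that large plant genomes may offer more targets.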
2Neµ > 1
2Neµ < 1
Ne ~ 150,000 Ne ~ 10,000*
Ne ~ 2,000,000 Ne ~ 600,000
M T G P H R L
ATG ACT GGT CCA CAT CGA CTG TAG
M T D P H R L
ATG ACT GAT CCA CAT CGA CTG TAG
x xx x
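The sequences on this slide pair each codon with its amino acid. A short sketch that reproduces those translations with the standard genetic code; the compact TCAG-ordered lookup string is an implementation choice, not something from the slides. It shows that GGT to GAT in the third codon replaces glycine with aspartate.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        b1, b2, b3 = (BASES.index(base) for base in dna[i:i + 3])
        aa = AMINO_ACIDS[16 * b1 + 4 * b2 + b3]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def aa_changes(ref: str, alt: str) -> list:
    """List amino-acid differences as e.g. 'G3D' (reference, position, alternate)."""
    pairs = zip(translate(ref), translate(alt))
    return [f"{a}{i + 1}{b}" for i, (a, b) in enumerate(pairs) if a != b]


REF = "ATGACTGGTCCACATCGACTGTAG"
ALT = "ATGACTGATCCACATCGACTGTAG"
print(translate(REF), translate(ALT), aa_changes(REF, ALT))
```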
µ ∝ 130 × 10⁶ bp   µ ∝ 220 × 10⁶ bp
µ ∝ 2,500 × 10⁶ bp   µ ∝ 3,100 × 10⁶ bp
Brandon Gaut
maize, Arabidopsis; Kew C-Value Database
log 1C genome size
Bilinski et al. In Prep
Figure: 1C genome size (Gb, axis 2.50 to 3.75) for groups MH, ML, SAH, SAL, mexicana and parviglumis, grouped by altitude (highland, lowland)
Bilinski et al. In Prep
knob 350 (Tr1); knob 180
Model terms: repeat abundance, altitude, kinship, error
Repeat classes: knob 180, knob 350, 32 TE families, overall TE content, total genome size
Bilinski et al. In Prep
Rayburn et al. 1994 Plant Breeding Francis et al. 2008. Ann. Bot.
cycle time that did not exceed 20 h compared with a much greater spread of cycle times for the monocots. If DNA mass per se is the limiting factor for cell cycle time, we hypothesize that cycle times would be the same for dicots and monocots of comparable C-value. This is so even if the data for Scilla sibirica and Trillium grandiflorum are excluded. Indeed, if we ignore the marked discontinuity of the y-axis caused by their inclusion, then the nucleotypic effect is strong for all species regardless of phylogeny. To test the rigour of these hypotheses would require data to plug the gap between Trillium grandiflorum and the majority of C-value/cell cycle times analysed here.
Separate plots for diploids and polyploids show a strong nucleotypic effect on CCT in diploids (Fig. 3; Table 2). Removing the five diploid outliers (>25 pg) reduced the slope (b = 0.27) by approximately four-fold but the regression continued to be significant (P < 0.001). For the polyploids, a nucleotypic effect on CCT was also detected (Fig. 3; Table 2); however, removing the two polyploid outliers rendered the regression non-significant (y = 0.03x − 13.5). This confirms previous work in which the slope/rate of increase in CCT with increasing DNA was higher in diploids than in autopolyploids (Evans et al., 1972). With the exception of Scilla sibirica, CCT in polyploids is generally more buffered than in diploids (Fig. 3).
We acknowledge that some traditionally classified diploids are not necessarily so (see Soltis and Soltis, 1999). For example, there are strong arguments that Zea mays is actually an allotetraploid (2n = 4x = 20; Gaut and Doebley, 1997). However, in the data reported here we have assigned ploidy level as listed by the authors of the papers and reviews we have consulted.
The longest CCTs (>20 h) are exhibited by the perennials (Fig. 4). Indeed, the data for perennials overall had a nearly seven-fold steeper slope (b = 1.37) than a comparable regression for annuals (b = 0.20; Table 2). These data are consistent with findings of Bennett (1972) where the mean CCT in 19 annuals was significantly shorter than in eight obligate perennials. Where our analyses differ from Bennett (1972) is in relation to the broad range of CCTs shown by perennials compared with annuals (Fig. 4). However, in Fig. 4 the longer CCTs
FIG. 3. DNA C-value (pg) and cell cycle time (h) in the root apical meristem of a range of diploid and polyploid angiosperms. See Table 2 for regression analyses.
FIG. 2. DNA C-value (pg) and cell cycle time (h) in the root apical meristem of a range of (A) eudicots and monocots (n = 110), and (B) eudicots (n = 60). See Table 2 for regression analyses.
TABLE 2. Regression analyses of all data presented in Figs. 2–4 together with the percentage variance accounted for by the regression (R²), the level of probability (P) for each regression

Group              Regression (y = bx + a)   R² (%)   P     n
All measurements   y = 1.09x + 5.39          54.2     ***   110
Monocots           y = 1.29x + 2.44          58.7     ***   48
Eudicots           y = 0.32x + 10.2          15.4     ***   62
Diploids           y = 1.04x + 4.95          49.86    ***   86
Polyploids         y = 1.14x + 3.12          56.3     ***   24
Annuals            y = 0.20x + 10.7          19.9     ***   75
Perennials         y = 1.37x + 4.13          63.6     ***   35

*** P < 0.001; n, number of replicates.
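Table 2 gives each fit as y = bx + a, with x the DNA C-value (pg) and y the cell cycle time (h), units taken from the Fig. 2 and Fig. 3 captions. A minimal sketch that evaluates those fitted lines at a hypothetical C-value and checks the "nearly seven-fold" perennial/annual slope ratio quoted in the text; predictions outside the range of sampled C-values would be extrapolation.

```python
# Slope b and intercept a for y = b*x + a, transcribed from Table 2 (Francis et al. 2008).
# x: DNA C-value in pg; y: cell cycle time in h (root apical meristem).
REGRESSIONS = {
    "All measurements": (1.09, 5.39),
    "Monocots": (1.29, 2.44),
    "Eudicots": (0.32, 10.2),
    "Diploids": (1.04, 4.95),
    "Polyploids": (1.14, 3.12),
    "Annuals": (0.20, 10.7),
    "Perennials": (1.37, 4.13),
}


def predicted_cct(group: str, c_value_pg: float) -> float:
    """Cell cycle time (h) predicted by the Table 2 regression for `group`."""
    slope, intercept = REGRESSIONS[group]
    return slope * c_value_pg + intercept


C_VALUE = 2.5  # hypothetical C-value in pg, chosen only for illustration
for group in ("Monocots", "Eudicots", "Annuals", "Perennials"):
    print(f"{group}: {predicted_cct(group, C_VALUE):.2f} h")

ratio = REGRESSIONS["Perennials"][0] / REGRESSIONS["Annuals"][0]  # 1.37 / 0.20
print(f"perennial/annual slope ratio: {ratio:.2f}")
```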
Francis et al. — DNA C-value and the Cell Cycle750
Francis et al. 2008. Ann. Bot.
Figure labels: DNA, plants, cycle; late flowering vs. early flowering
Concluding Thoughts
• “Soft sweeps” and polygenic selection predominate in maize
• Gene flow may provide a novel source of adaptive alleles
• Both population size and mutational target contribute
• Large, complex genomes may mean more targets and more soft sweeps in plants
• Genome size itself may be adaptive
Acknowledgments
Maize Diversity Group: Peter Bradbury, Ed Buckler, John Doebley, Theresa Fulton, Sherry Flint-Garcia, Jim Holland, Sharon Mitchell, Qi Sun, Doreen Ware
Collaborators: Jim Birchler, Jeremy Berg, Graham Coop, Nathan Springer
Lab Alumni: Tim Beissinger (USDA-ARS, Mizzou), Matt Hufford (Iowa State), Tanja Pyhäjärvi (Oulu), Shohei Takuno (Sokendai), Joost van Heerwaarden (Wageningen)