Evolutionary Genetics of a Complex Plant Genome. Jeffrey Ross-Ibarra (@jrossibarra, www.rilab.org). Dept. Plant Sciences • Center for Population Biology • Genome Center, University of California Davis


Page 1: Evolutionary Genetics of Complex Genome

Evolutionary Genetics of a Complex Plant Genome

Jeffrey Ross-Ibarra @jrossibarra • www.rilab.org

Dept. Plant Sciences • Center for Population Biology • Genome Center University of California Davis

Page 2: Evolutionary Genetics of Complex Genome

https://commons.wikimedia.org/wiki/File:Diversity_of_plants_image_version_5.png

Page 3: Evolutionary Genetics of Complex Genome

hard sweep

how do genomes adapt?


Page 6: Evolutionary Genetics of Complex Genome

hard sweep

multiple mutations

“soft” sweeps

how do genomes adapt?

Page 7: Evolutionary Genetics of Complex Genome

hard sweep

multiple mutations

standing variation

“soft” sweeps

how do genomes adapt?

Page 8: Evolutionary Genetics of Complex Genome

M T G P H R L

GGTCGAC ATG ACT GGT CCA CAT CGA CTG TAG

Page 9: Evolutionary Genetics of Complex Genome

M T G P H R L

GGTCGAC ATG ACT GGT CCA CAT CGA CTG TAG

M T D P H R L

GGTCGAC ATG ACT GAT CCA CAT CGA CTG TAG

structural change to protein
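The effect of the single-base change on this slide can be checked by translating both coding sequences. The sketch below assumes the standard genetic code and uses a hypothetical helper named `translate`; it is an illustration, not part of the original talk.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.replace(" ", "").upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Coding sequences from the slide (upstream GGTCGAC is not translated).
ancestral = "ATG ACT GGT CCA CAT CGA CTG TAG"
derived = "ATG ACT GAT CCA CAT CGA CTG TAG"

print(translate(ancestral))  # MTGPHRL
print(translate(derived))    # MTDPHRL
```

Under the standard code, GGT encodes glycine (G) and GAT encodes aspartate (D), so the third residue is the only one that changes.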

Page 10: Evolutionary Genetics of Complex Genome

M T G P H R L

GGTAAAC ATG ACT GGT CCA CAT CGA CTG TAG

GG---AC ATG ACT GGT CCA CAT CGA CTG TAG

regulatory change to expression

Page 11: Evolutionary Genetics of Complex Genome

Lowry & Willis 2010 PLoS Biology

Page 12: Evolutionary Genetics of Complex Genome

Gaut and Ross-Ibarra 2008

Kew C-Value Database

Paris japonica: 150 Gb genome

Genlisea aurea: 63 Mb genome (Michal Rubeš)

Page 13: Evolutionary Genetics of Complex Genome

Michal Rubeš

[Figure 1: one panel shows log genome size (Mb), split into TE DNA and non-TE DNA, for Arabidopsis thaliana, Arabidopsis lyrata, Brachypodium distachyon, papaya, rice, Lotus japonicus, black cottonwood, grapevine, cabbage, Medicago truncatula, sorghum, soybean, Levant cotton, maize, Aegilops speltoides, and barley, with the angiosperm average marked at 6400 Mb. A second panel plots genome size (Mb) against TE content (Mb), both axes 0 to 6,000, r = 0.99.]

Tenaillon et al. 2010 TIP

Page 14: Evolutionary Genetics of Complex Genome

Suketoshi Taba

Page 15: Evolutionary Genetics of Complex Genome

[Figure: genes and LTR retrotransposons along two 500 kb regions, scale bar 50 kb. Maize (2300 Mb genome): 44.5 Mb to 45 Mb. Arabidopsis (130 Mb genome): 7.4 Mb to 7.9 Mb.]

Page 16: Evolutionary Genetics of Complex Genome

Zea mays • A. thaliana

Angiosperm 1C genome size (Mb)

Page 17: Evolutionary Genetics of Complex Genome

[Figure: Mb of DNA (log scale, 1 to 10,000) in Arabidopsis and maize for three categories: genome, CDS, and intergenic open chromatin. Bar labels as extracted: 15.5, 3.4, 7050, 2,300, 135.]

Sullivan et al. Cell Reports 2014 Rodgers-Melnick et al. PNAS 2016

Page 18: Evolutionary Genetics of Complex Genome

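[Figure: Mb of DNA (log scale, 1 to 10,000) in Arabidopsis and maize for three categories: genome, CDS, and intergenic open chromatin. Bar labels as extracted: 15.5, 3.4, 7050, 2,300, 135.]

[Figure: composition of "functional" DNA (0% to 100%). Arabidopsis: 81% intergenic, 19% CDS. Maize: 93% intergenic, 7% CDS.]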

Sullivan et al. Cell Reports 2014 Rodgers-Melnick et al. PNAS 2016

Page 19: Evolutionary Genetics of Complex Genome

Ne individuals, µ beneficial mutation rate per trait

bigger genome, larger mutation target, higher µ

predict that larger genomes adapt via standing variation, noncoding variants

Page 20: Evolutionary Genetics of Complex Genome

Ne individuals, µ beneficial mutation rate per trait

bigger genome, larger mutation target, higher µ

predict that larger genomes adapt via standing variation, noncoding variants

selection from standing variation when 2Neµ > 1

Page 21: Evolutionary Genetics of Complex Genome

Ne individuals, µ beneficial mutation rate per trait

bigger genome, larger mutation target, higher µ

predict that larger genomes adapt via standing variation, noncoding variants

selection from standing variation when 2Neµ > 1

larger % of µ should be noncoding
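The threshold on these slides can be turned into a small calculation. The sketch below is illustrative only: it assumes µ scales linearly with the size of the mutational target, and the values of Ne, the per-bp rate, and the target sizes are placeholders chosen for the example, not estimates from the talk. The function names are hypothetical.

```python
def population_mutation_rate(ne: float, mu_trait: float) -> float:
    """Return 2 * Ne * mu, the population-scaled beneficial mutation rate."""
    return 2.0 * ne * mu_trait


def expected_mode(ne: float, mu_per_bp: float, target_bp: float):
    """Classify adaptation using the slide's threshold 2 * Ne * mu > 1."""
    # Assumption: a larger mutational target gives a proportionally higher mu.
    mu_trait = mu_per_bp * target_bp
    theta = population_mutation_rate(ne, mu_trait)
    mode = "standing variation" if theta > 1 else "new mutation"
    return theta, mode


NE = 150_000      # illustrative effective population size (assumption)
MU_PER_BP = 3e-8  # illustrative per-bp beneficial mutation rate (assumption)

for target_bp in (10, 1_000):
    theta, mode = expected_mode(NE, MU_PER_BP, target_bp)
    print(f"target {target_bp} bp: 2*Ne*mu = {theta:.2f} -> {mode}")
```

With these placeholder values the 10 bp target stays below the threshold while the 1,000 bp target exceeds it, which is the qualitative point of the slide: a bigger target pushes adaptation toward standing variation.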

Page 22: Evolutionary Genetics of Complex Genome

maize • teosinte

Page 23: Evolutionary Genetics of Complex Genome

1 2 3 4 5

6 7 8 9 10

Briggs et al. 2007 Genetics

Page 24: Evolutionary Genetics of Complex Genome

1 2 3 4 5

6 7 8 9 10

tb1

Studer et al. 2011 Nature Genetics.; Vann et al. 2015 PeerJ


mutation rate (ref. 21), strongly suggesting that the Hopscotch insertion (and thus, the older Tourist as well) existed as standing genetic variation in the teosinte ancestor of maize. Thus, we conclude that the Hopscotch insertion likely predated domestication by more than 10,000 years and the Tourist insertion by an even greater amount of time.

We identified four fixed differences in the portion of the proximal and distal components of the control region that show evidence of selection. We used transient assays in maize leaf protoplasts to test all four differences for effects on gene expression. Maize and teosinte chromosomal segments for the portions of the proximal and distal components with these four differences were cloned into reporter constructs upstream of the minimal promoter of the cauliflower mosaic virus (mpCaMV), the firefly luciferase ORF and the nopaline synthase (NOS) terminator (Fig. 4). Each construct was assayed for luminescence after transformation by electroporation into maize protoplast. The constructs for the distal component contrast the effects of the Tourist insertion plus the single fixed nucleotide substitution that distinguish maize and teosinte. Both the maize and teosinte constructs for the distal component repressed luciferase expression relative to the minimal promoter alone. The maize construct with Tourist excised gave luciferase expression equivalent to the native maize and teosinte constructs and less expression than the minimal promoter alone. These results indicate that this segment is functionally important, acting as a repressor of luciferase expression and, by inference, of tb1 expression in vivo. However, we did not observe any difference between the maize and teosinte constructs as anticipated. One possible cause for the lack of differences in expression between the maize and teosinte constructs might be that additional proteins required to cause these differences are not present in maize leaf protoplast. Another possibility is that the factor affecting phenotype in the distal component lies in the unselected region between −64.8 and −69.5 kb, which is not included in the construct. Nevertheless, the results do indicate that the distal component has a functional element that acts as a repressor. The functional importance of this segment is supported by its low level of nucleotide diversity (Fig. 3a), suggesting a history of purifying selection.

The constructs for the proximal component of the control region contrast the effects of the Hopscotch insertion plus a single fixed nucleotide substitution that distinguish maize and teosinte. The construct with the maize sequence including Hopscotch increased expression of the luciferase reporter twofold relative to the teosinte construct for the proximal control region and the minimal promoter alone (Fig. 4). Luciferase expression was returned to the level of the teosinte construct and the minimal promoter construct by deleting the Hopscotch element from the full maize construct. These results indicate that the Hopscotch element enhances luciferase expression and, by

[Figure 3 panels: (a) nucleotide diversity (y-axis 0 to 0.06) from −67 kb to −58 kb for maize (M) and teosinte (T), with regions A–D; HKA neutrality test P values shown: 0.95, 0.41, 0.04, 0.0001; the distal component contains Tourist (408 bp) and the proximal component contains Hopscotch (4,885 bp). (b) maize cluster haplotype and teosinte cluster haplotype.]

Figure 3. Sequence diversity in maize and teosinte across the control region. (a) Nucleotide diversity across the tb1 upstream control region. Base-pair positions are relative to AGPv2 position 265,745,977 of the maize reference genome sequence. P values correspond to HKA neutrality tests for regions A–D, as defined by the dotted lines. Green shading signifies evidence of neutrality, and pink shading signifies regions of non-neutral evolution. Nucleotide diversity (π) for maize (yellow line) and teosinte (green line) was calculated using a 500-bp sliding window with a 25-bp step. The distal and proximal components of the control region with four fixed sequence differences between the most common maize haplotype and teosinte haplotype are shown below. (b) A minimum spanning tree for the control region with 16 diverse maize and 17 diverse teosinte sequences. Size of the circles for each haplotype group (yellow, maize; green, teosinte) is proportional to the number of individuals within that haplotype.

[Figure 4 panel: transient assay constructs mpCaMV, T-dist, M-dist, ∆M-dist (distal control region) and T-prox, M-prox, ∆M-prox (proximal control region), plotted against relative expression, 0 to 2.0.]

Figure 4. Constructs and corresponding normalized luciferase expression levels. Transient assays were performed in maize leaf protoplast. Each construct is drawn to scale. The construct backbone consists of the minimal promoter from the cauliflower mosaic virus (mpCaMV, gray box), luciferase ORF (luc, white box) and the nopaline synthase terminator (black box). Portions of the proximal and distal components of the control region (hatched boxes) from maize and teosinte were cloned into restriction sites upstream of the minimal promoter. “∆” denotes the excision of either the Tourist or Hopscotch element from the maize construct. Horizontal green bars show the normalized mean with s.e.m. for each construct.
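The Figure 3 caption reports nucleotide diversity in a 500-bp sliding window with a 25-bp step. A minimal sketch of that kind of calculation follows, assuming gap-free aligned sequences of equal length and defining diversity as the mean number of pairwise differences per site. The function names and the toy alignment are hypothetical, and this is not the authors' pipeline.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


def sliding_pi(seqs, window=500, step=25):
    """Return (window start, diversity) for each full window along the alignment."""
    length = len(seqs[0])
    return [
        (start, pairwise_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment (assumption): three 8-bp sequences, one variable site at index 4.
toy = ["AAAAAAAA", "AAAATAAA", "AAAATAAA"]
for start, pi in sliding_pi(toy, window=4, step=2):
    print(start, round(pi, 4))
```

Real data would also need handling of gaps and missing calls, which this sketch ignores.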

Page 25: Evolutionary Genetics of Complex Genome

1 2 3 4 5

6 7 8 9 10

tb1

Figure 2. Map of parviglumis populations and Hopscotch allele frequency. Map showing the frequency of the Hopscotch allele in populations of parviglumis where we sampled more than 6 individuals. Size of circles reflects number of individuals sampled. The Balsas River is shown, as the Balsas River Basin is believed to be the center of domestication of maize.

as our independent trait for phenotyping analyses. SAS code used for analysis is available at http://dx.doi.org/10.6084/m9.figshare.1166630.

RESULTS

Genotyping for the Hopscotch insertion

The genotype at the Hopscotch insertion was confirmed with two PCRs for 837 individuals of the 1,100 screened (Table S1 and Table S2). Among the 247 maize landrace accessions genotyped, all but eight were homozygous for the presence of the insertion. Within our parviglumis and mexicana samples we found the Hopscotch insertion segregating in 37 (n = 86) and four (n = 17) populations, respectively, and at highest frequency within populations in the states of Jalisco, Colima, and Michoacan in central-western Mexico (Fig. 2). Using our Hopscotch genotyping, we calculated differentiation between populations (FST) and subspecies (FCT) for populations in which we sampled sixteen or more chromosomes. We found that FCT = 0, and levels of FST among populations within each subspecies (0.22) and among all populations (0.23) (Table 1) are similar to genome-wide estimates from previous studies (Pyhajarvi, Hufford & Ross-Ibarra, 2013). Although we found large variation in Hopscotch allele frequency among our populations, BayEnv analysis did not indicate a correlation between the Hopscotch insertion and environmental variables (all Bayes Factors < 1).

Vann et al. (2015), PeerJ, DOI 10.7717/peerj.900

Studer et al. 2011 Nature Genetics.; Vann et al. 2015 PeerJ


Page 26: Evolutionary Genetics of Complex Genome

Wang et al. 2005 Nature Wang et al 2015 Genetics

1 2 3 4 5

6 7 8 9 10

Figure 1. Phenotypes. a. Maize ear showing the cob (cb) exposed at top. b. Teosinte ear with the rachis internode (in) and glume (gl) labeled. c. Teosinte ear from a plant with a maize allele of tga1 introgressed into it. d. Close-up of a single teosinte fruitcase. e. Close-up of a fruitcase from teosinte plant with a maize allele of tga1 introgressed into it. f. Ear of maize inbred W22 (Tga1-maize allele) with the cob exposed showing the small white glumes at the base. g. Ear of maize inbred W22:tga1 which carries the teosinte allele, showing enlarged (white) glumes. h. Ear of maize inbred W22 carrying the tga1-ems1 allele, showing enlarged glumes. For higher magnification copies of f–h see Supplementary Information.

Wang et al. Nature. Author manuscript; available in PMC 2006 May 23.

tga1 • tb1


Page 28: Evolutionary Genetics of Complex Genome

1 2 3 4 5

6 7 8 9 10

gt1 tga1

Wills et al. 2013 PLoS Genetics

tb1

Page 29: Evolutionary Genetics of Complex Genome

1 2 3 4 5

6 7 8 9 10

gt1 tga1

Wills et al. 2013 PLoS Genetics

teosinte • maize (Clint Whipple, BYU)

tb1

Page 30: Evolutionary Genetics of Complex Genome

1 2 3 4 5

6 7 8 9 10

gt1 tga1

Wills et al. 2013 PLoS Genetics

tb1

[Figure: panels A and B, each comparing genotype classes T/T, M/T, and M/M.]

3’ UTR

5’ control region

Page 31: Evolutionary Genetics of Complex Genome

hard sweep

M T D P H R L

GGTCGA ATG ACT GAT CCA CAT CGA CTG TAG

tga1 gt1 tb1

Multiple Mutations

Standing Variation

M T G P H R L

GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG

Page 32: Evolutionary Genetics of Complex Genome

Hufford et al. 2012 Nat. Gen. Chia et al. 2012 Nat. Gen

genomes: 13 teosinte, 23 maize

Page 33: Evolutionary Genetics of Complex Genome

Hufford et al. 2012 Nat. Gen. Chia et al. 2012 Nat. Gen

genomes: 13 teosinte, 23 maize

5-10% selected regions intergenic

Page 34: Evolutionary Genetics of Complex Genome

whereas others are lost after domestication (Fig. 3B). It should be noted that many of these genes have unique coexpression edges in maize that are not observed in teosinte (Fig. S4B).

Expression data provide an opportunity to investigate further functional alterations to genes located within genomic regions that population genomic analyses identify as targets of selective

[Fig. 3 panels: (A) Venn diagram of DE (n = 612), AEC (n = 1115), and Dom/Imp genes (n = 1761). (B) Teosinte network edges and maize network edges for GRMZM2G068436, GRMZM2G137947, and GRMZM2G375302. (C, D) XP-CLR plots, x-axis in Mb. (E) Genealogy and expression, teosinte vs. maize.]

Fig. 3. Analysis of genes with altered expression or conservation and targets of selection during improvement and/or domestication. (A) Venn diagram showing the overlap between DE genes, AEC genes, and the genes that occur in genomic regions that have evidence for selective sweeps during maize domestication or improvement (Dom/Imp genes). (B) Teosinte coexpression networks for three genes (GRMZM2G068436, GRMZM2G137947, and GRMZM2G375302). (Right) Edges that are maintained in maize coexpression networks are shown. Although the differentially expressed gene (red node) is highly connected in teosinte, most of these connections are lost in maize. However, some parts of the teosinte network are still conserved in maize. (C) Cross-population composite likelihood ratio test (XP-CLR) plot shows the evidence for a selective sweep that occurs on chromosome 9. The tick marks along the x axis represent genes, and the red tick mark indicates the gene (GRMZM2G448355) that was chosen as the candidate target of selection and is differentially expressed in maize and teosinte. The bar plot underneath the graph shows the expression levels of all maize (blue) and teosinte (red) samples. (D) XP-CLR plot for a large region on chromosome 5. The candidate target of selection is indicated in green and shows similar expression in maize and teosinte. Two other genes (red) exhibit DE. (E) Neighbor-joining tree shows the relationships among the haplotypes at GRMZM2G141858. (Right) Bar plot shows expression levels for each genotype; red bars indicate teosinte genotypes, and blue bars represent maize genotypes. At least one teosinte genotype (TIL15) contains the haplotype that has been selected in maize and has expression levels similar to maize genotypes.

Table 2. Genes in selected regions with evidence for DE or AEC

Gene list               No. genes selected   % up-regulated   Significance   % higher connected   % candidates
                        during dom/imp       in maize                        in maize
AEC and DE (n = 276)    46                   76               0.0002         41.3                 39.1
DE only (n = 336)       44                   61               0.0230         40.9                 22.7
AEC only (n = 839)      89                   54               0.1837         57.3                 32.6

dom, domestication; imp, improvement.

Swanson-Wagner et al., www.pnas.org/cgi/doi/10.1073/pnas.1201961109

• ~500 selected regions

• 11M shared vs 3000 fixed SNPs

• Candidates differentially expressed, decreased expression variation

selection on regulatory sequence, standing variation

Hufford et al. 2012 Nat. Gen. Swanson-Wagner et al. 2012 PNAS

Page 35: Evolutionary Genetics of Complex Genome

Mexico lowland

9,000 BP

Matsuoka et al. 2002; Piperno 2006 Perry et al. 2006; Piperno et al. 2009

Page 36: Evolutionary Genetics of Complex Genome

Mexico highland: 6,000 BP

Mexico lowland

9,000 BP

Matsuoka et al. 2002; Piperno 2006 Perry et al. 2006; Piperno et al. 2009

Page 37: Evolutionary Genetics of Complex Genome

Mexico highland: 6,000 BP

S. America lowland: 6,000 BP

Mexico lowland

9,000 BP

Matsuoka et al. 2002; Piperno 2006 Perry et al. 2006; Piperno et al. 2009

Page 38: Evolutionary Genetics of Complex Genome

Mexico highland: 6,000 BP

S. America lowland: 6,000 BP

S. America highland: 4,000 BP

Mexico lowland

9,000 BP

Matsuoka et al. 2002; Piperno 2006 Perry et al. 2006; Piperno et al. 2009

Page 39: Evolutionary Genetics of Complex Genome

Mexico (photo by Monthon Wachirasettakul)

Andes (photo by Matt Hufford)

Beissinger et al. Unpublished

Page 40: Evolutionary Genetics of Complex Genome

[Figure: lowland vs. highland comparisons for South American (SA) and Mexican (MEX) maize across four traits: ear height, plant height, tassel branch number, and days to anthesis.]

Beissinger et al. Unpublished

Page 41: Evolutionary Genetics of Complex Genome

[Figure 2 panels: Model IA (lowland and highland), Model IB (adds mexicana, with $t_{mex}$ and $N_{mex}$), and Model II (Mexico lowland, S. America lowland, S. America highland). Population sizes $N_A$, $N_B$, $N_C$, $N_1$ to $N_4$, $N_{2P}$, $N_{4P}$; times $t_D$, $t_E$, $t_F$, $t_G$.]

Models IA and IB: $N_C = \alpha N_A$, $N_1 = \beta N_C$, $N_2 = (1 - \beta) N_C$, $N_{2P} = \gamma N_2$

Model II: $N_C = \alpha N_A$, $N_1 = \beta_1 N_C$, $N_2 = (1 - \beta_1) N_C$, $N_3 = \beta_2 N_2$, $N_4 = (1 - \beta_2) N_2$, $N_{4P} = \gamma N_4$

Figure 2. Demographic models of maize lowland and highland populations. Parameters in bold were estimated in this study. See text for details.

A HWE cut-off of P < 0.005 was used for each subpopulation due to our under-calling of heterozygotes. In total, we included 18,745 silent SNPs for the Mexican populations in Models IA and IB, 14,508 for the S. American populations in Model I and 11,305 for the Mexican lowland population and the S. American populations in Model II. We obtained similar results under more or less stringent thresholds for significance (P < 0.05 ∼ 0.0005; data not shown), though the number of SNPs was very small at P < 0.005. Demographic parameters were inferred with the software ∂a∂i (Gutenkunst et al. 2009), which uses a diffusion method to calculate an expected JFD and evaluates the likelihood of the data using a multinomial assumption.

Model IA: This model is applied to the Mexican and S. American populations. We assume the ancestral diploid population representing parviglumis follows a standard Wright-Fisher model with constant size. The size of the ancestral population is denoted by $N_A$. At $t_D$ generations ago, the bottleneck event begins at domestication, and at $t_E$ generations ago, the bottleneck ends. The population size and duration of the bottleneck are denoted by $N_B$ and $t_B = t_D - t_E$, respectively. The population size recovers to $N_C = \alpha N_A$ in the lowlands. Then, the highland population is differentiated from the lowland population at $t_F$ generations ago. The size of the low- and highland populations at time $t_F$ is determined by a parameter $\beta$ such that the population is divided by $\beta N_C$ and $(1 - \beta) N_C$. We assume that the population size in the lowlands is constant but that the highland population experiences exponential expansion after divergence: its current population size is $\gamma$ times larger than that at $t_F$.

[Draft comment: isn’t this really a shrinking population in the lowlands, since $\beta N_C < N_C$? wouldn’t we want instead for lowlands to stay at $N_C$ and a new population branching off? how much do we worry about this? Reply: actually, our conclusion holds when I assumed the pop size of lowlands stays at $N_C$. However, the likelihood is a bit better in my original model.]

Model IB: We expand Model IA for the Mexican populations by incorporating admixture from the teosinte mexicana to the highland Mexican maize population.

[Draft comment: do we say ”Mexico population” or ”Mexican” (and thus ”South American”) ”population” throughout? as long as we’re consistent probably OK either way. Reply: vote to Mexican population second]

The time of differentiation between parviglumis and mexicana occurs at $t_{mex}$ generations ago. The mexicana population size is assumed to be constant at $N_{mex}$. At $t_F$ generations ago, the Mexican highland population is derived from admixture between the Mexican lowland population and a portion $P_{mex}$ from the teosinte mexicana.

Model II: The final model is for the Mexican lowland, S. American lowland and highland populations. This model was used for simulating SNPs with ascertainment bias (see below). At time $t_F$, the Mexican and S. American lowland populations are differentiated, and the sizes of populations after splitting are determined by $\beta_1$. At time $t_G$, S. American lowland and highland populations are differentiated, and the sizes of populations at this time are determined by $\beta_2$. As in Model IA, the S. American highland population is assumed to experience population growth with the parameter $\gamma$.

Estimates of a number of our model parameters were available from previous work. $N_A$ was set to 150,000 using estimates of the composite parameter $4 N_A \mu \sim 0.018$ from parviglumis (Eyre-Walker et al. 1998; Tenaillon et al. 2001, 2004; Wright et al. 2005; Ross-Ibarra et al. 2009) and an estimate of the mutation rate $\mu \sim 3 \times 10^{-8}$ (Clark et al. 2005) per site per generation. The severity of the domestication bottleneck is represented by $k = N_B / t_B$ (Eyre-Walker et al. 1998; Wright et al. 2005), and following Wright et al. (2005) we assumed $k = 2.45$ and $t_B = 1{,}000$ generations. Taking into account archaeological evidence (Piperno et al. 2009), we assume $t_D = 9{,}000$ and $t_E = 8{,}000$. We further assumed $t_F = 6{,}000$ for Mexican populations in Models IA and IB (Piperno 2006), $t_F = 4{,}000$ for S. American populations in Model IA (Perry et al. 2006; Grobman et al. 2012), and $t_{mex} = 60{,}000$, $N_{mex} = 160{,}000$ (Ross-Ibarra et al. 2009), and $P_{mex} = 0.2$ (van Heerwaarden et al. 2011) for Model IB. For both Models IA and IB, we inferred three parameters ($\alpha$, $\beta$ and $\gamma$), and, for Model II, we fixed $t_F = 6{,}000$ and $t_G = 4{,}000$ (Piperno 2006; Perry et al. 2006; Grobman et al. 2012) and estimated the remaining four parameters ($\alpha$, $\beta_1$, $\beta_2$ and $\gamma$).

[Draft comment: tF for model II is listed as 4,000 and 6,000 above. 6,000 is the number that matches the lit best. is that what was used? if so, we should cite (Grobman et al. 2012). Reply: fixed]
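The fixed quantities quoted in this excerpt follow from short arithmetic, sketched below. The composite parameter, mutation rate, k, and bottleneck timing are taken from the text. The function `model_ia_sizes` is a hypothetical helper, and the α, β, γ values passed to it are placeholders, since the excerpt does not report the fitted estimates.

```python
# Values quoted in the text.
THETA = 0.018  # composite parameter 4 * N_A * mu
MU = 3e-8      # mutation rate per site per generation
K = 2.45       # bottleneck severity, k = N_B / t_B
T_D = 9_000    # generations ago: bottleneck begins
T_E = 8_000    # generations ago: bottleneck ends

n_a = THETA / (4 * MU)  # ancestral size, about 150,000
t_b = T_D - T_E         # bottleneck duration, 1,000 generations
n_b = K * t_b           # bottleneck size implied by k


def model_ia_sizes(n_a: float, alpha: float, beta: float, gamma: float) -> dict:
    """Population sizes for Model IA given alpha, beta, and gamma."""
    n_c = alpha * n_a       # size after recovery from the bottleneck
    n_1 = beta * n_c        # lowland size after the split at t_F
    n_2 = (1 - beta) * n_c  # highland size at t_F
    n_2p = gamma * n_2      # present-day highland size after expansion
    return {"N_C": n_c, "N_1": n_1, "N_2": n_2, "N_2P": n_2p}


print(round(n_a), t_b, round(n_b))
# Placeholder parameter values (assumption), only to show the relationships.
print(model_ia_sizes(n_a, alpha=1.0, beta=0.5, gamma=2.0))
```

Note that $N_1 + N_2 = N_C$ by construction, which is the point raised in the draft comment about the lowland population shrinking at the split.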

Differentiation between low- and highland populations

We used our inferred demographic model to generate a null distribution of $F_{ST}$. As implemented in ∂a∂i (Gutenkunst

Mexico Lowland

Mexico Highland

NA

NB

NC

N1 N2

N2P

tD tE

tF

NA

NB

NC

N1 N2

N2P

tD tE

tF

tmex

Nmex

NA

NB

NC

N1 N2

tD tE

tF

N3 N4

NC �ĮNA

N1 �ȕNC

N2 ����ȕ�NC

N2P� �ȖN2

NC �ĮNA

N1 �ȕNC

N2 ����ȕ�NC

N2P� �ȖN2

NC �ĮNA

N1 �ȕ1NC

N2 ����ȕ1�NC

N3 �ȕ2N2

N4 ����ȕ2�N2

N4P �ȖN4

tG

N4P Lowland Highland mexicana Mexico

Lowland SA

Lowland SA

Highland

Model IA Model IB Model II

Figure 2 Demographic models of maize low- and high-land populations. Parameters in bold were estimated inthis study. See text for details.

A HWE cut-off of P < 0.005 was used for each subpopulation due to our under-calling of heterozygotes. In total, we included 18,745 silent SNPs for the Mexican populations in Models IA and IB, 14,508 for the S. American populations in Model I and 11,305 for the Mexican lowland population and the S. American populations in Model II. We obtained similar results under more or less stringent thresholds for significance (P < 0.05 ∼ 0.0005; data not shown), though the number of SNPs was very small at P < 0.005. Demographic parameters were inferred with the software ∂a∂i (Gutenkunst et al. 2009), which uses a diffusion method to calculate an expected JFD and evaluates the likelihood of the data using a multinomial assumption.

Model IA: This model is applied to the Mexican and S. American populations. We assume the ancestral diploid population representing parviglumis follows a standard Wright-Fisher model with constant size. The size of the ancestral population is denoted by $N_A$. At $t_D$ generations ago, the bottleneck event begins at domestication, and at $t_E$ generations ago, the bottleneck ends. The population size and duration of the bottleneck are denoted by $N_B$ and $t_B = t_D - t_E$, respectively. The population size recovers to $N_C = \alpha N_A$ in the lowlands. Then, the highland population is differentiated from the lowland population at $t_F$ generations ago. The size of the low- and highland populations at time $t_F$ is determined by a parameter $\beta$ such that the population is divided into $\beta N_C$ and $(1 - \beta) N_C$. We assume that the population size in the lowlands is constant but that the highland population experiences exponential expansion after divergence: its current population size is $\gamma$ times larger than that at $t_F$.

[Draft comment: isn't this really a shrinking population in the lowlands, since $\beta N_C < N_C$? Wouldn't we want instead for lowlands to stay at $N_C$ and a new population branching off? How much do we worry about this? Reply: actually, our conclusion holds when I assumed the pop size of lowlands stays at $N_C$. However, the likelihood is a bit better in my original model.]

Model IB: We expand Model IA for the Mexican populations by incorporating admixture from the teosinte mexicana to the highland Mexican maize population. The time of differentiation between parviglumis and mexicana occurs at $t_{mex}$ generations ago. The mexicana population size is assumed to be constant at $N_{mex}$. At $t_F$ generations ago, the Mexican highland population is derived from admixture between the Mexican lowland population and a portion $P_{mex}$ from the teosinte mexicana.

[Draft comment: do we say "Mexico population" or "Mexican" (and thus "South American") "population" throughout? As long as we're consistent probably OK either way. Reply: vote to Mexican population. second]

Model II: The final model is for the Mexican lowland, S. American lowland and highland populations. This model was used for simulating SNPs with ascertainment bias (see below). At time $t_F$, the Mexican and S. American lowland populations are differentiated, and the sizes of populations after splitting are determined by $\beta_1$. At time $t_G$, S. American lowland and highland populations are differentiated, and the sizes of populations at this time are determined by $\beta_2$. As in Model IA, the S. American highland population is assumed to experience population growth with the parameter $\gamma$.

Estimates of a number of our model parameters were available from previous work. $N_A$ was set to 150,000 using estimates of the composite parameter $4 N_A \mu \sim 0.018$ from parviglumis (Eyre-Walker et al. 1998; Tenaillon et al. 2001, 2004; Wright et al. 2005; Ross-Ibarra et al. 2009) and an estimate of the mutation rate $\mu \sim 3 \times 10^{-8}$ (Clark et al. 2005) per site per generation. The severity of the domestication bottleneck is represented by $k = N_B / t_B$ (Eyre-Walker et al. 1998; Wright et al. 2005), and following Wright et al. (2005) we assumed $k = 2.45$ and $t_B = 1,000$ generations. Taking into account archaeological evidence (Piperno et al. 2009), we assume $t_D = 9,000$ and $t_E = 8,000$. We further assumed $t_F = 6,000$ for Mexican populations in Models IA and IB (Piperno 2006), $t_F = 4,000$ for S. American populations in Model IA (Perry et al. 2006; Grobman et al. 2012), and $t_{mex} = 60,000$, $N_{mex} = 160,000$ (Ross-Ibarra et al. 2009), and $P_{mex} = 0.2$ (van Heerwaarden et al. 2011) for Model IB. For both Models IA and IB, we inferred three parameters ($\alpha$, $\beta$ and $\gamma$), and, for Model II, we fixed $t_F = 6,000$ and $t_G = 4,000$ (Piperno 2006; Perry et al. 2006; Grobman et al. 2012) and estimated the remaining four parameters ($\alpha$, $\beta_1$, $\beta_2$ and $\gamma$).

[Draft comment: $t_F$ for model II is listed as 4,000 and 6,000 above. 6,000 is the number that matches the lit best. Is that what was used? If so, we should cite (Grobman et al. 2012). Reply: fixed]
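As a rough check on the fixed parameters quoted above, the sketch below recomputes the ancestral size from the composite parameter and mutation rate, derives the bottleneck size from k and the bottleneck duration, and traces the Model IA population sizes backward in time. The function and variable names are illustrative, the values passed for alpha, beta and gamma are placeholders rather than estimates, and the exponential interpolation between the split and the present is an assumed reading of "exponential expansion", not the parameterization used in ∂a∂i.

```python
# Fixed parameters quoted in the text (times in generations ago).
THETA = 0.018            # composite parameter 4 * N_A * mu
MU = 3e-8                # mutation rate per site per generation
K = 2.45                 # bottleneck severity, k = N_B / t_B
T_D, T_E = 9000, 8000    # bottleneck start and end
T_B = T_D - T_E          # bottleneck duration
T_F = 6000               # lowland/highland split (Mexican populations)

N_A = THETA / (4 * MU)   # ancestral population size
N_B = K * T_B            # bottleneck population size


def model_ia_sizes(t, alpha, beta, gamma):
    """Return (lowland, highland) sizes t generations ago under Model IA.

    Before the split at T_F there is a single population, so the
    highland entry is None.
    """
    n_c = alpha * N_A
    if t > T_D:
        return N_A, None
    if t > T_E:
        return N_B, None
    if t > T_F:
        return n_c, None
    lowland = beta * n_c
    # Assumed form: exponential growth from (1 - beta) * N_C at T_F
    # to gamma times that size at the present (t = 0).
    highland = (1 - beta) * n_c * gamma ** ((T_F - t) / T_F)
    return lowland, highland


print(round(N_A), round(N_B))
print(model_ia_sizes(0, alpha=1.0, beta=0.5, gamma=2.0))
```

Under this sketch the lowland size after the split is smaller than the pre-split size whenever beta is below 1, which is the point raised in the first draft comment.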

Differentiation between low- and highland populations

We used our inferred demographic model to generate a null distribution of $F_{ST}$. As implemented in ∂a∂i (Gutenkunst

Table 2 Inference of demographic parameters

Mexico          Model I     Model II
Likelihood      −5592.80    −4654.79
α               0.92        1.5
β               0.38        0.76
γ               1           1

South America   Model I     Model III
Likelihood      −3855.28    −8044.71
α               0.52        1.0
β               0.97
β1                          0.64
β2                          0.95
γ               88          54

Population structure

We performed a STRUCTURE analysis (Pritchard et al. 2000; Falush et al. 2003) of our landrace sample, varying the number of groups from K = 2 to 6 (Figure 1, Figure S3). Most landraces were assigned to groups consistent with a priori population definitions, but admixture between highland and lowland populations was evident at intermediate elevations (∼1700 m). Consistent with previously described scenarios for maize diffusion (Piperno 2006), we find evidence of shared ancestry between lowland Mexican maize and both Mexican highland and S. American lowland populations. Pairwise $F_{ST}$ among populations reveals low overall differentiation (Table 1), and the higher $F_{ST}$ values observed in S. America are consistent with decreased admixture seen in STRUCTURE. Archaeological evidence supports a more recent colonization of the highlands in S. America (Piperno 2006; Perry et al. 2006; Grobman et al. 2012), suggesting that the observed differentiation may be the result of a stronger bottleneck during colonization of the S. American highlands.

Population differentiation under inferred demography

To provide a null expectation for allele frequency differentiation, we used the joint site frequency distribution (JFD) of lowland and highland populations to estimate parameters of two demographic models using the maximum likelihood method implemented in ∂a∂i (Gutenkunst et al. 2009). All models incorporate a domestication bottleneck (Wright et al. 2005) and population differentiation between lowland and highland populations, but differ in their consideration of admixture and ascertainment bias (Figure 2; see Materials and Methods for details).

Estimated parameter values are listed in Table 2; while the observed and expected JFDs were quite similar for both models, residuals indicated an excess of rare variants in the observed JFDs in all cases (Figure 3). Under both models IA and IB, we found expansion in the highland population in Mexico to be unlikely, but a strong bottleneck followed by population expansion is supported in S. American maize in both models IA and II. The likelihood value of model IB was higher than the likelihood of model IA by 850 units of log-likelihood (Table 2), consistent with analyses suggesting that introgression from mexicana played a significant role during the spread of maize into the Mexican highlands (Hufford et al. 2013).

In addition to the parameters listed in Figure 2, we investigated the impact of varying the domestication bottleneck size ($N_B$). Surprisingly, $N_B$ was estimated to be equal to $N_C$, the population size at the end of the bottleneck, and the likelihood of $N_B < N_C$ was much smaller than for alternative parameterizations (Table 2, S2). This result appears to contradict ear-

Figure 3 Observed and expected joint distributions of minor allele frequencies in low- and highland populations in (A) Mexico and (B) S. America. Panels show observation, expectation, and residual for Models IA and IB (Mexico) and Models IA and II (S. America), with lowlands and highlands on the axes. Residuals are calculated as $(\mathrm{model} - \mathrm{data})/\sqrt{\mathrm{model}}$.
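The residual definition given in the Figure 3 caption, (model − data)/√model, can be sketched for a small grid of counts as follows. This is a plain-Python illustration, not the ∂a∂i implementation; the function name and the 2 x 2 example counts are hypothetical.

```python
import math


def poisson_residuals(model, data):
    """Residual per frequency-spectrum cell: (model - data) / sqrt(model).

    Cells where the model expects zero counts are returned as None,
    since the residual is undefined there.
    """
    residuals = []
    for model_row, data_row in zip(model, data):
        row = []
        for m, d in zip(model_row, data_row):
            row.append((m - d) / math.sqrt(m) if m > 0 else None)
        residuals.append(row)
    return residuals


# Hypothetical 2 x 2 corner of a joint frequency spectrum (SNP counts).
expected = [[100.0, 25.0], [16.0, 0.0]]
observed = [[110, 20], [16, 3]]
print(poisson_residuals(expected, observed))
```

With this sign convention, negative residuals mark cells where the data contain more SNPs than the model expects.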


[Figure: observed and expected joint frequency density for Mexico; axes: lowlands, highlands]

95 samples ~100K SNPs

Takuno et al. 2015 Genetics

Page 42: Evolutionary Genetics of Complex Genome

[Figure: -log p-value of Fst in S. America vs. -log p-value of Fst in Mexico; legend: shared SNPs, unique S. America, unique Mexico]

Takuno et al. 2015 Genetics

Page 43: Evolutionary Genetics of Complex Genome

[Figure: -log p-value of Fst in S. America vs. -log p-value of Fst in Mexico; legend: shared SNPs, unique S. America, unique Mexico]

Takuno et al. 2015 Genetics

[Pie charts: 39% / 61% (Intergenic, Genic); 19% / 81% (Standing Variation, New mutation)]

Page 44: Evolutionary Genetics of Complex Genome

Pyhäjärvi et al. GBE 2013


Page 45: Evolutionary Genetics of Complex Genome

Pyhäjärvi et al. GBE 2013

Page 46: Evolutionary Genetics of Complex Genome

Pyhäjärvi et al. GBE 2013

environment allele frequency

Page 47: Evolutionary Genetics of Complex Genome

Beissinger et al. 2016 Nature Plants (pending rev)

[Figure: nucleotide diversity vs. distance to nearest substitution (cM)]

hard sweeps in genes play minor role in Zea

Page 48: Evolutionary Genetics of Complex Genome

Beissinger et al. 2016 Nature Plants (pending rev)

[Figure: nucleotide diversity vs. distance to nearest substitution (cM)]

hard sweeps in genes play minor role in Zea

Page 49: Evolutionary Genetics of Complex Genome

Beissinger et al. 2016 Nature Plants (pending rev)

[Figure: nucleotide diversity vs. distance to nearest substitution (cM)]

hard sweeps in genes play minor role in Zea

Page 50: Evolutionary Genetics of Complex Genome

Wallace et al. 2014 PLoS Genetics • Rodgers-Melnick et al. 2016 PNAS

GWAS candidate SNPs

Page 51: Evolutionary Genetics of Complex Genome

Wallace et al. 2014 PLoS Genetics • Rodgers-Melnick et al. 2016 PNAS

Variance Partitioning

GWAS candidate SNPs

Page 52: Evolutionary Genetics of Complex Genome

how to adapt: Zea mays

M T G P H R L

GGTAAA ATG ACT GGT CCA CAT CGA CTG TAG

noncoding/regulatory variation

multiple mutations

“soft” sweeps

standing variation
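The coding sequence on this slide can be checked against its amino-acid line with a short translation sketch. The codon table below is deliberately partial (only the codons shown on the slide), the upstream GGTAAA is treated as noncoding and left out, and the function name is illustrative.

```python
# Partial codon table: only the codons that appear on the slide.
CODON_TABLE = {
    "ATG": "M", "ACT": "T", "GGT": "G", "CCA": "P",
    "CAT": "H", "CGA": "R", "CTG": "L", "TAG": "*",
}


def translate(coding_seq):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    peptide = []
    for i in range(0, len(coding_seq) - 2, 3):
        aa = CODON_TABLE[coding_seq[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


cds = "ATGACTGGTCCACATCGACTGTAG"
print(translate(cds))
```

Changes in the upstream noncoding bases leave this translation untouched, which is why the slide files them under regulatory rather than structural variation.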

Page 53: Evolutionary Genetics of Complex Genome

Sattath et al. 2011 PLoS Gen. • Williamson et al. 2014 PLoS Gen. • Hernandez et al. 2011 Science • Ross-Ibarra et al. 2009 Genetics

Page 54: Evolutionary Genetics of Complex Genome

Sattath et al. 2011 PLoS Gen. • Williamson et al. 2014 PLoS Gen. • Hernandez et al. 2011 Science • Ross-Ibarra et al. 2009 Genetics

Page 55: Evolutionary Genetics of Complex Genome

Sattath et al. 2011 PLoS Gen. • Williamson et al. 2014 PLoS Gen. • Hernandez et al. 2011 Science • Ross-Ibarra et al. 2009 Genetics

[Figure: diversity vs. distance from substitution]

Page 56: Evolutionary Genetics of Complex Genome

Sattath et al. 2011 PLoS Gen. • Williamson et al. 2014 PLoS Gen. • Hernandez et al. 2011 Science • Ross-Ibarra et al. 2009 Genetics

[Figure: diversity vs. distance from substitution]

20% nonsyn. adaptive • 10% nonsyn. adaptive

50% nonsyn. adaptive • 40% nonsyn. adaptive

Page 57: Evolutionary Genetics of Complex Genome

Sattath et al. 2011 PLoS Gen. • Williamson et al. 2014 PLoS Gen. • Hernandez et al. 2011 Science • Ross-Ibarra et al. 2009 Genetics

[Figure: diversity vs. distance from substitution]

µ ∝ 2,500 Mbp • µ ∝ 3,100 Mbp

µ ∝ 130 Mbp • µ ∝ 220 Mbp

Page 58: Evolutionary Genetics of Complex Genome

Pyhäjärvi et al. GBE 2013

larger genomes enriched in noncoding adaptive variants

[Figure: enrichment (no <-> yes) for intergenic, synonymous, and nonsynonymous sites]

[Figure: enrichment, intergenic <-> coding]

Hancock et al. 2011 Science • Fraser et al. 2013 Gen. Research

Page 59: Evolutionary Genetics of Complex Genome

Pyhäjärvi et al. GBE 2013

larger genomes enriched in noncoding adaptive variants

[Figure: enrichment (intergenic <-> coding) and excess adaptive SNPs]

Hancock et al. 2011 Science • Fraser et al. 2013 Gen. Research

Page 60: Evolutionary Genetics of Complex Genome

WHAT IS A TE?

Credit: Robert Martienssen, CSHL

Page 61: Evolutionary Genetics of Complex Genome

Doebley 2004, Studer et al., 2011

tb1 Hopscotch

Page 62: Evolutionary Genetics of Complex Genome

Doebley 2004, Studer et al., 2011

tb1 Hopscotch

ZmCCT CACTA

Yang et al., 2013

Page 63: Evolutionary Genetics of Complex Genome

KNOTTED1 kn1 Mu

Greene, et al., 1994

http://pmb.berkeley.edu/sites/default/files/users/Knotted1%20mutant.jpg

Doebley 2004, Studer et al., 2011

tb1 Hopscotch

ZmCCT CACTA

Yang et al., 2013

Page 64: Evolutionary Genetics of Complex Genome

Makarevitch et al. 2015 PLoS Genetics

Page 65: Evolutionary Genetics of Complex Genome

Makarevitch et al. 2015 PLoS Genetics

new insertions activate expression

Makarevitch et al. 2014 bioRxiv

[Figure, panels A-E: log2(stress/control) expression in lines with the TE insertion vs. lines without the TE insertion, for genes GRMZM2G071206, GRMZM2G400718, GRMZM2G102447, GRMZM2G108057, and GRMZM2G108149]

[Figure: log2(stress/control) for genes 1 to 10 in Oh43, B73, and Mo17, with TE presence (+/-) marked for each line]

[Figure, panels A-B: percent of conserved genes by TE family (alaw, dagaf, etug, flip, gyma, ipiki, jeli, joemon, naiba, nihep, odoj, pebi, raider, riiryl, ubel, uwum, Zm00346, Zm02117, Zm03238, Zm05382) under salt, UV, heat, and cold stress]

Page 66: Evolutionary Genetics of Complex Genome

Fedoroff 2012, Wang and Dooner 2006

Page 67: Evolutionary Genetics of Complex Genome

Homologous (loop) 34%

No pairing 20%

Nonhomologous 46%

Page 68: Evolutionary Genetics of Complex Genome

Maguire 1966 Genetics

Homologous (loop) 34%

No pairing 20%

Nonhomologous 46%

Page 69: Evolutionary Genetics of Complex Genome

Fang et al. Genetics 2012 Pyhäjärvi et al. GBE 2013

Figure S4 LD in chromosome 9 among mexicana populations based on SNPs with minor allele frequency >0.1.

Inv9d

Inv9e

Page 70: Evolutionary Genetics of Complex Genome

Fang et al. Genetics 2012 Pyhäjärvi et al. GBE 2013

[Figure: Inv4n inversion frequency (0.0 to 0.8) vs. elevation (0 to 2000 m)]

Figure S4 LD in chromosome 9 among mexicana populations based on SNPs with minor allele frequency >0.1.

Inv9d

Inv9e

Page 71: Evolutionary Genetics of Complex Genome

Fang et al. Genetics 2012 Pyhäjärvi et al. GBE 2013

[Figure: Inv4n inversion frequency (0.0 to 0.8) vs. elevation (0 to 2000 m)]

Figure S4 LD in chromosome 9 among mexicana populations based on SNPs with minor allele frequency >0.1.

Inv9d

Inv9e

Inv1n

Page 72: Evolutionary Genetics of Complex Genome

Lauter et al. 2004 Genetics

Inv4n

mexicana parviglumis

Nielsen 2004; Nielsen et al. 2005; McVean 2007). However, the largest sweep identified in maize to date is only 1.1 Mb (Tian et al. 2009), and both the age of the inversion and common tests for departures from neutrality do not provide evidence of strong selection. Another alternative explanation would be the presence of strong negative interactions between distantly linked loci, potentially due to synthetic lethality (Boone et al. 2007). Such interactions should not generate extended patterns of elevated LD among intervening SNPs, as crossing over among haplotypes not carrying alleles involved in the negative interaction should not be affected. Both selective sweeps and negative interactions are inconsistent with the presence of only two major haplotypes in the Inv1n region and fail to explain the clinal variation in haplotype frequencies seen at Inv1n-I.

To our knowledge, the only prior evidence for Inv1n is a report of high LD and high FST from a much smaller sample of parviglumis (Hufford et al. 2012), but a number of other large inversions have been previously reported in mays and its wild relatives (Ting 1965, 1967, 1976; Maguire 1966; Kato 1975). These include an ∼50-Mb inversion on the long arm of chromosome 3 in Z. luxurians (Ting 1965) and an ∼35-Mb inversion that covers most of the short arm of chromosome 8 in both mays (McClintock 1960) and mexicana (Ting 1976). While some of these inversions were experimentally induced (McClintock 1931; Morgan 1950), several have also been identified in natural populations of multiple taxa (Kato 1975; Ting 1976).

One of the factors that may limit the geographic spread of large inversions is the potential fitness cost of crossing over. The frequency of chromosome loss is dependent on the inversion size and efficiency of synapsis over the inverted region (Burnham 1962; Maguire and Riess 1994; Lamb et al. 2007). When gene density is low, such as in pericentromeric regions, or there is a lack of continuous homology, chromosomes will often synapse in a nonhomologous manner without recombination (McClintock 1933). In maize, for example, an inversion on the long arm of chromosome 1 similar in size to Inv1n (19 cM) was seen to undergo homologous pairing in only about one-third of cases (Maguire 1966). Since Inv1n is located in a pericentromeric region with low gene density and covers a short genetic distance (2–13 cM), we anticipated that it would rarely pair and recombine with a noninverted chromosome. Our data are consistent with these arguments. We observed repressed recombination around Inv1n and no cytological evidence of crossing over in inversion heterozygotes. SNP data indicate no deviations from expected Hardy–Weinberg genotype frequencies at Inv1n, and we see no obvious evidence of effects on fertility. Given these observations, we suspect that inversion polymorphisms may be relatively common in natural plant populations, especially in regions of the genome with low recombination rates such as pericentromeres. Low recombination has also been offered as an explanation for the lack of underdominance in many pericentromeric inversions in Drosophila (Coyne et al. 1993). As dense genotyping becomes more cost effective, we predict that numerous common inversions will be identified in natural populations of Zea and other organisms.

Origin and age of Inv1n

Our evidence suggests that Inv1n-I is the derived, inverted arrangement. Inv1n-I is not found in Tripsacum or Zea taxa except for parviglumis and mexicana (Figure 3C), and, unlike in Inv1n-S, all SNPs private to Inv1n-I are derived in

Figure 5 (A) Bayes factors for correlation between allele frequencies and altitude in 33 natural parviglumis populations. Inv1n is indicated by red vertical lines. The 99th percentile of the distribution of Bayes factors is indicated by a horizontal dashed line. Chromosomes 1–10 are plotted in order and in different colors. (B) Association between all SNPs and culm diameter. SNPs significant at 5% FDR are above the dashed line.

Fang et al. Genetics 2012 Hufford et al. PLoS Genetics 2013

culm diameter

macrohairs, anthocyanin

Inv1n

Page 73: Evolutionary Genetics of Complex Genome

Pyhäjärvi et al. GBE 2013

Page 74: Evolutionary Genetics of Complex Genome

El Porvenir

Opopeo

Xochimilco

Puruandiro

Tenango del Aire

Ixtlan

Nabogame

Santa Clara

San Pedro

Allopatric

Inv4n

Fst high vs. low elevation maize

Hufford et al. PLoS Gen 2013

Page 75: Evolutionary Genetics of Complex Genome

4% of B73

~8% absent

$\theta_\pi \sum_{i=1}^{n-1} \frac{1}{i} = S$

reference genome

~70% low copy sequence

$\theta_\pi$ ~8% pairwise diff

1-S % pan-genome in ref

[Figure labels: % reads; unmapped reads]

Gore et al. 2009 Science • Chia et al. 2012 Nat Gen
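The relation on this slide multiplies a pairwise difference θπ by the harmonic sum over a sample of n sequences to give S, the same form as Watterson's relation between θ and the number of segregating sites. The sketch below evaluates that sum and the implied S. Using 0.08 as θπ follows the "~8% pairwise diff" label; the sample sizes and function names are illustrative assumptions.

```python
def harmonic_sum(n):
    """Return sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_segregating(theta_pi, n):
    """Return S = theta_pi * sum_{i=1}^{n-1} 1/i."""
    return theta_pi * harmonic_sum(n)


# Illustrative sample sizes; theta_pi = 0.08 is taken from the slide label.
for n in (2, 10, 100):
    print(n, round(expected_segregating(0.08, n), 3))
```

Because the harmonic sum grows only logarithmically with n, S rises quickly for the first few genomes sampled and slowly thereafter.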

Page 76: Evolutionary Genetics of Complex Genome

4% of B73

~8% absent

$\theta_\pi \sum_{i=1}^{n-1} \frac{1}{i} = S$

reference genome

~70% low copy sequence

$\theta_\pi$ ~8% pairwise diff

1-S % pan-genome in ref

[Figure labels: % reads; unmapped reads]

Gore et al. 2009 Science • Chia et al. 2012 Nat Gen

[Figure, panels A-D. Traits: Angle, Length, NLB, SLB, Width. Variant classes: 10kb RDV, Gene RDV, HapMap2 genic, HapMap2 intergenic, HapMap1 genic, HapMap1 intergenic. Fold enrichment of HapMapV1 and HapMapV2 SNPs by annotation: intergenic, intronic, 500bp upstream, 500bp downstream, 3' UTR, non-syn coding, 5' UTR, splice site, syn coding. p-value (-log10) vs. position along Chr 1 (Mb)]

fold enrichment

Page 77: Evolutionary Genetics of Complex Genome

Renny-Byfield et al. In Prep

Chr 6 (Mb)

NOR repeat array

Page 78: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

[Figure: 1C genome size (Gb), 2.50 to 3.75, for MH, ML, SAH, SAL, mexicana, and parviglumis; colored by altitude (highland, lowland)]

Page 79: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

[Figure: 1C genome size (Gb), 2.50 to 3.75, for MH, ML, SAH, SAL, mexicana, and parviglumis; colored by altitude (highland, lowland)]

Page 80: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

[Figure: 1C genome size (Gb), 2.50 to 3.75, for MH, ML, SAH, SAL, mexicana, and parviglumis; colored by altitude (highland, lowland)]

Page 81: Evolutionary Genetics of Complex Genome

mixed model for selection on genome size

[Equation labels: genome size = mean + slope (selection) × altitude + kinship + error]

β1 < 0: 11 Mb decrease per 100 meters gained

Bilinski et al. In Prep
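The slope quoted on this slide (about 11 Mb lost per 100 m gained) can be turned into a back-of-the-envelope prediction. The sketch below applies only the fixed altitude effect and ignores the kinship and error terms of the mixed model; the function name and the example elevations are illustrative.

```python
SLOPE_MB_PER_100M = -11.0  # beta_1 from the slide: 11 Mb decrease per 100 m


def genome_size_change_mb(elev_low_m, elev_high_m, slope=SLOPE_MB_PER_100M):
    """Expected change in genome size (Mb) between two elevations (m)."""
    return slope * (elev_high_m - elev_low_m) / 100.0


# Hypothetical comparison: a site at 500 m versus one at 2,500 m.
print(genome_size_change_mb(500, 2500))
```

A linear extrapolation of this kind is only meaningful within the range of elevations actually sampled.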

Page 82: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

Page 83: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

Page 84: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

bp of knob

Page 85: Evolutionary Genetics of Complex Genome

Rayburn et al. 1994 Plant Breeding Francis et al. 2008. Ann. Bot.

cycle time that did not exceed 20 h compared with a muchgreater spread of cycle times for the monocots. If DNAmass per se is the limiting factor for cell cycle time, wehypothesize that cycle times would be the same for dicotsand monocots of comparable C-value. This is so even ifthe data for Scilla sibirica and Trillium grandiflorum are

excluded. Indeed, if we ignore the marked discontinuityof the y-axis caused by their inclusion, then the nucleotypiceffect is strong for all species regardless of phylogeny. Totest the rigour of these hypotheses would require data toplug the gap between Trillium grandiflorum and themajority of C-value/cell cycle times analysed here.

Separate plots for diploids and polyploids show a strongnucleotypic effect on CCT in diploids (Fig. 3; Table 2).Removing the five diploid outliers (.25 pg) reduced theslope (b ¼ 0.27) by approximately four-fold but theregression continued to be significant (P , 0.001). Forthe polyploids, a nucleotypic effect on CCT was alsodetected (Fig. 3; Table 2); however, removing the two poly-ploid outliers rendered the regression non-significant (y ¼0.03x 2 13.5). This confirms previous work in which theslope/rate of increase in CCT with increasing DNA washigher in diploids than in autopolyploids (Evans et al.,1972). With the exception of Scilla sibirica, CCT in poly-ploids is generally more buffered than in diploids (Fig. 3).

We acknowledge that some traditionally classifieddiploids are not necessarily so (see Soltis and Soltis,1999). For example, there are strong arguments that Zeamays is actually an allotetraploid (2n ¼ 4x ¼ 20; Gaut andDoebly, 1997). However, in the data reported here wehave assigned ploidy level as listed by the authors of thepapers and reviews we have consulted.

The longest CCTs (.20 h) are exhibited by the peren-nials (Fig. 4). Indeed, the data for perennials overall had anearly seven-fold steeper slope (b ¼ 1.37) than a compar-able regression for annuals (b ¼ 0.20; Table 2). Thesedata are consistent with findings of Bennett (1972) wherethe mean CCT in 19 annuals was significantly shorterthan in eight obligate perennials. Where our analysesdiffer from Bennett (1972) is in relation to the broadrange of CCTs shown by perennials compared withannuals (Fig. 4). However, in Fig. 4 the longer CCTs

FI G. 3. DNA C-value (pg) and cell cycle time (h) in the root apical mer-istem of a range of diploid and polyploid angiosperms. See Table 2 for

regression analyses.

FI G. 2. DNA C-value (pg) and cell cycle time (h) in the root apical mer-istem of a range of (A) eudicots and monocots (n ¼ 110), and (B) eudicots

(n ¼ 60). See Table 2 for regression analyses.

TABLE 2. Regression analyses of all data presented in Figs. 2–4 together with the percentage variance accounted for by the regression (R2), the level of probability (P) for each regression

Group               Regression (y = bx + a)   R2 (%)   P     n
All measurements    y = 1.09x + 5.39          54.2     ***   110
Monocots            y = 1.29x + 2.44          58.7     ***   48
Eudicots            y = 0.32x + 10.2          15.4     ***   62
Diploids            y = 1.04x + 4.95          49.86    ***   86
Polyploids          y = 1.14x + 3.12          56.3     ***   24
Annuals             y = 0.20x + 10.7          19.9     ***   75
Perennials          y = 1.37x + 4.13          63.6     ***   35

*** P < 0.001; n, number of replicates.
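As a rough illustration of how the Table 2 regressions can be read, the sketch below encodes each fitted line as a (slope, intercept) pair, with x the DNA C-value in pg and y the cell cycle time in h. The function names `predicted_cct` and `slope_ratio` and the 10 pg input are hypothetical, chosen only for this example; the coefficients are copied from Table 2.

```python
# Fitted lines from Table 2: CCT (h) = b * C-value (pg) + a.
# Keys are the group labels used in the table; values are (b, a).
TABLE_2 = {
    "All measurements": (1.09, 5.39),
    "Monocots": (1.29, 2.44),
    "Eudicots": (0.32, 10.2),
    "Diploids": (1.04, 4.95),
    "Polyploids": (1.14, 3.12),
    "Annuals": (0.20, 10.7),
    "Perennials": (1.37, 4.13),
}


def predicted_cct(group: str, c_value_pg: float) -> float:
    """Cell cycle time (h) predicted by the group's regression line."""
    b, a = TABLE_2[group]
    return b * c_value_pg + a


def slope_ratio(group_a: str, group_b: str) -> float:
    """How many times steeper group_a's slope is than group_b's."""
    return TABLE_2[group_a][0] / TABLE_2[group_b][0]


if __name__ == "__main__":
    # Illustrative input only: a hypothetical genome of 10 pg.
    for group in ("Annuals", "Perennials"):
        print(f"{group}: {predicted_cct(group, 10.0):.2f} h at 10 pg")
    # The "nearly seven-fold" comparison of perennial and annual slopes.
    print(f"Perennial/annual slope ratio: {slope_ratio('Perennials', 'Annuals'):.2f}")
    # The "approximately four-fold" drop in the diploid slope once the
    # five outliers (>25 pg) are removed (b = 0.27).
    print(f"Diploid slope reduction: {TABLE_2['Diploids'][0] / 0.27:.2f}")
```

These are point predictions from the published lines only. The R2 values in Table 2 (15.4 to 63.6 %) indicate how much scatter remains around each line, so an individual species can sit well away from the predicted value.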

Francis et al.: DNA C-value and the Cell Cycle (p. 750)


[Figure: late flowering vs. early flowering plants]

smaller genome, faster development?

Page 86: Evolutionary Genetics of Complex Genome

Bilinski et al. In Prep

Page 87: Evolutionary Genetics of Complex Genome

[Figure: 1C genome size (Gb), axis 2.50 to 3.75, for MH, ML, SAH, SAL, mexicana, and parviglumis; altitude legend: highland, lowland]

Bilinski et al. In Prep

Page 88: Evolutionary Genetics of Complex Genome

[Figure: 1C genome size (Gb), axis 2.50 to 3.75, for MH, ML, SAH, SAL, mexicana, and parviglumis; altitude legend: highland, lowland]

Bilinski et al. In Prep

Page 89: Evolutionary Genetics of Complex Genome

[Figure: 1C genome size (Gb), axis 2.50 to 3.75, for MH, ML, SAH, SAL, mexicana, and parviglumis; altitude legend: highland, lowland]

Bilinski et al. In Prep

Page 90: Evolutionary Genetics of Complex Genome

• Adaptation in maize occurs from standing variation and targets regulatory variants

• Large genomes may have more targets, more standing variation, and more regulatory adaptation

• Adaptation in complex plant genomes likely involves many kinds of variation including transposable elements, inversions, copy number variation, and even genome size?

Evolutionary Genetics in a Complex Genome

Kew C-Value Database

Page 91: Evolutionary Genetics of Complex Genome

photo by lady_lbrty

Acknowledgments

Maize Diversity Group: Peter Bradbury, Ed Buckler, John Doebley, Theresa Fulton, Sherry Flint-Garcia, Jim Holland, Sharon Mitchell, Qi Sun, Doreen Ware

Collaborators: CSI Davis, Nathan Springer

Lab Alumni: Tim Beissinger (USDA-ARS, Mizzou), Kate Crosby (Monsanto), Matt Hufford (Iowa State), Tanja Pyhäjärvi (Oulu), Shohei Takuno (Sokendai), Joost van Heerwaarden (Wageningen)

Page 92: Evolutionary Genetics of Complex Genome