Genomic diversity and population structure in switchgrass, Panicum virgatum: Genotyping-by-sequencing and population genomics Geoff Morris*, Paul Grabowski, Justin Borevitz Dept. of Ecology and Evolution University of Chicago

Genomic diversity and population structure in switchgrass, Panicum virgatum:

Genotyping-by-sequencing and population genomics

Geoff Morris*, Paul Grabowski, Justin Borevitz

Dept. of Ecology and Evolution

University of Chicago

Genomic diversity and population structure

• Geographic patterns of genomic diversity reflect drift, migration, and adaptation

• Genomic diversity: nucleotide variation and insertions/deletions across many loci in the nuclear and organellar genomes.

• Leads to design of mapping populations for quantitative genetics and molecular breeding

Genomic diversity and natural history

Emerson et al. PNAS 2010

Example: Pitcher plant mosquito (Wyeomyia smithii)

Ecotypic diversity in switchgrass

• Switchgrass and other wide-ranging grassland species have many ecotypes

• Great variability in size, shape, color, and habitat preference

• Example: Upland/lowland divergence

– Upland (Michigan), adapted to: shorter growing season, drier climates
– Lowland (Oklahoma), adapted to: long growing season, wet climates

Effects of ecotype diversity on productivity

• Three-year plot (6 m²) experiment at Fermilab

• ~20% overyield in switchgrass mixtures compared to monocultures

“Genomic diversity and population structure in switchgrass, Panicum virgatum: from the continental scale to a dune landscape”

Morris, Grabowski, and Borevitz

Accepted, Molecular Ecology

Biogeography of Indiana Dunes flora

Coastal Plain flora: e.g. Seaside spurge, Marramgrass

Boreal flora: e.g. Jack Pine, Bearberry

Great Plains flora: e.g. Sandreed, Little Bluestem

Eastern deciduous flora: e.g. Tulip tree

Recolonized post-glaciation: ~10,000 years ago

Switchgrass gene pools

Zhang et al. 2011

?

Landscapes in Indiana Dunes

Landscape features are dynamic and can be dated:

• 100s – 1000s of years for dunes

• 10s – 100s of years for blowouts

Big blowout ~ 150 years old

Study questions

• Can switchgrass population structure be confirmed with a genome-wide sample of non-ascertained markers?

• In a hierarchical sample of switchgrass, how much diversity is there on a landscape, regional, and continental scale?

• Did multiple switchgrass gene pools contribute to the Indiana Dunes populations?

• Is there genomic diversity in a single landscape feature (blowout)?

• Is there local (private) genetic diversity in the Indiana Dunes?

Switchgrass plant samples

• Switchgrass cultivated varieties (cultivars)
– Kanlow (Oklahoma - lowland)
– Blackwell (Oklahoma - upland)
– High Tide (Maryland - Coastal)
– Forestburg and Sunburst (South Dakota)
– Dacotah (North Dakota)
– Cave-in-Rock (Illinois)
– Southlow (Southern Michigan “ecopool”)

• Indiana Dunes switchgrass
– Big Blowout
– Jack pine savanna
– Interdune

Problems with traditional marker systems

• Locus sampling:
– Typically only a few kb are sequenced in a few loci (rDNA, cp introns)
– Large stochastic error and locus-specific bias
– e.g. Plant chloroplast has 100X lower rate of evolution than animal mitochondria

• Ascertainment bias:
– Occurs whenever markers are discovered and typed separately
– Worst when the ascertainment panel is a geographically restricted subpopulation
– e.g. Inferred genetic diversity in Africans is spuriously low when European markers are used

Genomic diversity from de novo sequencing

(Diagram legend: marker = restriction site)

1) PstI digest of genomic DNA

2) End-polish, blunt-end ligation; Illumina barcodes

3) PCR amplify and pool fragments from multiple samples

4) Assemble and map reads to “stacks” and call SNPs

• Reduced representation + multiplexing = more samples

• 10,000+ candidate SNPs

• No reference genome needed

• Data here from 76 or 100 bp paired-end reads

• 40 billion base pair data set
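The reduced-representation idea in step 1 can be illustrated with a minimal in silico digest. This is only a sketch: it relies on the known PstI recognition sequence (CTGCAG, cut after the A), while the function name `psti_fragments` and the toy sequence are illustrative assumptions and not part of the pipeline used for these data.

```python
PSTI_SITE = "CTGCAG"  # PstI recognition sequence
CUT_OFFSET = 5        # PstI cuts after the A: CTGCA^G

def psti_fragments(seq):
    """Return the top-strand fragments left after cutting seq at every PstI site."""
    seq = seq.upper()
    cuts = []
    start = seq.find(PSTI_SITE)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        start = seq.find(PSTI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy sequence (hypothetical) containing two PstI sites.
toy = "AAAACTGCAGTTTTTTCTGCAGGG"
fragments = psti_fragments(toy)
```

Only reads that start at such cut sites are sequenced in an idealized digest, which is why the same loci tend to be sampled across barcoded samples.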

Plastome sequence in RRLs

• Nuclear whole genome shotgun sequence is too light (<<1X) for assembly

• Plastome WGS coverage is very high (>>1X)

1) PstI digest of genomic DNA, with star activity and random shearing

2) End-polish, blunt-end ligation

Analysis of chloroplast data

• Chloroplast genome sequence (plastome) included in data

• Random (shotgun) sequence + 20 PstI sites

• Switchgrass chloroplast reference available (Upland and Lowland)

• Mapped reads to both ~140,000 base pair chloroplast genomes

• Coverage (# of times each position is read): 1X – 786X
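Coverage as defined above (the number of reads overlapping each position) can be computed from mapped read intervals. The sketch below assumes reads are already available as hypothetical `(start, length)` pairs on 0-based coordinates; parsing an actual alignment file is not shown, and the function name is illustrative.

```python
def per_base_coverage(genome_length, reads):
    """Count how many reads cover each position of a linear reference.

    reads: iterable of (start, length) pairs, 0-based.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, length in reads:
        end = min(start + length, genome_length)
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage

# Toy example: three 4 bp reads on an 8 bp reference.
toy_reads = [(0, 4), (2, 4), (3, 4)]
cov = per_base_coverage(8, toy_reads)
```

Taking `min(cov)` and `max(cov)` over a real plastome would give a range comparable to the 1X – 786X figure reported on the slide.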

Chloroplast coverage and polymorphisms

[Figure: coverage along the chloroplast genome; x-axis: position (kb)]

Chloroplast phylogeny

• Neighbor-joining tree based on 140 kb

• Named haplogroups have >50% bootstrap support

• Unfilled lines indicate low-coverage samples

Chloroplast phylogeny

Chloroplast phylogeny

Population analysis of nuclear loci

• Create “pseudoreference” of RRL loci with de novo assembly

• Map reads to pseudoreference to create stacks (150-1500 reads)

• Map reads to switchgrass chloroplast and sorghum mitochondria, and drop stacks that match organelles

• Select single-nucleotide variants that:

– Have high sequence quality (PHRED-based error probability < 0.001, i.e. Q > 30, for both alleles)

– Vary in frequency across samples (chi-square test, p < 0.01)

– Are nearest to restriction site, closest to beginning of read

• Randomly select one allele per sample (weighted by observed frequency)
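The last step, drawing one allele per sample weighted by observed frequency, might look like the following sketch. It assumes read counts per allele are available for each sample at a SNP; the function name, the `Dune_1` label, and the counts are all made up for illustration.

```python
import random

def sample_alleles(read_counts, rng):
    """Draw one allele per sample, weighted by that sample's read counts.

    read_counts: {sample: {allele: count}} at a single SNP.
    Samples with no reads get None (missing data).
    """
    calls = {}
    for sample, counts in read_counts.items():
        alleles = list(counts)
        weights = [counts[a] for a in alleles]
        if sum(weights) == 0:
            calls[sample] = None
        else:
            calls[sample] = rng.choices(alleles, weights=weights, k=1)[0]
    return calls

rng = random.Random(1)  # fixed seed so the draw is repeatable
snp = {
    "Kanlow": {"A": 12, "G": 0},
    "Dacotah": {"A": 3, "G": 9},
    "Dune_1": {"A": 0, "G": 0},  # no reads at this site
}
calls = sample_alleles(snp, rng)
```

Repeating the draw many times and averaging downstream statistics is one way to account for the sampling noise this step introduces.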

Coding sequence variation in the chloroplast

• 77 coding genes in chloroplast (including Rubisco, ribosome, etc)

– 60kb of coding sequence

• Constraint on non-synonymous (NS) vs. synonymous (S) variation provides biological validation for SNPs

• Upland vs. Lowland (~1 million years):

– 23 NS : 16 S (ratio = 1.4)

• Within upland (< 0.5 million years):

– 16 NS : 3 S (ratio = 5.3)
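The two ratios quoted above follow directly from the counts on the slide; a short check, with an illustrative helper name:

```python
def ns_to_s_ratio(nonsyn, syn):
    """Ratio of non-synonymous to synonymous variants."""
    return nonsyn / syn

between_ecotypes = round(ns_to_s_ratio(23, 16), 1)  # upland vs. lowland
within_upland = round(ns_to_s_ratio(16, 3), 1)      # within upland
```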

Nuclear genome: Multidimensional scaling

~11,000 nuclear loci, mean of 100 random allele samples
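One way to obtain the input for multidimensional scaling is a pairwise distance matrix averaged over repeated random allele draws. The sketch below uses a simple allele-mismatch distance; the slide does not state which distance was used, so the measure, the function name, and the toy data are assumptions.

```python
import random
from itertools import combinations

def mean_mismatch_distance(read_counts, n_reps, rng):
    """Average pairwise allele-mismatch distance over random allele draws.

    read_counts: {sample: [{allele: count}, ...]}, one dict per locus,
    each with at least one read. Returns {(sample_a, sample_b): distance}.
    """
    samples = sorted(read_counts)
    totals = {pair: 0.0 for pair in combinations(samples, 2)}
    for _ in range(n_reps):
        # One weighted random allele per sample per locus.
        draw = {
            s: [rng.choices(list(c), weights=list(c.values()))[0]
                for c in read_counts[s]]
            for s in samples
        }
        for a, b in totals:
            mismatches = sum(x != y for x, y in zip(draw[a], draw[b]))
            totals[(a, b)] += mismatches / len(draw[a])
    return {pair: total / n_reps for pair, total in totals.items()}

# Toy data (hypothetical): three samples, two loci.
toy = {
    "S1": [{"A": 5}, {"C": 4}],
    "S2": [{"A": 3, "G": 3}, {"T": 6}],
    "S3": [{"G": 2}, {"T": 1}],
}
dist = mean_mismatch_distance(toy, n_reps=100, rng=random.Random(0))
```

The resulting distances could then be passed to any standard MDS routine to produce a two-dimensional plot.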

Nuclear loci: Structure analysis

Bayesian clustering algorithm; ~11,000 nuclear loci, random allele sample, burn-in 10K, run 10K

Conclusions

• Confirmed upland vs. lowland differentiation and differentiated a local population using non-ascertained markers

• Lake Michigan switchgrass is distinct from the broader upland population in the Midwest and Great Plains

• Post-glacial gene flow into the Indiana Dunes included genotypes from across the Great Plains and Midwest

• The chloroplast diversity in the Indiana Dunes did not evolve in the current Midwestern population, but originated one or more glacial cycles ago

• A single blowout in the dunes can have as much chloroplast diversity as the Midwest

New GBS methods for population genomics

• For true population analysis we need 10+ individuals in multiple populations

• Illumina multiplexing is too expensive – separate prep cost for each library adds $100s/sample

• Read count overdispersion (up to ~200X more than Poisson) requires technical replicates to even out counts

• Sticky-end ligation increases specificity and removes random sequence (including plastome)
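Overdispersion relative to Poisson is commonly summarized by the variance-to-mean ratio of read counts, which is about 1 for Poisson data. The sketch below assumes that this is the sense intended on the slide; the function name and the toy counts are illustrative.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson counts, >1 if overdispersed."""
    return variance(counts) / mean(counts)

# Toy read counts (hypothetical) for one locus across four libraries.
uneven = dispersion_index([0, 0, 0, 20])  # one library dominates
even = dispersion_index([5, 5, 5, 5])     # perfectly even counts
```

Pooling technical replicates of the same sample tends to lower this ratio, which is the motivation given in the bullet above.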

Genotype-By-Sequencing (GBS)

Based on Elshire et al. 2011, PLoS ONE

GBS on continental + dunes switchgrass

New population genomic studies with GBS

1. Continental population structure (126 individuals)
– 50/50 deep diversity and shallow diversity based on chloroplast markers and SSRs

2. Tetraploid cultivars (24 each for TX, OK, NE, ND cultivars)
– Ploidy differences may be confounded with genetic diversity
– High sample size should allow traditional pop gen analyses (Fst, etc.)

3. Dune half-sibs (4 mothers and 10 offspring each)
– True SNPs will segregate in the offspring while homeologous substitutions will not
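The half-sib logic in item 3 can be sketched as a simple site classifier: a true SNP should show different base patterns among offspring, whereas a homeologous substitution should show the same two bases in every individual. The representation (one set of observed bases per offspring) and all names below are assumptions, and a real analysis would need to model read depth, since a base can be missed by chance at low coverage.

```python
def classify_site(offspring_bases):
    """Classify a site from the bases observed in each offspring.

    offspring_bases: list of sets of bases, one set per offspring
    (an empty set means no reads for that individual).
    """
    patterns = {frozenset(b) for b in offspring_bases if b}
    if not patterns:
        return "no data"
    if len(patterns) > 1:
        return "segregating"  # consistent with a true SNP
    (only,) = patterns
    # Same two bases in every offspring: consistent with a homeologous difference.
    return "fixed heterozygosity" if len(only) > 1 else "monomorphic"

# Toy families (hypothetical).
true_snp = [{"A", "G"}, {"A"}, {"A", "G"}, {"A"}]
homeolog = [{"C", "T"}] * 4
```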

Bioinformatics overview

• No software package for population genomic analysis on GBS

• Stacks (U. Oregon) comes closest but multinomial sampling model expects high frequency SNPs (e.g. mapping population)

• Buckler lab TASSEL package (Java) may be appropriate

• We’ve been using a custom pipeline (CLC, MySQL, R) for analysis
– http://create.ly/gefxsub43