Applications of Genotyping-by-Sequencing
for Wheat Breeding and Genetics
Jesse Poland
Kansas State University
Aug 22, 2013 1
Next Generation Genomics and
Integrated Breeding for Crop Improvement
4th International Workshop
ICRISAT, India
Feb 19, 2014
The Breeding Funnel
Crossing: Elite, Un-adapted, Exotic
Early Generation Testing
Prelim Yield testing (thousands)
Replicated Yield testing (hundreds)
Advanced Yield testing (tens)
Varieties (one)
Axis: Years
The Breeding Funnel: marker applications
Genomic Selection
F2 enrichment (MAS)
Back cross conversion
Parent selection
Molecular Markers and Objectives
Single Locus Typing (cost per data point)
- Target known genes
- Few loci (<10)
- Marker assisted selection
- Backcross conversion
Whole-genome Profile (cost per sample)
- Assay whole genome
- Many loci (thousands)
- Genomic Selection (AM)
- Background selection
- Diversity study
- Germplasm typing
Why use sequencing for genotyping rather than array based methods?
+ Amazing developments in sequencing output
+ Very good for wheat where polyploidy and duplications cause problems with hybridization/PCR assays
+ Polymorphism discovery simultaneous with genotyping
+ No ascertainment bias
+ Low per sample cost
- Complex bioinformatics
- Requires paradigm shift in molecular markers
Genotyping-by-sequencing (GBS)
“…massively parallel sequencing of multiplexed reduced-representation genomic libraries.”
“massively parallel sequencing” = sequencing on Illumina platform
“multiplex” = using DNA barcode (unique 5-10 bp)
- unique DNA sequence synthesized on the adapter
- pool 48-384 samples together
“reduced-representation” = use restriction enzyme to capture only the portion of the genome flanking restriction sites
- methylation-sensitive restriction enzymes
- target specific (rare, low-copy) sites in genome
- PstI (CTGCAG), MspI (CCGG)
Poland, J. A. and T. W. Rife (2012) Genotyping-by-Sequencing for Plant Breeding and
Genetics. Plant Gen. 5(3): 92-102.
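The multiplexing step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for these slides: it assumes each read starts with the sample barcode followed directly by the PstI cut-site remnant (taken here to be TGCAG), and the barcodes, sample names, and reads are made up.

```python
from collections import defaultdict

# Assumption: sequence left at the start of the insert after PstI cuts CTGCA^G.
PSTI_REMNANT = "TGCAG"


def demultiplex(reads, barcodes):
    """Assign reads to samples by barcode prefix.

    Only reads whose barcode is immediately followed by the cut-site
    remnant are kept; the barcode itself is trimmed off.
    """
    by_sample = defaultdict(list)
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    ordered = sorted(barcodes.items(), key=lambda kv: -len(kv[0]))
    for read in reads:
        for barcode, sample in ordered:
            if read.startswith(barcode + PSTI_REMNANT):
                by_sample[sample].append(read[len(barcode):])
                break
    return dict(by_sample)


# Hypothetical variable-length barcodes and reads.
barcodes = {"ACGTA": "line_1", "TTGCAGC": "line_2"}
reads = [
    "ACGTATGCAGGATTACA",   # line_1
    "TTGCAGCTGCAGCCCGGA",  # line_2
    "GGGGGTGCAGAAAA",      # unknown barcode, dropped
    "ACGTAAAAAAAAA",       # barcode present but no cut site, dropped
]
result = demultiplex(reads, barcodes)
```

Requiring the cut-site remnant after the barcode is a cheap sanity check that a read really comes from a digested fragment.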
Application of GBS:
Genomic Selection
Genomic Selection Needed:
1) Training Population (genotypes + phenotypes)
2) Selection Candidates (genotypes)
Heffner, E.L., M.E. Sorrells, J.-L. Jannink. 2009. Genomic selection for crop improvement.
Crop Sci. 49:1-12. DOI: 10.2135/cropsci2008.08.0512
Inexpensive, high-density genotypes
Accurate phenotypes
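The two ingredients above (a genotyped and phenotyped training population, plus genotyped selection candidates) map onto a ridge-regression model roughly as follows. This is a sketch on simulated data, not the analysis behind these slides: the marker coding (-1/0/1), population sizes, and the fixed shrinkage parameter `lam` are all assumptions. Packages such as rrBLUP estimate the shrinkage from the data rather than fixing it.

```python
import numpy as np


def ridge_marker_effects(M, y, lam):
    """Solve (M'M + lam*I) u = M'(y - mean(y)) for marker effects u."""
    mu = y.mean()
    p = M.shape[1]
    u = np.linalg.solve(M.T @ M + lam * np.eye(p), M.T @ (y - mu))
    return mu, u


rng = np.random.default_rng(0)
n_train, n_cand, n_markers = 200, 50, 500  # hypothetical sizes

# Simulated marker scores coded -1/0/1.
M_train = rng.integers(-1, 2, size=(n_train, n_markers)).astype(float)
M_cand = rng.integers(-1, 2, size=(n_cand, n_markers)).astype(float)

# Simulated phenotypes for the training population only.
true_u = rng.normal(0.0, 0.1, n_markers)
y_train = 5.0 + M_train @ true_u + rng.normal(0.0, 1.0, n_train)

# 1) Train on genotypes + phenotypes.
mu, u_hat = ridge_marker_effects(M_train, y_train, lam=100.0)

# 2) Predict breeding values of candidates from genotypes alone.
gebv = mu + M_cand @ u_hat

# Possible only in simulation: compare against the true breeding values.
true_bv = 5.0 + M_cand @ true_u
accuracy = np.corrcoef(gebv, true_bv)[0, 1]
```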
Using GBS for GS
Is genotyping-by-sequencing a suitable marker platform
for genomic selection?
CIMMYT Semi-Arid Wheat Screening Nursery (SAWSN)
N = 254, advanced lines
Replicated field trials, Cd. Obregon, Mexico
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S. Dreisigacker, J. Crossa, H.
Sanchez-Villeda, M. Sorrells, J.-L. Jannink. 2012. Genomic selection in wheat breeding using
genotyping-by-sequencing. Plant Gen. 5:103-113. DOI: 10.3835/plantgenome2012.06.0006
Using GBS for GS
CIMMYT Semi-Arid Wheat Screening Nursery (SAWSN)
GBS: PstI-MspI, 96-plex
HiSeq2000 = 180M – 210M reads / lane
41,371 SNPs 35K
DArT markers (n = 1,729)
Ridge-regression (rrBLUP)
7 sets for field trials – grouping factor
for cross-validation
No full-sib lines in same trial
Endelman, J.B. 2011. Ridge regression and other kernels for genomic selection with R package
rrBLUP. Plant Gen. 4:250-255. DOI: 10.3835/plantgenome2011.08.0024
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S. Dreisigacker, J. Crossa, H. Sanchez-Villeda, M.
Sorrells, J.-L. Jannink. 2012. Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Gen.
5:103-113. DOI: 10.3835/plantgenome2012.06.0006
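Using the field-trial sets as the grouping factor means each set is held out in turn and predicted from the remaining sets. The sketch below illustrates that scheme on simulated data. It is not the published analysis: the population size, marker count, and fixed `lam` are assumptions. It uses the n × n form of ridge regression, which gives the same predictions as solving for marker effects but is cheaper when markers far outnumber lines.

```python
import numpy as np


def ridge_predict(M_train, y_train, M_test, lam):
    """Ridge prediction via the n x n system (M M' + lam*I) a = y - mean(y)."""
    mu = y_train.mean()
    n = M_train.shape[0]
    a = np.linalg.solve(M_train @ M_train.T + lam * np.eye(n), y_train - mu)
    return mu + M_test @ (M_train.T @ a)


def grouped_cv_accuracy(M, y, groups, lam):
    """Hold out one group at a time; return Pearson r for each held-out group."""
    acc = {}
    for g in np.unique(groups):
        test = groups == g
        pred = ridge_predict(M[~test], y[~test], M[test], lam)
        acc[int(g)] = float(np.corrcoef(pred, y[test])[0, 1])
    return acc


rng = np.random.default_rng(1)
n_lines, n_markers = 140, 1000           # hypothetical sizes
groups = np.arange(n_lines) % 7 + 1      # 7 trial sets, labelled 1..7
M = rng.integers(-1, 2, size=(n_lines, n_markers)).astype(float)
y = M @ rng.normal(0.0, 0.1, n_markers) + rng.normal(0.0, 1.0, n_lines)

acc = grouped_cv_accuracy(M, y, groups, lam=100.0)
```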
Prediction of wheat quality
CIMMYT elite breeding lines (n=164) Cycle 45 International Bread Wheat Screening Nursery (C45IBWSN)
Replicated yield tests
2009 & 2010
6 environments
One replication for quality testing
milling
dough rheology
baking tests
Best Linear Unbiased Estimate (BLUE)
Genotyping-by-sequencing
15,330 SNPs (imputed with MVN-EM)(rrBLUP)
Cross-validation (x100)
Training sets of n=134
Validation sets of n=30
- thousand kernel weight
- mix time
- pup loaf volume
Sarah Battenfield, KSU
[Figure: Loaf Volumes, observed (cc) vs. predicted (cc), r = 0.54]
[Figure: Mixograph Mix Times, observed (min) vs. predicted (min), r = 0.67]
Sarah Battenfield, KSU
Prediction Accuracies (r)

Training Population      Cross-valid. both years  Cross-valid. both years  Cross-valid. both years  2011      2010
Training Size (n)        1138                     995                      712                      995       712
Testing Population       Cross-valid. both years  Cross-valid. both years  Cross-valid. both years  2010      2011
Testing Size (n)         569                      712                      995                      712       995

Test Weight              0.725***                 0.723***                 0.715***                 0.312***  0.192***
Grain Hardness           0.513***                 0.510***                 0.495***                 0.005     0.056
Grain Protein            0.630***                 0.629***                 0.620***                 0.400***  0.335***
Flour Protein            0.604***                 0.602***                 0.589***                 0.394***  0.284***
Flour SDS Index          0.666***                 0.666***                 0.661***                 0.433***  0.461***
Mixograph Mix Time       0.718***                 0.715***                 0.707***                 0.535***  0.499***
Alveograph W             0.697***                 0.695***                 0.683***                 0.512***  0.475***
Alveograph P/L           0.476***                 0.474***                 0.466***                 0.323***  0.278***
Loaf Volume              0.638***                 0.634***                 0.625***                 0.358***  0.333***
Prediction of wheat quality
Regional Performance Nursery
Established 1931
1992 – Present
39 Locations
80,000 Plots
350,000 Obs.
792 SRPN + 428 NRPN Entries
44,924 SNPs
3,966 SNPs > 80%
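The drop from 44,924 to 3,966 SNPs reads like a filter on data completeness. Assuming "> 80%" means the SNP was called in more than 80% of entries (the slide does not say so explicitly), such a filter could be sketched as below. The genotype matrix is a toy example with missing calls stored as NaN.

```python
import numpy as np


def filter_by_call_rate(G, min_call_rate=0.8):
    """Keep marker columns whose fraction of non-missing calls exceeds the threshold."""
    call_rate = 1.0 - np.isnan(G).mean(axis=0)
    keep = call_rate > min_call_rate
    return G[:, keep], call_rate


# Toy matrix: rows are entries, columns are SNPs, NaN marks a missing call.
G = np.array([
    [0, 1,      np.nan, 2],
    [0, np.nan, np.nan, 2],
    [1, 1,      np.nan, 0],
    [2, 1,      0,      0],
], dtype=float)

filtered, call_rate = filter_by_call_rate(G)
```

In this toy case only the first and last SNPs are kept; the second is called in 75% of entries and the third in 25%.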
Trevor Rife, KSU
Genomic Selection Accuracy: SRPN
Trevor Rife, KSU
RPN Locations: Evaluating Environments
Trevor Rife, KSU
Feed the Future Innovation Lab for
Applied Wheat Genomics
www.wheatgenetics.org/research/innovation-lab
Application of GBS:
Characterizing genetic diversity
Tanksley, S.D., S.R. McCouch. 1997. Seed banks and molecular maps: Unlocking genetic potential
from the wild. Science 277:1063-1066. DOI: 10.1126/science.277.5329.1063
Hybridization Events Forming Modern Wheat
Triticum urartu (2n=2x=14, AA) × ~ Aegilops speltoides (2n=2x=14, SS) → Triticum turgidum (2n=4x=28, AABB)
Triticum turgidum (2n=4x=28, AABB) × Aegilops tauschii (2n=2x=14, DD) → Triticum aestivum (2n=6x=42, AABBDD)
Eric Olson, Michigan State University
Wheat Genetic Resource Center:
Aegilops tauschii collection
• 531 unique accessions
• Physiological classifications
• Genotyped with GBS
[Figure: WGRC Ae. tauschii Collection, PCA1 (56 %) vs. PCA2 (5 %); groups: tauschii, strangulata, typica, meyeri]
Genetic groups contrasting to morphological characteristics
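Principal component plots of this kind can be produced from a genotype matrix with a singular value decomposition. The sketch below runs on simulated genotypes for two hypothetical groups of accessions. The group sizes, allele frequencies, and 0/1/2 coding are assumptions, and real GBS data would first need missing calls handled.

```python
import numpy as np


def genotype_pca(G, n_components=4):
    """PCA of a complete accessions x markers matrix via SVD.

    Returns the principal component scores and the fraction of
    variance explained by each retained component.
    """
    X = G - G.mean(axis=0)  # center each marker
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    var_explained = s**2 / np.sum(s**2)
    return scores, var_explained[:n_components]


rng = np.random.default_rng(2)
n_a, n_b, n_markers = 30, 30, 300

# Two simulated groups that differ in allele frequency at the first 100 markers.
freq_a = np.full(n_markers, 0.5)
freq_b = np.full(n_markers, 0.5)
freq_a[:100], freq_b[:100] = 0.1, 0.9
G = np.vstack([
    rng.binomial(2, freq_a, size=(n_a, n_markers)),
    rng.binomial(2, freq_b, size=(n_b, n_markers)),
]).astype(float)

scores, var_explained = genotype_pca(G)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the PCA1 vs. PCA2 view, and `var_explained` gives the per-axis percentages.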
[Figure: WGRC Ae. tauschii Collection, PCA3 (2 %) vs. PCA4 (2 %); accessions labelled by country (Azerbaijan, Afghanistan, Iran, Tajikistan, Turkey, Pakistan) and by group (tauschii, strangulata, typica)]
Genetic separation of geographically separated groups
Genetic Diversity: Wheat vs Ae. tauschii
[Figure: WGRC Ae. tauschii Collection, PCA1 (63 %) vs. PCA2 (10 %); Ae. tauschii and wheat]
[Figure: WGRC Ae. tauschii Collection, PCA3 (2 %) vs. PCA4 (2 %); Ae. tauschii and wheat]
Limited diversity in elite breeding pool
Application of GBS:
Genetic Mapping
Mapping Resistance in Synthetic Populations
LOD 5% TRTTF = 4.2
LOD 5% QTHJC = 3.7
[Figure: LOD score by chromosome (1A to 7D). Stem rust resistance to race TRTTF and QTHJC in the SynOpDH population]
Sandra Dunckel, KSU; Eric Olson, MSU; Matthew Rouse, USDA-ARS CDL
Sorrells, M. E., J. P. Gustafson, D. Somers, S. Chao, D. Benscher, G. Guedira-Brown, E. Huttner, A. Kilian, P. E. McGuire, K. Ross, J. Tanaka, P. Wenzl,
K. Williams and C. O. Qualset (2011). "Reconstruction of the Synthetic W7984 × Opata M85 wheat reference population." Genome 54(11): 875-882.
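For readers unfamiliar with LOD scores, here is a minimal sketch of how one can be computed at a single marker by regression. The slides do not state which mapping method produced the scan, so this is an illustration only: the doubled-haploid genotypes (coded 0/1), effect size, and population size are simulated.

```python
import numpy as np


def marker_lod(genotype, phenotype):
    """LOD for a single-marker regression: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotype)
    rss0 = np.sum((phenotype - phenotype.mean()) ** 2)
    X = np.column_stack([np.ones(n), genotype])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    rss1 = np.sum((phenotype - X @ beta) ** 2)
    return (n / 2.0) * np.log10(rss0 / rss1)


rng = np.random.default_rng(3)
n = 100
linked = rng.integers(0, 2, n).astype(float)    # marker at the simulated resistance locus
unlinked = rng.integers(0, 2, n).astype(float)  # independent marker
phenotype = 2.0 * linked + rng.normal(0.0, 1.0, n)

lod_linked = marker_lod(linked, phenotype)
lod_unlinked = marker_lod(unlinked, phenotype)
```

A marker is declared significant when its LOD exceeds the threshold for that trait, such as the 4.2 and 3.7 values quoted above.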
High-density Genetic Map: Thinopyrum intermedium
Traci Viinanen, University of Chicago; Lee DeHaan, The Land Institute
“F2” population
[Figure labels: T7DL7J#1S; pAs1: green; GAA: red; Th. intermedium; GISH; Sr44]
High-density genetic maps for any species
Application of GBS:
Marker Assisted Selection
“Spiked GBS”: A unified open platform for single marker genotyping and whole-genome profiling
Utilize 1% of sequencing lane for targeted amplicon sequencing
• Cost effective
• High-throughput
• Flexible: Single set of barcodes combined with locus specific primers
Spiked GBS workflow
Shared first steps
- QC & quantify
- Normalize DNA
Whole genome profile (GBS), $10-20 per sample¹
- Digest
- Ligate adapters
- Pool samples
- PCR amplify
- QC & quantify
Single locus genotyping (target amplicons), ~$0.03 per genotype²
- PCR amplify targets with M13 barcode primers
- Pool
- QC and quantify
- “Spike” amplicon library at 1%
Next-gen sequencing
- Raw sequencing data: ~200M reads
- GBS bioinformatics pipeline: ~198M reads; ~50,000 markers on 96 individuals; 0.5X coverage
- Targeted amplicon bioinformatics pipeline: ~2M reads; ~10 markers on 384 individuals; 500X coverage
¹ The estimated cost per sample is based on the number of samples that are multiplexed into a single sequencing run and the cost of the sequencing. Per sample cost of $10 corresponds to genotyping 190 individuals in a multiplex sequencing run.
² Estimated cost per data point for genotyping 10 markers on 384 individuals.
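The read budget for the spiked lane follows from simple arithmetic on the figures quoted above. The sketch below reproduces the roughly 500X depth for the amplicon fraction. It assumes reads are spread evenly across markers and individuals, which real libraries only approximate.

```python
total_reads = 200_000_000   # ~200M reads from the lane
spike_percent = 1           # amplicon library spiked in at 1%

amplicon_reads = total_reads * spike_percent // 100   # reads for targeted amplicons
gbs_reads = total_reads - amplicon_reads              # reads left for the GBS library

n_markers, n_individuals = 10, 384
reads_per_genotype = amplicon_reads / (n_markers * n_individuals)

gbs_individuals = 96
gbs_reads_per_individual = gbs_reads / gbs_individuals

print(f"amplicon reads: {amplicon_reads:,}")
print(f"reads per amplicon genotype: {reads_per_genotype:.0f}")
print(f"GBS reads per individual: {gbs_reads_per_individual:,.0f}")
```

About 2M reads over 3,840 marker-by-individual combinations gives roughly 520 reads per genotype, consistent with the ~500X figure.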
“Spiked GBS”: SNP genotyping
• 96 winter wheat accessions
• GBS library
• Amplify 4 SNP loci and add at 1%
• “Converted” KASPar Markers
(removed selective bp, add tail for barcode)
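Genotypes at a spiked amplicon locus can be called from the read counts of the two alleles. The sketch below shows one simple way to do this. The thresholds `min_reads` and `hom_ratio` are hypothetical and would need tuning per marker; the slides do not specify how calls were made.

```python
def call_genotype(count_a, count_b, min_reads=10, hom_ratio=0.9):
    """Call AA, BB, AB, or NA from read counts of allele A and allele B.

    min_reads and hom_ratio are illustrative thresholds, not values
    taken from the slides.
    """
    total = count_a + count_b
    if total < min_reads:
        return "NA"  # too few reads to call
    frac_a = count_a / total
    if frac_a >= hom_ratio:
        return "AA"
    if frac_a <= 1 - hom_ratio:
        return "BB"
    return "AB"


# Hypothetical (allele A, allele B) read counts for four samples.
counts = [(480, 5), (3, 510), (250, 240), (2, 1)]
calls = [call_genotype(a, b) for a, b in counts]
```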
[Figure: BS00023148, allele cnt A vs. allele cnt B]
[Figure: Lr34_exon12kasp, allele cnt A (log10) vs. allele cnt B (log10)]
[Figure: Lr34_exon22kasp, allele cnt A (log10) vs. allele cnt B (log10)]
Application of GBS:
Variety Identification and Typing
Variety Confirmation and Identification
Breeder Seed Foundation Seed Production
Confirm Variety Identify Variety
Mixed up seed? A tale of two samples…
Reference     Extension Sample 1    Extension Sample 2
              (percent identity)    (percent identity)
Sample1       100.0%                74.0%
Sample2       74.0%                 100.0%
Doans         74.7%                 74.5%
Dodge         76.6%                 72.6%
Duke          73.7%                 74.8%
Dumas         74.1%                 75.0%
Duster        76.0%                 78.7%
E2041         79.3%                 76.1%
Eagle         79.0%                 76.8%
Endurance     74.4%                 99.8%
Enhancer      76.9%                 77.3%
Everest       75.2%                 76.4%
Expedition    75.9%                 77.7%
Fannin        75.5%                 76.2%
Fuller        98.7%                 74.7%
G1878         75.2%                 74.6%
Gage          77.4%                 76.5%
Gallagher     75.6%                 81.0%
Garrison      78.3%                 77.2%
2 sub-samples from each lot
extracted DNA
genotyped (along with a larger panel)
GBS
47,076 DNA markers
< 5% heterozygous markers
pure line varieties
74% identical markers
different varieties
Compared to reference panel and
varieties identified as Fuller (#1) and
Endurance (#2)
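The comparison against a reference panel comes down to the share of markers with identical calls. The sketch below shows one way to compute it, skipping markers with a missing call in either sample. The genotype vectors and reference names are hypothetical toy data, not the 47,076-marker panel.

```python
import numpy as np


def percent_identity(a, b):
    """Percent of markers with identical calls, ignoring markers missing in either sample."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return float("nan")
    return 100.0 * float(np.mean(a[both] == b[both]))


def best_match(sample, references):
    """Return the reference with the highest identity, plus all scores."""
    scores = {name: percent_identity(sample, g) for name, g in references.items()}
    return max(scores, key=scores.get), scores


nan = np.nan
sample = np.array([0, 2, 2, 0, nan, 2, 0, 2])
references = {
    "ref_A": np.array([0, 2, 2, 0, 2, 2, 0, 2]),
    "ref_B": np.array([0, 0, 2, 2, 2, nan, 0, 2]),
}
best, scores = best_match(sample, references)
```

Here `ref_A` matches at every shared marker, while `ref_B` matches at 4 of 6. An unknown seed lot would be assigned to the reference with near-100% identity in the same way.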
Ravi Singh
David Bonnett
Yann Manes
Susanne Dreisigacker
Jose Crossa
Hector Sanchez
Shuangye Wu★
Josh Sharon
Ryan Steeves
Jared Crain
Sandra Dunckel
Trevor Rife★
Traci Viinanen★
Narinder Singh
Daljit Singh
Lisa Borello
Bikram Gill
Bernd Friebe
Sunish Sehgal
Jon Raupp
Allan Fritz
Sarah Battenfield
Chris Gaynor
Ed Buckler
Rob Elshire
Jeff Glaubitz
Jean-Luc Jannink
Mark Sorrells
Jeffrey Endelman★
Julie Dawson
Jessica Rutkoski★
Nils Stein
Martin Mascher★
Burkhard Steuernagel
Thomas Nussbaumer
Robbie Waugh
Hui Liu