Rapidly mapping genes related to soybean seed characters by NGS-based BSA
mapping strategy
Yong Guo
Institute of Crop Science,
Chinese Academy of Agricultural Sciences
June 19th, 2018
Outline
Background
Case study I: mapping genes controlling
soybean cotyledon color
Case Study II: mapping QTLs related to
soybean seed weight
Conclusions
Acknowledgements
Soybean is one of the world's most important legume crops. It provides major
sources of vegetable oil and plant protein.
1. Background
The duplicated genome restricts gene isolation in soybean
Schmutz et al. nature, 2010, 463(7278): 178-183.
Genome duplications: ~13 MYA and ~59 MYA
Stem growth habit: Dt1, Dt2
Flowering and maturity: E1-E4
SCN resistance: Rhg1, Rhg4
Salt tolerance: GmSALT3, GmCHX1
Leaflet shape: Ln
Pod shattering: qPDH1, SHAT1-5
Seed hardness: GmHs1-1
Paleopolyploid genome
Methods for identifying genes involved in specific traits
(Takeda et al., Nature Reviews Genetics, 2008, 9(6): 444-457)
Classical approaches are low-throughput and time-consuming
Segregating population
development
Genome-wide investigation of
polymorphic molecular markers
Identification of the most relevant
candidate regions
Fine-mapping by increasing
marker density in target region
Development of physical maps
Candidate gene isolation and
validation
Jeong et al. Plant Cell 2012;24:4807-4818
Next-generation sequencing has dramatically reduced
sequencing costs
Methods for identifying genes involved in specific traits
Lindner et al., Genetics, 2012, 191(4): 1381-1386.
Methods for identifying genes involved in specific traits
Abe et al., Nature biotechnology, 2012, 30(2): 174-178
Mutant resources are limited in soybean
Rice: tens of thousands of mutants available
1536 mutants
Two soybean mutant libraries:
Fast Neutron: Bolon et al., 2011
EMS: Tsuda et al., 2015
High genetic diversity among different soybean accessions (G.max and G.soja)
Dataset | SNP | SNP missed in Re-seq | Small InDel | Large InDel | CNV-gain | CNV-loss | G.max-specific | G.soja-specific
Resequencing, 1 G.soja + 1 G.max | 250M | ? | 19.6M | ? | ? | ? | 712 | ?
Resequencing, 25 G.soja + 30 G.max | 510M | ? | 70M | ? | ? | ? | ? | ?
De novo seq, 7 G.soja | 510M | 480M | 85M | 15M | 726 | 1179 | 16 | 338
Kim et al. PNAS, 2010; Li et al. BMC Genomics, 2013; Li et al. NB 2014
The objective of this study
Development of NGS-based BSA mapping approach in
soybean using segregating population derived from
germplasm
Validation of the reliability and efficiency of BSA-seq in
fine mapping of genes/QTLs in species with particularly
sizeable or complex genomes
Mapping of genes regulating soybean cotyledon color
and seed weight using developed NGS-based mapping
method
2. Case study I: Mapping genes controlling cotyledon color
Cotyledon color is an important morphological trait for breeding and
germplasm classification
Most cultivated soybeans have yellow cotyledons; only a
few have green ones
Qualitative trait: three inheritance patterns--maternal inheritance,
double and single gene inheritance
Yellow Green
A segregating population derived from two parental lines with distinct
cotyledon colors was developed
Development of segregating population
Population  Total seeds  Yellow cotyledon  Green cotyledon  Observed ratio  χ2 (15:1)  P-value
130028-1 314 295 19 15.5:1 0.0008 0.8841503
130028-3 341 319 22 14.5:1 0.0018 0.8777637
130028-4 247 234 13 18.0:1 0.2594 0.5217025
130029-1 270 251 19 13.2:1 0.1669 0.5931627
130029-2 252 232 20 11.6:1 0.9524 0.2687178
130030-1 248 234 14 16.7:1 0.0688 0.6939535
130030-2 374 347 27 12.9:1 0.4456 0.4387141
130030-3 247 231 16 14.4:1 0.0003 0.8824539
130030-4 258 244 14 17.4:1 0.1747 0.5846935
130034-1 337 316 21 15.0:1 0.0097 0.9887781
130034-2 302 277 25 11.1:1 1.7881 0.1453782
The segregation ratios of cotyledon color in F1 all fit 15:1
Investigation of seed cotyledon color in 11 different plants
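The 15:1 goodness-of-fit test behind the table can be sketched as follows. This is a minimal illustration, not the authors' script. Recomputing a few rows suggests that the χ2 column matches a Yates-corrected statistic while the P-value column matches the uncorrected statistic; that is an observation from the numbers, not something the slides state. The function names are hypothetical.

```python
import math

def chi2_15_to_1(yellow, green, yates=False):
    """Chi-square goodness-of-fit statistic against a 15:1 expectation."""
    total = yellow + green
    expected = (total * 15 / 16, total / 16)
    stat = 0.0
    for obs, exp in zip((yellow, green), expected):
        diff = abs(obs - exp)
        if yates:
            diff -= 0.5  # Yates continuity correction: shrink |O - E| by 0.5
        stat += diff ** 2 / exp
    return stat

def p_value_1df(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Plant 130034-2 from the table: 277 yellow seeds, 25 green seeds
corrected = chi2_15_to_1(277, 25, yates=True)
uncorrected = chi2_15_to_1(277, 25)
print(round(corrected, 4), round(p_value_1df(uncorrected), 4))
```

A P-value above 0.05 means the observed counts do not differ significantly from 15:1, which is the case for all 11 plants in the table.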
Soybean cotyledon color is controlled by two genes
Yellow Green
all yellow 3:1 15:1 all green
Cotyledon color in this cross was controlled by two genes and the
green cotyledon trait carried from Jiyu102 was recessive.
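Why two recessive genes give 15:1 can be checked by enumerating the 16 equally likely offspring of a double heterozygote. The sketch below uses hypothetical allele labels (A/a and B/b, uppercase dominant) and assumes independent assortment. It also classifies each offspring genotype by how its own selfed progeny would segregate, which gives the four classes shown in the figure (all yellow, 3:1, 15:1, all green).

```python
from collections import Counter
from itertools import product

# Hypothetical labels: uppercase = dominant (yellow), lowercase = recessive (green)
GAMETES = ["".join(g) for g in product("Aa", "Bb")]  # AB, Ab, aB, ab

def selfed_offspring():
    """Yield the 16 equally likely two-locus genotypes from selfing AaBb."""
    for g1, g2 in product(GAMETES, repeat=2):
        locus_a = "".join(sorted(g1[0] + g2[0]))  # "AA", "Aa" or "aa"
        locus_b = "".join(sorted(g1[1] + g2[1]))  # "BB", "Bb" or "bb"
        yield locus_a, locus_b

def seed_color(locus_a, locus_b):
    """Green only when homozygous recessive at both loci."""
    return "green" if (locus_a, locus_b) == ("aa", "bb") else "yellow"

def family_class(locus_a, locus_b):
    """Segregation expected among the selfed progeny of this genotype."""
    if locus_a == "AA" or locus_b == "BB":
        return "all yellow"
    if (locus_a, locus_b) == ("aa", "bb"):
        return "all green"
    if (locus_a, locus_b) == ("Aa", "Bb"):
        return "15:1"
    return "3:1"  # heterozygous at one locus, recessive at the other

colors = Counter(seed_color(a, b) for a, b in selfed_offspring())
families = Counter(family_class(a, b) for a, b in selfed_offspring())
print(dict(colors))
print(dict(families))
```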
Construction of BSA pools for next-generation sequencing
Progeny classes: all yellow, 3:1, 15:1, all green
Bulks: YC-bulk (30 lines) and GC-bulk (30 lines)
Parents: ZH30 and JY102
Four DNA samples were used to construct libraries and subjected to
whole-genome sequencing on the Illumina HiSeq 2500 platform
Sample ID YC-bulk GC-bulk ZH30 JY102
Clean Reads 486,749,106 467,745,622 108,602,086 84,534,592
Clean Base 61,327,351,325 58,897,998,134 13,683,105,387 10,650,695,091
Q20(%) 91.6 91.4 91.5 92.6
Q30(%) 85.1 85.0 85.1 85.7
Mapped ratio(%) 94.7 93.8 94.8 94.6
Average depth 59X 53X 12X 9X
Coverage_ratio_1X(%) 95.2 93.5 93.2 89.7
Coverage_ratio_5X(%) 90.9 85.0 76.8 69.2
Coverage_ratio_10X(%) 87.9 80.0 57.4 44.1
Summary of Illumina sequencing data
A total of 1,084,921 SNPs and 157,839 small InDels were identified between
the parental lines ZH30 and JY102
Calculation of SNP index and Δ (SNP-index)
SNP filtering: quality score >=100
read depth >=10
SNP index = Count of alternate base (JY102)/Count of reads aligned
Δ (SNP-index) = SNP index in GC-bulk - SNP index in YC-bulk
The P-value of Fisher's exact test between the GC- and YC-bulks was also
calculated for each SNP locus.
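The definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the read counts are made up, the function names are hypothetical, and the depth filter is assumed to apply to each bulk separately. Fisher's exact test is written out with the standard library so the block is self-contained.

```python
from math import comb

def passes_filter(quality, depth_gc, depth_yc, min_quality=100, min_depth=10):
    """Filter from the slide; applying the depth cut to both bulks is an assumption."""
    return quality >= min_quality and depth_gc >= min_depth and depth_yc >= min_depth

def snp_index(alt_reads, total_reads):
    """Fraction of aligned reads carrying the alternate (JY102) base."""
    return alt_reads / total_reads

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x alternate reads in the first row
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Made-up counts for one SNP: (alternate reads, total reads) in each bulk
gc_alt, gc_total = 48, 50
yc_alt, yc_total = 20, 55

if passes_filter(quality=150, depth_gc=gc_total, depth_yc=yc_total):
    delta = snp_index(gc_alt, gc_total) - snp_index(yc_alt, yc_total)
    p = fisher_exact_two_sided(gc_alt, gc_total - gc_alt, yc_alt, yc_total - yc_alt)
    print(round(delta, 3), p)
```

A Δ(SNP-index) near 0 is expected at unlinked loci; values approaching 1 mark SNPs where the GC-bulk is nearly fixed for the JY102 allele while the YC-bulk is not.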
Sliding window analysis of SNP index and Δ (SNP-index)
Average SNP-index, Δ (SNP-index) and P value were calculated across
a 2-Mb genomic interval using a 10-kb sliding window
Two statistically significant candidate regions were identified
qCC1: 54.15-56.83Mb on Chromosome 1
qCC2: 0-2.68Mb on Chromosome 11
Criteria: average SNP-index of GC-bulk > 0.9 and average P-value < 0.05
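The windowed averaging and the two thresholds can be sketched as below. The slide's wording is read here as a 2-Mb window moved in 10-kb steps; that reading is an assumption, as are the function names and the toy data. Positions must be sorted, and windows are half-open intervals [start, end).

```python
from bisect import bisect_left

def prefix_sums(values):
    """Running totals, so any window sum is one subtraction."""
    sums = [0.0]
    for v in values:
        sums.append(sums[-1] + v)
    return sums

def scan(positions, gc_index, p_values, chrom_length,
         window=2_000_000, step=10_000):
    """Return (start, end, mean SNP-index, mean P-value) per non-empty window."""
    idx_sum, p_sum = prefix_sums(gc_index), prefix_sums(p_values)
    windows = []
    for start in range(0, chrom_length, step):
        end = start + window
        i, j = bisect_left(positions, start), bisect_left(positions, end)
        n = j - i
        if n == 0:
            continue  # no SNPs in this window
        windows.append((start, end,
                        (idx_sum[j] - idx_sum[i]) / n,
                        (p_sum[j] - p_sum[i]) / n))
    return windows

def candidates(windows, min_index=0.9, max_p=0.05):
    """Keep windows meeting both thresholds from the slide."""
    return [w for w in windows if w[2] > min_index and w[3] < max_p]

# Toy chromosome with scaled-down window and step sizes
positions = [5_000, 15_000, 25_000, 35_000]
gc_index = [0.95, 1.0, 0.4, 0.5]
p_values = [0.01, 0.02, 0.6, 0.7]
windows = scan(positions, gc_index, p_values, chrom_length=40_000,
               window=20_000, step=10_000)
hits = candidates(windows)
print(hits)
```

Merging adjacent candidate windows into one interval would then give regions such as those reported for qCC1 and qCC2.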
SNP analysis of candidate regions
qCC1 locus: 2.68 Mb interval region
2,843 SNPs between parental lines
2,284 SNPs had an index of 1.0 in the GC-bulk
251 SNPs result in changes of the coding sequences
qCC2 locus: 2.68 Mb interval region
1,237 SNPs between parental lines
870 SNPs had an index of 1.0 in the GC-bulk
102 SNPs result in changes of the coding sequences
Fine mapping of qCC1 to a 30.7-kb region
Four annotated genes
Candidate gene analysis in qCC1 region
39 SNPs: 21 in genes, including two synonymous and one non-synonymous variation.
15 small InDels: one in an exon of Glyma.01g214600 and another in an intron of
Glyma.01g214700
Fine mapping of qCC2 to a 67.7-kb region
Nine annotated genes
15 SNPs: none alters the amino acid sequence of the encoded proteins.
One small InDel: the alteration occurs in the dominant parental line Zhonghuang30
Candidate gene analysis in qCC2 region
qCC1/2 are the same as the previously identified D1/D2 genes
Fang et al., Plant J, 2014; Nakano et al., PCP, 2014
3. Case study II: Mapping QTLs related to seed weight
100-seed weight is a key component of soybean yield
[Figure: components of seed yield per plant: pod number, seed number per pod, seed number, seed weight, seed size]
Wild soybean Landrace Modern cultivar
Genetic inheritance of 100-SW in a RIL population
XL ZH28
Year  Zhonghuang28 (parent)  Xiaoling (parent)  RIL mean  RIL S.D.  RIL range  RIL CV%
2013 20.2 9.8 15.8 2.4 10.3-23.8 15.4
2014 20.8±1.6 11.4±0.7 15.9 2.3 8.9-24.1 14.6
LS-bulk HS-bulk
Summary of Illumina sequencing data
A total of 1,216,848 clean SNPs were identified between the parental
lines ZH28 and XL
Samples HS-bulk LS-bulk Zhonghuang28 Xiaoling
Clean_Reads 314,489,982 284,619,824 108,190,706 81,497,870
Clean_Base 39,622,603,336 35,856,859,929 13,628,408,688 10,267,871,894
Q20(%) 91.76 91.84 95.04 91.76
Q30(%) 85.45 85.36 89.00 85.42
Mapped ratio(%) 97.83 98.31 96.17 95.81
Ave_depth 35X 31X 12X 9X
Cov_ratio_1X(%) 98.64 86.99 97.82 94.43
Cov_ratio_5X(%) 93.48 77.53 90.7 76.46
Cov_ratio_10X(%) 87.96 70.56 66.48 43.05
QTL-seq identified major QTLs on chromosome 20
[Figure panels: HSB SNP-index; LSB SNP-index; LSB-HSB Δ(SNP-index)]
The major QTL is located on the tail of Chr. 20
[Figure: Δ(SNP-index) along Chr. 20]
Physical position: 34.22-36.75 Mb and 40.95-43.35 Mb
SNP marker development for genotyping of RILs
50 SNPs were selected from the candidate region, and the RIL population was
genotyped using the Sequenom MassARRAY iPLEX platform
The QTL was mapped to a 187-kb region
19 genes were annotated in the 187-kb candidate region
11 genes have non-synonymous variations between parental lines
Year 2016 2017
Left Marker SNP22 SNP22
Right Marker SNP25 SNP25
LOD 7.4923 6.2975
PVE(%) 16.8764 15.946
Add 0.9881 0.7472
4. Conclusions
A genome-wide NGS-based BSA mapping approach was
developed in soybean using segregating populations
derived from germplasm
Two loci controlling cotyledon color were identified
and fine mapped to 30.7-kb and 67.7-kb intervals
Two stay-green genes were located in the fine-mapped
regions, and the sequence variant of one gene was
directly identified by whole-genome sequencing
A major QTL related to seed weight was identified by
the NGS-based mapping method and fine mapped to a
187-kb region containing 19 annotated genes
5. Acknowledgements
Prof. Lijuan Qiu
Jian Song
Fulai Zhou