Generation of a 345K Sugar cane SNP chip
Classification: Public
Karen S Aitken, Andrew Farmer, Paul Berkman, Cedric Muller, Mike Magwire, Bob Dietrich, Xianming Wei, Emily Deomano and Raja Kota
Strategy: Integrating -Omics with Breeding
[Figure: Varieties, Reverse Genetics, Forward Genetics, Breeding Values]
• Using marker-trait associations to assemble an ideal genotype (ideotype)
Forward Genetics
• QTL mapping (bi-parental and multi-parental populations)
• Genome-wide association mapping
Reverse Genetics
• Candidate gene association mapping
• Using multiple-trait selection indices to develop lines with superior whole-genome breeding values using molecular markers (genomic/genome-wide selection, GWS)
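A multiple-trait selection index of the kind described above can be sketched as a weighted sum of genomic estimated breeding values (GEBVs), one per trait, with clones ranked by the index. This is a minimal illustration under stated assumptions, not the program's actual index: the clone names, GEBVs and economic weights are hypothetical, and the GEBVs are assumed to have already been predicted from marker data.

```python
# Hypothetical economic weights per trait (not taken from the source).
WEIGHTS: dict[str, float] = {"TCH": 1.0, "CCS": 2.5, "smut_resistance": 0.5}


def selection_index(gebv: dict[str, float], weights: dict[str, float]) -> float:
    """Linear index I = sum over traits of weight * GEBV."""
    return sum(w * gebv[trait] for trait, w in weights.items())


def rank_clones(gebvs: dict[str, dict[str, float]],
                weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (clone, index) pairs sorted from highest to lowest index."""
    scored = [(clone, selection_index(g, weights)) for clone, g in gebvs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical GEBVs (deviations from the population mean) for three clones.
example = {
    "clone_A": {"TCH": 4.0, "CCS": 0.2, "smut_resistance": 1.0},
    "clone_B": {"TCH": 1.0, "CCS": 1.0, "smut_resistance": 0.0},
    "clone_C": {"TCH": -2.0, "CCS": 0.5, "smut_resistance": 2.0},
}

for clone, index in rank_clones(example, WEIGHTS):
    print(f"{clone}: {index:.2f}")
```

Changing the weights changes which trait dominates the ranking, which is the main design decision in any such index.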
Modern Sugar Cane Genome Size
Total genome size: 7500-10000 Mb
Sorghum: 800 Mb
Rice: 400 Mb
Maize: 2500 Mb
Each individual in a cross yields a unique arrangement of chromosomes due to random pairing during meiosis
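For scale, the genome sizes listed above can be turned into fold differences. The short sketch below simply divides each end of the sugarcane range by each comparison genome; it uses only the figures on this slide.

```python
# Genome sizes in Mb, as listed on this slide.
SUGARCANE_MB = (7500, 10000)  # total sugarcane genome size range
OTHER_MB = {"Sorghum": 800, "Rice": 400, "Maize": 2500}


def fold_range(reference_mb: float) -> tuple[float, float]:
    """How many times larger sugarcane is, at each end of its size range."""
    low, high = SUGARCANE_MB
    return low / reference_mb, high / reference_mb


for species, size in OTHER_MB.items():
    low, high = fold_range(size)
    print(f"Sugarcane vs {species}: {low:.1f}x to {high:.1f}x")
```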
Developing a SNP chip for GWAS/GS
● Sequence 16 core lines representing the Australian and Brazilian breeding programs
● SNPs called using parameters (yet to be finalized) will be used to develop a SNP chip with Affy’s “Axiom” technology
- ~345K SNP screening array for 480 Australian clones, which will include mapped SNPs from Illumina (positive controls)
- Running a screening array ensures that the SNPs used on the smaller array will perform.
- The smaller chip will be in a 96-array format
- The maximum number of SNPs on the 96-array format is 50K
● Association mapping populations from Australia and Brazil will be genotyped using the smaller array
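The plan above (screen ~345K SNPs, then carry forward only those that performed, on a chip holding at most 50K) can be sketched as a filter-then-rank step. This is a hypothetical illustration: the `ScreenedSnp` fields, the numeric quality score and the rule of placing priority SNPs (such as mapped positive controls) first are assumptions, not the project's selection procedure.

```python
from dataclasses import dataclass

CHIP_CAPACITY = 50_000  # maximum SNPs on the 96-array format, per the slide


@dataclass
class ScreenedSnp:
    snp_id: str
    performed: bool  # hypothetical: passed QC on the screening array
    priority: bool   # hypothetical: e.g. a mapped positive-control SNP
    score: float     # hypothetical quality score used to rank the rest


def select_for_chip(snps: list[ScreenedSnp], capacity: int = CHIP_CAPACITY) -> list[str]:
    """Keep SNPs that performed, put priority SNPs first, then fill the
    remaining slots by descending score."""
    passed = [s for s in snps if s.performed]
    passed.sort(key=lambda s: (not s.priority, -s.score))
    return [s.snp_id for s in passed[:capacity]]
```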
Sequencing Results
Sample Reads (in bp)
Badila 171909294
Co475 178010130
CP74-2005 154340644
Nco310 217972160
POJ2878 119639502
Q117 159933544
Q208 141936024
QN58-829 218476606
QN66-2008 191146084
QN80-3425 176658030
Trojan 71507454
Q155 342257032
SP70-1143 104978170
RB72454 262617118
SP80-3280 68080482
SP83-5073 159297874
Two samples per lane, with an expected coverage of at least 50X for a given genomic region
Sequences were assembled using data from a previous project that sequenced two Australian lines, Q165 and IJ76
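Expected depth of coverage is commonly estimated with the Lander-Waterman relation C = N × L / G (number of reads × read length / target size). The sketch below applies it to three rows of the sequencing table. It assumes the table values are read counts and uses a hypothetical read length of 100 bp; the 104 Mb target is the reference size quoted on the coverage slide. If the values are already base counts, set the read length to 1.

```python
# Values copied from the sequencing-results table (subset).
READS = {"Badila": 171_909_294, "Q155": 342_257_032, "SP80-3280": 68_080_482}
REFERENCE_BP = 104_000_000  # 104 Mb reference quoted on the coverage slide
READ_LENGTH_BP = 100        # hypothetical; not stated in the source


def expected_coverage(n_reads: int, read_length_bp: int, reference_bp: int) -> float:
    """Lander-Waterman expected depth: C = N * L / G."""
    return n_reads * read_length_bp / reference_bp


for sample, n in READS.items():
    depth = expected_coverage(n, READ_LENGTH_BP, REFERENCE_BP)
    print(f"{sample}: ~{depth:.0f}X expected raw depth")
```

This is raw depth; unique coverage after alignment filtering will be lower.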
Average coverage across all samples
[Figure: bar chart of average unique coverage (axis 0 to 200) for each of the 16 sequenced lines]
Variation in number of sites with uniquely aligning read coverage across lines
[Figure: bar chart of the number of sites with unique coverage (axis 86 to 96 million) for each of the 16 sequenced lines; reference size = 104 Mb]
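The 104 Mb reference size converts site counts into breadth of coverage (the fraction of reference positions with uniquely aligning reads). The two values used below are only the chart's axis limits, not per-line measurements.

```python
REFERENCE_BP = 104_000_000  # reference size quoted on the slide


def breadth_of_coverage(sites_covered: int, reference_bp: int = REFERENCE_BP) -> float:
    """Fraction of reference positions with uniquely aligning read coverage."""
    return sites_covered / reference_bp


# Axis limits of the chart, not measurements for any particular line.
for sites in (86_000_000, 96_000_000):
    print(f"{sites:,} sites -> {breadth_of_coverage(sites):.1%} of the reference")
```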
Distribution of sugarcane contigs in sorghum genome
SNP calling criteria
The following parameters were selected for SNP calling:
1. Addressing dosage:
Class 1: Low dose (single/double) in at least 4 lines, 0 dose in at least 4 lines and high dose in at least 1 line
Class 2: Low dose (single/double) in at least 4 lines and 0 dose in at least 4 lines; the remaining lines can be medium dose (3-4 copies) but not high dose
Class 3: Low dose (single/double) in at least 2 lines and 0 dose in at least 2 lines; the remaining lines can be neither high dose (HD) nor medium dose (MD)
2. Compare the total number of low-dose SNPs selected from lines derived from Australia with those from Brazil. Data from this analysis will be used to remove any bias in the SNP selection process
3. If the above filters do not yield enough SNPs, reduce the minimum coverage from 50X, down to a limit of 20X
4. Once SNP calling is done, tally the SNPs that fit the above criteria by line (i.e. the total number of SNPs selected from line 1, line 2 and so on)
5. Ensure that preselected regions (the DArT mini array, ~400 markers, and successfully mapped Infinium SNPs, ~2400) are enriched, with 2-3 SNPs selected per marker sequence
6. Map the selected SNPs to the sorghum reference
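The three dosage classes in step 1 can be expressed as a 3-character class code, one flag per class (e.g. 011 = meets class 2 and class 3), the form used in the dosage results tables. This is a sketch under assumptions: each line's dose is an estimated allele copy number, low dose = 1-2 copies and medium dose = 3-4 copies as stated above, and treating 5 or more copies as high dose is an assumed cut-off because the criteria do not give one. Because class 1 requires a high-dose line and classes 2 and 3 forbid one, only the codes 100, 010, 011 and 001 (plus 000 for rejected SNPs) can occur.

```python
from collections import Counter


def dose_category(copies: int) -> str:
    """Bin an estimated allele copy number into a dose category.
    0 = zero dose, 1-2 = low, 3-4 = medium; >= 5 = high is an assumption."""
    if copies == 0:
        return "0D"
    if copies <= 2:
        return "LD"
    if copies <= 4:
        return "MD"
    return "HD"


def class_code(copies_per_line: list[int]) -> str:
    """Return flags for (class 1, class 2, class 3), e.g. '011'."""
    n = Counter(dose_category(c) for c in copies_per_line)
    class1 = n["LD"] >= 4 and n["0D"] >= 4 and n["HD"] >= 1
    class2 = n["LD"] >= 4 and n["0D"] >= 4 and n["HD"] == 0
    class3 = n["LD"] >= 2 and n["0D"] >= 2 and n["HD"] == 0 and n["MD"] == 0
    return "".join("1" if flag else "0" for flag in (class1, class2, class3))


# Hypothetical copy numbers for one SNP across the 16 sequenced lines.
print(class_code([1, 2, 1, 1, 0, 0, 0, 0, 3, 3, 1, 0, 2, 0, 1, 3]))  # prints 010
```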
Results from applying Dosage selection criteria
● Using 50x minimum coverage:
% count class code
3 50843 100 (class 1)
11 206346 010 (class 2)
37 682885 011 (class 2 and class 3)
49 892155 001 (class 3)
Total = 1832229
● Using 20x minimum coverage:
% count class code
5 131831 100 (class 1)
14 384630 010 (class 2)
38 1015206 011 (class 2 and class 3)
42 1121023 001 (class 3)
Total = 2652690
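The percentages and per-class totals can be recomputed from the counts in the two tables above. In the class code, positions correspond to class 1, 2 and 3, so a SNP coded 011 counts towards both class 2 and class 3. Only the published counts are used.

```python
# Counts per class code from the 50X and 20X results above.
COUNTS = {
    "50X": {"100": 50_843, "010": 206_346, "011": 682_885, "001": 892_155},
    "20X": {"100": 131_831, "010": 384_630, "011": 1_015_206, "001": 1_121_023},
}


def summarize(counts: dict[str, int]) -> tuple[int, dict[str, int], dict[int, int]]:
    """Return the total, whole-number percentage per code and SNPs per class."""
    total = sum(counts.values())
    shares = {code: round(100 * n / total) for code, n in counts.items()}
    per_class = {
        k + 1: sum(n for code, n in counts.items() if code[k] == "1")
        for k in range(3)
    }
    return total, shares, per_class


for cutoff, counts in COUNTS.items():
    total, shares, per_class = summarize(counts)
    print(cutoff, "total:", total, "percent:", shares, "per class:", per_class)
```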
Pairwise-mappable (LD/0D) variant counts
[Figure: pairwise-mappable LD/0D variant counts for each pair of the 16 sequenced lines, on a scale of 0 to 100,000 shown in bins of 10,000]
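One plausible reading of "pairwise-mappable (LD/0D)" is a variant that is low dose in one line and zero dose in the other, so that it would segregate in a bi-parental cross between the two. Under that assumption (not confirmed by the slide), the pairwise counts could be tallied as below; the line names and copy numbers are hypothetical.

```python
from itertools import combinations


def is_low_dose(copies: int) -> bool:
    return copies in (1, 2)


def pairwise_ld_0d(doses: dict[str, list[int]]) -> dict[tuple[str, str], int]:
    """For each unordered pair of lines, count variants where one line is
    low dose (1-2 copies) and the other is zero dose."""
    counts = {}
    for a, b in combinations(sorted(doses), 2):
        counts[(a, b)] = sum(
            1
            for x, y in zip(doses[a], doses[b])
            if (is_low_dose(x) and y == 0) or (x == 0 and is_low_dose(y))
        )
    return counts


# Hypothetical copy numbers at four variants for three lines.
example = {"line_1": [1, 0, 2, 3], "line_2": [0, 0, 0, 1], "line_3": [0, 1, 0, 0]}
print(pairwise_ld_0d(example))
```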
Distribution of sugarcane contigs with mappable LD/0D variants
Affymetrix Customized Workflow for data analysis
Generate genotypes following the best-practice workflow. Execute SNPolisher using the Polyploid setting and apply the Supplemental Variance filters (Z-score > 3) and HetvMAF = 1.9
Execute Ps_CallAdjust with the threshold set to 0.1. Perform reproducibility and MI accuracy analysis and remove probesets with > 1 error in either category
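The reproducibility part of this step can be sketched as counting replicate pairs with discordant calls and dropping any probeset with more than one error in either category. The genotype labels, the `NO_CALL` marker, skipping pairs that contain a no-call, and passing the MI error count in precomputed are all assumptions for illustration; this is not Affymetrix's implementation.

```python
NO_CALL = "NoCall"  # hypothetical label for a missing genotype call


def reproducibility_errors(replicate_pairs: list[tuple[str, str]]) -> int:
    """Count replicate sample pairs whose calls disagree; pairs containing
    a no-call are skipped (an assumption)."""
    return sum(
        1
        for a, b in replicate_pairs
        if NO_CALL not in (a, b) and a != b
    )


def keep_probeset(replicate_pairs: list[tuple[str, str]], mi_errors: int,
                  max_errors: int = 1) -> bool:
    """Keep a probeset only if it has at most max_errors errors in each category."""
    return reproducibility_errors(replicate_pairs) <= max_errors and mi_errors <= max_errors
```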
Conversion Type Count Percentage
PolyHighResolution 11695 2.76
AAvarianceX 64 0.02
AAvarianceY 162 0.04
ABvarianceX 152 0.04
ABvarianceY 134 0.03
BBvarianceX 109 0.03
BBvarianceY 112 0.03
MonoHighResolution 169446 39.96
NoMinorHomozygote 21366 5.04
OffTargetVariant 251 0.06
CallRateBelowThreshold 33652 7.94
Other 186263 43.92
Unexpected Het 480 0.11
Hom-Hom Resolution 162 0.04
Total number of probesets 424048 100
Number of probesets identified by Variance X.Z > 3 used for fitTetra analysis: 541
After applying advanced filters
Conversion Type Count Percentage Delta from initial
PolyHighResolution 10474 2.47 1221
NoMinorHom 12106 2.85 9260
CallRateBelowThreshold 25222 5.95 8430
Combined (PHR+NMH+CRB) 47802 11.27 ---
Total number of probesets 424048 100 ---
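The Delta and Combined columns follow from the per-category counts: delta is the initial count minus the count after the advanced filters, and the combined row is defined as PHR + NMH + CRB. The sketch below recomputes them from the counts in the two conversion tables.

```python
TOTAL_PROBESETS = 424_048
# Counts before (initial conversion table; NoMinorHom is listed there as
# NoMinorHomozygote) and after the advanced filters.
INITIAL = {"PolyHighResolution": 11_695, "NoMinorHom": 21_366, "CallRateBelowThreshold": 33_652}
AFTER = {"PolyHighResolution": 10_474, "NoMinorHom": 12_106, "CallRateBelowThreshold": 25_222}


def delta(category: str) -> int:
    """Probesets that left a category after the advanced filters."""
    return INITIAL[category] - AFTER[category]


def percent(count: int) -> float:
    return 100 * count / TOTAL_PROBESETS


for name, count in AFTER.items():
    print(f"{name}: {count} ({percent(count):.2f}%), delta {delta(name)}")

combined = sum(AFTER.values())  # PHR + NMH + CRB
print(f"Combined (PHR+NMH+CRB): {combined} ({percent(combined):.2f}%)")
```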
Identify any probesets with high Variance.X.Z scores (> 3) and PHR probesets with > 1 MI error for fitTetra analysis
Number of PHR probesets identified with > 1 MI error for fitTetra analysis: 902
Execute Ps_Metrics using the adjusted call table and remove CallRateBelowThreshold SNPs with < 2 clusters, < 2 observations in the least populated cluster and CallRate < 80%, and NoMinorHom SNPs with < 10 observations of the minor allele
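The Ps_Metrics removal rules above can be sketched as a predicate over per-probeset metrics. The field names are hypothetical (the actual Ps_Metrics output columns may be named differently), and because the slide lists the CallRateBelowThreshold conditions without saying whether all must hold, this sketch removes a probeset when any one of them holds; switch `any` to `all` for the stricter reading.

```python
from dataclasses import dataclass


@dataclass
class ProbesetMetrics:
    # Hypothetical field names; real Ps_Metrics columns may differ.
    conversion_type: str
    n_clusters: int
    min_cluster_obs: int   # observations in the least populated cluster
    call_rate: float       # as a fraction, 0-1
    minor_allele_obs: int  # observations of the minor allele


def should_remove(m: ProbesetMetrics) -> bool:
    """Apply the removal rules described above to one probeset."""
    if m.conversion_type == "CallRateBelowThreshold":
        return any((m.n_clusters < 2, m.min_cluster_obs < 2, m.call_rate < 0.80))
    if m.conversion_type == "NoMinorHom":
        return m.minor_allele_obs < 10
    return False  # other conversion types are untouched by this step
```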
Classification of SNP calls using fitTetra
Sugarcane homology group Sorghum chromosome Number of SNP markers
HG1 Sb4 2092
HG2 Sb6 and Sb5 2842
HG3 Sb3 2696
HG4 Sb1 3216
HG5 Sb7 1280
HG6 Sb9 1543
HG7 Sb10 1640
HG8 Sb8 and Sb2 3401
Scaffolds 146
Total 18856
BLAST results of ~49K SNP markers aligned to the sorghum genome (>e-51)
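Tallying BLAST hits into the homology-group table can be sketched as: keep each marker's best hit, drop hits weaker than the E-value cut-off, and map the sorghum chromosome to a sugarcane homology group using the correspondence in the table above. Reading "(>e-51)" as "E-value of 1e-51 or better" is an assumption, as are the hit tuple format and the rule of counting non-chromosome subjects as Scaffolds.

```python
from collections import Counter

# Sorghum chromosome -> sugarcane homology group, from the table above.
SORGHUM_TO_HG = {
    "Sb4": "HG1", "Sb6": "HG2", "Sb5": "HG2", "Sb3": "HG3", "Sb1": "HG4",
    "Sb7": "HG5", "Sb9": "HG6", "Sb10": "HG7", "Sb8": "HG8", "Sb2": "HG8",
}


def count_by_homology_group(hits, max_evalue: float = 1e-51) -> Counter:
    """hits: iterable of (marker_id, subject, evalue) tuples (hypothetical format).
    Keeps each marker's best hit at or below max_evalue and tallies it by
    homology group; subjects other than Sb1-Sb10 count as 'Scaffolds'."""
    best = {}
    for marker, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if marker not in best or evalue < best[marker][1]:
            best[marker] = (subject, evalue)
    return Counter(SORGHUM_TO_HG.get(subject, "Scaffolds") for subject, _ in best.values())
```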
Australian Core program – Association mapping Panel
● Population 1 (“association mapping population”)
- 480 clones from the core program
- Cane yield and sugar content measured at 3 sites x 2 years
- Disease resistance: smut, plus other diseases on a subset
● Subset of lines from the bi-parental mapping population
Number of SNP and DArT markers associated with TCH (tonnes cane per hectare) and CCS (commercial cane sugar) at different p-values using mixed model analysis
Significance level | SNPs expected by random chance | DArTs expected by random chance | TCH DArT | TCH SNP | CCS DArT | CCS SNP
0.05 | 2295.75 | 768 | 1228 | 15177 | 1380 | 10033
0.01 | 459.15 | 154 | 352 | 5373 | 377 | 2775
0.001 | 45.9 | 15 | 64 | 1212 | 55 | 495
0.0001 | 4.59 | 2 | 8 | 284 | 8 | 93
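The "expected by random chance" columns are consistent with the usual expectation under the null hypothesis, expected false positives = α × m, where m is the number of markers tested. Inverting that relation on the 0.05 row gives roughly 45,915 SNPs and 15,360 DArT markers (the DArT figure is approximate because 768 is itself rounded); the sketch below derives those figures and regenerates the other rows.

```python
ALPHAS = (0.05, 0.01, 0.001, 0.0001)


def markers_tested(expected_at_alpha: float, alpha: float) -> float:
    """Invert expected = alpha * m to recover m, the number of markers tested."""
    return expected_at_alpha / alpha


n_snps = markers_tested(2295.75, 0.05)  # about 45,915 SNPs
n_darts = markers_tested(768, 0.05)     # about 15,360 DArT markers

for alpha in ALPHAS:
    print(f"p < {alpha}: {alpha * n_snps:.2f} SNPs, {round(alpha * n_darts)} DArTs by chance")
```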
Number of SNPs associated with disease traits at different p-values using mixed model analysis
Significance level | Smut | Pachymetra | Leaf scald | Fiji leaf gall
0.05 | 3588 | 3842 | 2728 | 3278
0.01 | 1050 | 1031 | 799 | 844
0.001 | 350 | 168 | 148 | 137
0.0001 | 216 | 30 | 39 | 26
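Observed counts can be compared with chance by dividing by α × m. No "expected" column is given for the disease traits, so this sketch assumes the same ~45,915 SNPs as in the TCH/CCS analysis; that is an assumption, not something the slide states.

```python
# Assumption: the disease analyses tested as many SNPs as the TCH/CCS
# analysis (2295.75 / 0.05 = 45,915).
N_SNPS = 45_915

ALPHAS = (0.05, 0.01, 0.001, 0.0001)
# Significant SNP counts per trait, in the order of ALPHAS.
OBSERVED = {
    "Smut": (3588, 1050, 350, 216),
    "Pachymetra": (3842, 1031, 168, 30),
    "Leaf scald": (2728, 799, 148, 39),
    "Fiji leaf gall": (3278, 844, 137, 26),
}


def enrichment(observed: float, alpha: float, n_tests: int = N_SNPS) -> float:
    """Ratio of observed significant SNPs to the alpha * m expected by chance."""
    return observed / (alpha * n_tests)


for trait, counts in OBSERVED.items():
    ratios = ", ".join(f"{a}: {enrichment(n, a):.1f}x" for a, n in zip(ALPHAS, counts))
    print(f"{trait}: {ratios}")
```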
Acknowledgements
● Members of the Analysis, Bioinformatics and Genetics group - Syngenta
● Andrew Farmer – NCGR, Paul Berkman - CSIRO
● Sugar cane team – USA/Brazil
- Dirk Benson
- Michel Moraes
- Ian Jepson
- Jair Durate
- Yan Zhang
- Stacy Miles
● CSIRO/SRA - Australia
- Karen Aitken
- Phil Jackson
- Xianming Wei
- Emily Deomano
● Funding support from Syngenta and SRA (SRDC)
Thank you!