Development of a high-throughput high-density SNP genotyping array for bovine

[Figure 2 x-axis: Breed (labels include Angus, Nelore, Jersey); Array Plate SNPs indicated with blue.]

Physical selection: SNPs are chosen based on location; they are evenly spaced throughout the genome. Result: redundant coverage on large groups of SNPs in LD (blue, light green); insufficient coverage on smaller groups of SNPs in LD (red, light blue, dark red).

Average sample call rate: 99.62%
Mendelian consistency: 99.96%
Reproducibility: 99.94%

Ali Pirani, Julia Montgomery, Brant Wong, Yontao Lu, Yiping Zhan, Fiona Brew, Christopher Davies, Anne Ferguson, and Teresa A. Webster

Affymetrix, Inc. Santa Clara, CA 95051 USA

Background: Simultaneous genotyping of many single nucleotide polymorphisms (SNPs) has been made possible by the development of array-based hybridization platforms. High-density genotyping empowered the success of genome-wide association studies for determination of genetic variation affecting complex traits. Medium-density studies in agricultural organisms have been beneficial in marker-assisted breeding, mapping quantitative trait loci, and other applications. Now there is interest in expanding to higher-density options to further refine and develop these techniques. The bovine research community has undertaken a massive genome sequencing and SNP screening effort to develop a comprehensive solution for a versatile, high-throughput high-density bovine SNP genotyping array.

Experimental design: The Affymetrix Bovine Consortium, consisting of academic researchers and breeding groups, has combined efforts to sequence the genomes of 15 bovine breeds from Bos indicus and Bos taurus on three next-generation sequencing platforms to produce an extensive collection of SNPs and their genotypes. Screening arrays were designed using a method to optimize physical coverage of a subset of these SNPs across the sequenced genomes. A diverse set of 384 samples, representing taurine, indicine, tropical taurine, and Asian breeds, was obtained from the HapMap collection and other collaborators and screened on the screening arrays on the high-throughput Affymetrix Axiom™ Genotyping Solution. Three million high-performance SNPs were validated.

Results: A multi-breed relevant subset of over 648,000 SNPs has been selected for the Axiom™ Genome-Wide BOS 1 Array Plate. These SNPs were selected using a genetic selection method to obtain greater than 90 percent genetic coverage of many beef and dairy cattle breeds.

Conclusion: Affymetrix has developed a comprehensive collection of SNPs, which genetically cover the LD blocks of multiple breeds. This SNP collection can be utilized for customized breed-specific array development. Future breed-specific screens will add to this pool of SNPs with validated genotyping performance. These high-density genotyping assays will support marker-assisted breeding and other applications in cattle.

Breed segregation

[Figure 1 group labels: Bos taurus; Mix breeds]

The SNP screens

On the Affymetrix Axiom™ Genotyping Solution, SNP screening was carried out for SNPs identified by the sequencing effort. To be validated, a SNP must be robust, as measured by performance metrics such as genotype call rate, cluster separation, and reproducibility, and it must exhibit at least 2 examples of the minor allele.
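The validation rule described above can be sketched as a simple filter. This is only an illustration: the poster gives no numeric cutoffs, so the call-rate and reproducibility thresholds, the `SnpMetrics` record, and the `is_validated` function are all hypothetical. Cluster separation is omitted, and "examples of the minor allele" is treated here as a plain count. Only the minimum of 2 minor-allele examples comes from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpMetrics:
    """Per-SNP screening metrics (hypothetical record layout)."""
    snp_id: str
    call_rate: float            # fraction of samples with a genotype call
    reproducibility: float      # concordance between replicate samples
    minor_allele_examples: int  # observed examples of the minor allele


# Assumed thresholds; the poster does not state numeric cutoffs.
MIN_CALL_RATE = 0.97
MIN_REPRODUCIBILITY = 0.99
# The one criterion stated in the text.
MIN_MINOR_ALLELE_EXAMPLES = 2


def is_validated(m: SnpMetrics) -> bool:
    """Return True if the SNP is robust and shows enough minor-allele examples."""
    robust = (m.call_rate >= MIN_CALL_RATE
              and m.reproducibility >= MIN_REPRODUCIBILITY)
    polymorphic = m.minor_allele_examples >= MIN_MINOR_ALLELE_EXAMPLES
    return robust and polymorphic


# Made-up candidates, one per failure mode.
candidates = [
    SnpMetrics("snp_ok", 0.995, 0.999, 14),
    SnpMetrics("snp_low_call_rate", 0.90, 0.999, 20),
    SnpMetrics("snp_monomorphic", 0.998, 1.0, 0),
    SnpMetrics("snp_singleton", 0.998, 1.0, 1),
]

validated = [m.snp_id for m in candidates if is_validated(m)]
print(validated)  # ['snp_ok']
```

A real screen would apply such criteria across all breeds and samples tested, with cutoffs chosen by the assay developers.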

Breed screen information, continued (breed, polymorphic SNPs, average MAF):

Japanese Black       1.1M         0.24
Jersey               1.2M         0.23
Limousin             1.4M         0.21
Nelore               2.3M         0.22
Norwegian red        1M           0.39

Genetic coverage

[Figure 2 y-axis ticks: 0 to 2,500,000 in steps of 500,000]

[Figure 1 group label: Bos indicus]

Figure 1: Principal component analysis shows the breeds segregate based on subpopulations. Cladogram from Decker, J. E., et al. PNAS 106:18644-18649 (2009).

Genetic SNP selection

LD calculations were carried out based on genotypes obtained from our SNP screens for 10 breeds. To choose SNPs for the Axiom genotyping panel from the pool of SNPs with validated genotyping performance, genetic coverage for both common and rare SNPs is considered, with the objective of genetically covering the 3 million validated SNPs. Additional factors, including a genotyping performance metric and the number of features each SNP requires on the chip, are also considered. The SNP selection/ranking was achieved with a greedy approach derived from the work of Carlson et al. [1]. At each step, the single SNP considered to be the most valuable, given the genetic coverage already achieved in the breeds of interest, is selected. The SNP selection continues until maximal genetic coverage for the selected breeds is achieved or enough SNPs are selected. A separate set of SNPs considered to be of important biological value is also included in the Axiom array design.
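A greedy selection of this kind can be sketched as follows. This is a simplified, single-breed illustration in the spirit of Carlson et al., not the procedure actually used for the array: the r² threshold of 0.8, the function name `greedy_tag_snps`, and the toy LD values are assumptions. The sketch also ignores the performance metric, the feature count per SNP, and the multi-breed weighting mentioned above.

```python
def greedy_tag_snps(snps, ld_pairs, r2_threshold=0.8, max_snps=None):
    """Greedily pick SNPs until every SNP is covered or max_snps is reached.

    snps      -- iterable of SNP identifiers
    ld_pairs  -- dict mapping (snp_a, snp_b) to pairwise r^2
    A selected SNP "covers" itself and every SNP in LD with it
    at r^2 >= r2_threshold (assumed definition).
    """
    snps = sorted(snps)
    covers = {s: {s} for s in snps}
    for (a, b), r2 in ld_pairs.items():
        if r2 >= r2_threshold:
            covers[a].add(b)
            covers[b].add(a)

    uncovered = set(snps)
    selected = []
    while uncovered and (max_snps is None or len(selected) < max_snps):
        # Most valuable SNP = the one covering the most still-uncovered SNPs.
        # max() returns the first maximum, so ties break by sorted SNP id.
        best = max(snps, key=lambda s: len(covers[s] & uncovered))
        selected.append(best)
        uncovered -= covers[best]
    return selected


# Toy example: A-B-C form one LD group, D-E another; C and D are not in LD.
toy_ld = {
    ("A", "B"): 0.90,
    ("B", "C"): 0.85,
    ("C", "D"): 0.30,
    ("D", "E"): 0.95,
}
selected = greedy_tag_snps(["A", "B", "C", "D", "E"], toy_ld)
print(selected)  # ['B', 'D']
```

In the toy data, B is picked first because it covers three SNPs (A, B, C); D then covers the remaining pair, so two SNPs represent all five.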

Table 3: Test set performance. Validated SNPs are polymorphic and pass performance criteria for all the breeds and samples tested. Additional SNPs may be validated on a per-breed basis. These SNPs were used for sample performance and coverage calculations.

1. Carlson C. S., Eberle M. A., Rieder M. J., Yi Q., Kruglyak L., Nickerson D. A. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. American Journal of Human Genetics 74:106–20 (2004).

Table 2: Genetic coverage for each breed (%).

Limousin             88.4
Romagnola            92.5
Brahman              78.8
Gir                  80.8

Holstein Fleckvieh Hereford

98.1 98.8 87.4 96.9 91.2 95.4
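The poster does not spell out how genetic coverage is computed. One plausible reading, sketched below, is the percentage of a breed's polymorphic SNPs that are either on the array or in LD with an array SNP above some r² threshold. The threshold of 0.8, the function name `genetic_coverage`, and the toy data are assumptions made for illustration.

```python
def genetic_coverage(array_snps, breed_snps, ld_pairs, r2_threshold=0.8):
    """Percent of breed_snps on the array or in LD with an array SNP.

    ld_pairs maps (snp_a, snp_b) to pairwise r^2 estimated in that breed.
    """
    array_set = set(array_snps)
    breed_set = set(breed_snps)

    # SNPs that are on the array cover themselves.
    covered = array_set & breed_set

    # SNPs in sufficient LD with an array SNP are covered indirectly.
    for (a, b), r2 in ld_pairs.items():
        if r2 >= r2_threshold:
            if a in array_set:
                covered.add(b)
            if b in array_set:
                covered.add(a)

    covered &= breed_set
    return 100.0 * len(covered) / len(breed_set)


# Toy breed with five polymorphic SNPs; B and D are on the array.
toy_ld = {
    ("A", "B"): 0.90,
    ("B", "C"): 0.85,
    ("C", "D"): 0.30,
    ("D", "E"): 0.50,  # below threshold, so E is not covered
}
coverage = genetic_coverage(["B", "D"], ["A", "B", "C", "D", "E"], toy_ld)
print(coverage)  # 80.0
```

Under this reading, lower coverage in a breed would reflect more SNPs that are not tagged by any array SNP in that breed's LD structure.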


Genetic selection: SNPs are chosen because they accurately represent groups of SNPs in LD. Result: The smallest number of SNPs is used to cover the known genetic variation of a population.

Figure 2: Number of polymorphic SNPs. SNPs genetically covered on the Axiom™ Genome-Wide BOS 1 Array Plate are indicated in blue.

Axiom™ Genome-Wide BOS 1 Array Plate performance

Test set
Number of SNPs: 648,855
Number of validated SNPs: 618,345
Sample pass rate: 99.94%

[Diagram: genetic selection versus physical selection. Each validated SNP is indicated by a bar; the horizontal line represents the chromosome. SNPs in LD are the same color.]

Process

Sequencing effort by the Affymetrix Bovine Consortium
Filtered to SNPs that physically cover the genome and were identified in multiple breeds
Ran screen with array plate format (20 breeds; ~400 samples)
Axiom Bovine Genomic Database (~3M and growing validated SNPs)
Genetic SNP selection
Axiom™ Genome-Wide BOS 1 Array Plate

Breed screen information

Breed                Poly. SNPs   Avg. MAF
Afrikander           1.4M         0.27
Angus                1.4M         0.21
Ayrshire             0.96M        0.29
Blonde d'Aquitaine   0.99M        0.27
Boran                2.2M         0.26
Brahman              2.4M         0.23
Brown Swiss          0.99M        0.28
Simmental            1.4M         0.21
Gir                  2.1M         0.24
Hanwoo               1.3M         0.25
Hereford             1.2M         0.22
Holstein             1.6M         0.21
Rouge des Prés       0.7M         0.33
Romagnola            1.6M         0.20
Tuli                 1.4M         0.29

Table 1: Polymorphic SNPs per breed in the 3M validated list. Average minor allele frequency for unrelated samples.
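The two columns of Table 1 can be illustrated with a short calculation. The sketch below assumes genotypes are coded as B-allele counts (0, 1, 2) with `None` for a no-call, and that the average MAF is taken over polymorphic SNPs only; the coding, the function names, and the toy genotypes are assumptions, not details from the poster.

```python
def minor_allele_frequency(genotypes):
    """MAF for one SNP from B-allele counts (0/1/2); None marks a no-call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    freq_b = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(freq_b, 1.0 - freq_b)


def breed_summary(genotype_rows):
    """Return (number of polymorphic SNPs, their average MAF) for one breed."""
    mafs = [minor_allele_frequency(row) for row in genotype_rows]
    polymorphic = [m for m in mafs if m is not None and m > 0]
    if not polymorphic:
        return 0, None
    return len(polymorphic), sum(polymorphic) / len(polymorphic)


# Toy data: three SNPs genotyped in four unrelated samples.
toy_genotypes = [
    [0, 1, 2, 1],     # B-allele frequency 0.5, so MAF 0.5
    [0, 0, 1, None],  # one B allele among six called alleles
    [2, 2, 2, 2],     # monomorphic, excluded from the average
]
n_polymorphic, avg_maf = breed_summary(toy_genotypes)
print(n_polymorphic, round(avg_maf, 4))
```

Restricting the calculation to unrelated samples, as the caption states, avoids allele counts being inflated by shared family alleles.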