
Introduction

A major issue for the successful design of disease association studies using SNPs as surrogate markers is the selection of a cost-effective subset of the available variants that would provide high statistical power to detect the association. A number of strategies have been proposed for the selection of SNPs based on empirical patterns of linkage disequilibrium (LD). Integrating all these criteria and methods can be challenging: the algorithms required are complex to deploy and the necessary annotations are deposited in heterogeneous databases. To simplify this process, we have developed SNPbrowser™ Software, a tool to assist in the knowledge-based selection of markers for association studies.

Material and Methods

Genotype data

We genotyped DNA from 45 African-American, 46 Caucasian, 45 Chinese, and 45 Japanese unrelated individuals. Over 250,000 TaqMan® SNP Genotyping Assays were used to genotype these samples, yielding up to 160,000 SNPs with reliable genotype calls and a minimum call rate of 90% in at least one of these populations (De La Vega et al., 2002). Only SNPs with a minor allele frequency (MAF) > 10% that passed a Hardy-Weinberg equilibrium test at a p-value > 0.05 were considered for analysis.
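The filtering criteria above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used for the poster: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 copies of one allele with `None` for a failed call, and a one-degree-of-freedom chi-square goodness-of-fit test is assumed for Hardy-Weinberg equilibrium because the text does not say which test was applied.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) goodness-of-fit p-value for Hardy-Weinberg proportions.

    Assumption: the source does not state which HWE test was used.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0                    # monomorphic: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(genotypes, min_call_rate=0.90, min_maf=0.10, min_hwe_p=0.05):
    """Apply the call-rate, MAF and HWE thresholds to one SNP in one population."""
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    n = len(called)
    n_ab = called.count(1)
    n_bb = called.count(2)
    n_aa = n - n_ab - n_bb
    freq_b = (2 * n_bb + n_ab) / (2 * n)
    maf = min(freq_b, 1.0 - freq_b)
    if maf <= min_maf:
        return False
    return hwe_p_value(n_aa, n_ab, n_bb) > min_hwe_p
```

Because the thresholds apply per population, a SNP would be retained if this check succeeds in at least one of the four samples.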

Analysis of linkage disequilibrium

We constructed metric maps scaled to the strength of LD that can guide the selection of SNPs for association studies. Linkage disequilibrium units (LDUs) define a metric coordinate system where locations are additive and distances are proportional to the allelic association between markers. The LDMAP software v0.9 developed by Andrew Collins was applied separately to each chromosome and population (http://cedar.genetics.soton.ac.uk/public_html/helpld.html). For more information on the theory and procedure for constructing LD maps, see Maniatis et al., 2002. Haplotype blocks were estimated as intervals of user-specified LDUs, or by the algorithm of Gabriel et al. (2002), optimized through a dynamic programming algorithm (Schwartz et al., 2003).
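To make the additivity of LDU locations concrete, the sketch below accumulates per-interval LDU lengths into map locations. It assumes the formulation of Maniatis et al. (2002), in which the LDU length of the interval between adjacent markers is the product of a fitted decay parameter (epsilon) and the physical distance. The function name and the epsilon values are hypothetical inputs; in practice they would come from LDMAP, whose fitting procedure is not reproduced here.

```python
from itertools import accumulate

def ldu_locations(positions_kb, epsilons):
    """Cumulative LDU location of each marker along a chromosome.

    positions_kb: sorted physical positions of the markers (kb)
    epsilons: one fitted decay parameter per interval between adjacent
              markers (hypothetical values; LDMAP would estimate these)
    """
    if len(epsilons) != len(positions_kb) - 1:
        raise ValueError("need exactly one epsilon per marker interval")
    interval_ldu = [
        eps * (right - left)
        for left, right, eps in zip(positions_kb, positions_kb[1:], epsilons)
    ]
    # Locations are additive: each marker sits at the sum of the
    # LDU lengths of all intervals to its left.
    return [0.0] + list(accumulate(interval_ldu))

# Four markers: no LD decay across the first interval (a block),
# then two intervals of one LDU each.
print(ldu_locations([0, 10, 30, 35], [0.0, 0.05, 0.2]))
```

Markers separated by an interval with epsilon of zero share the same LDU location, which is how blocks of high LD appear as flat steps on the map.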

Selection of Informative SNPs for Association Studies

We utilized three algorithms to select minimum informative subsets of SNPs, also called “tagging SNPs”: (i) Genotype Correlation; (ii) Pairwise r² (Carlson et al., 2004); and (iii) Haplotype R² (Weale et al., 2003). Because these methods are computationally demanding, optimal minimum sets of tag SNPs selected on a chromosome-wide basis were computed at three thresholds of Pairwise r² or Haplotype R² using a dynamic programming algorithm as described by Halldórsson et al. (2004).
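The idea behind threshold-based tagging can be illustrated with a simple greedy procedure. Note that this is a swapped-in technique: it is not the dynamic programming algorithm of Halldórsson et al. (2004) used for the poster, and it does not guarantee a minimum set. It also uses the squared correlation of 0/1/2 genotype codes as a stand-in for r², and the function names and the 0.8 default threshold are assumptions.

```python
from statistics import mean
from typing import Dict, List

def r_squared(x: List[int], y: List[int]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)

def greedy_tag_snps(genotypes: Dict[str, List[int]], threshold: float = 0.8) -> List[str]:
    """Greedy tag selection: repeatedly pick the SNP that captures the most
    still-untagged SNPs at r^2 >= threshold (illustrative, not optimal)."""
    names = list(genotypes)
    covers = {
        a: {b for b in names
            if a == b or r_squared(genotypes[a], genotypes[b]) >= threshold}
        for a in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags

# Hypothetical genotypes for six individuals at four SNPs.
example = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1
    "snp3": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with snp1
    "snp4": [0, 2, 1, 1, 0, 2],  # weakly correlated with the others
}
print(greedy_tag_snps(example))
```

In this toy data set one tag captures the three redundant SNPs, and the weakly correlated SNP has to be genotyped directly.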

Power Calculations for Case/Control Studies

We calculated power for a fixed sample size of cases and controls on a per-gene basis. For each gene, power is calculated using a haplotype-based test for each of the common haplotypes in the window, entering into the calculation the empirically observed average LD in the gene region. Using a multiplicative genetic model with a relative risk ratio of 3 and a prevalence of 1.5%, power is calculated for each haplotype and a frequency-weighted average is provided as the summary. This is repeated separately for each population and visualized using a color scale as a background to each gene region (see De La Vega, Gordon, et al., submitted).
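As a rough guide to how the model parameters enter a power calculation, the sketch below computes the power of a single-marker allelic case/control test under a multiplicative model. It is a deliberate simplification and not the custom haplotype-based software used here: it assumes the tested allele is itself causal (no LD attenuation), Hardy-Weinberg proportions in the population, a two-sided test with a normal approximation, and that the relative risk of 3 is a per-allele genotype relative risk. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def allelic_test_power(risk_allele_freq, n_cases, n_controls,
                       relative_risk=3.0, prevalence=0.015, alpha=0.05):
    """Approximate power of a two-sided allelic test (simplified sketch)."""
    p = risk_allele_freq
    q = 1.0 - p
    # Multiplicative model: penetrances f0, f0*RR, f0*RR^2 for 0, 1, 2 risk
    # alleles, scaled so that the population prevalence is matched.
    f0 = prevalence / (q + p * relative_risk) ** 2
    f1 = f0 * relative_risk
    f2 = f0 * relative_risk ** 2
    if f2 > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    # Risk-allele frequency among cases and among controls.
    p_cases = (p * p * f2 + p * q * f1) / prevalence
    p_controls = (p * p * (1 - f2) + p * q * (1 - f1)) / (1 - prevalence)
    # Each individual contributes two alleles.
    se = sqrt(p_cases * (1 - p_cases) / (2 * n_cases)
              + p_controls * (1 - p_controls) / (2 * n_controls))
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z = abs(p_cases - p_controls) / se
    return norm.cdf(z - z_alpha) + norm.cdf(-z - z_alpha)

for n in (20, 100, 200):
    print(n, round(allelic_test_power(0.2, n, n), 3))
```

A per-gene summary in the spirit of the text would average such values over the common haplotypes, weighted by haplotype frequency, after accounting for the observed LD.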

Results

SNPbrowser™ Software Visualization and Query Tools

The visualization panel consists of a chromosome map viewer representing the location in the physical map of up to 160,000 Validated, and optionally up to 2 million Pre-designed, TaqMan® SNP Genotyping Assays, and their relationship to human genes and their exons. The display differentiates between coding SNPs, which can be prioritized in some studies, and Validated and Pre-designed assays. SNPbrowser™ Software shows the location of the SNPs on the physical (kb) and linkage disequilibrium maps, while horizontal lines indicate the spans of blocks of high LD determined by two algorithms.

Applied Biosystems' metric LD map was experimentally generated from over 20 million genotypes determined in four major populations and provides information on how best to position SNPs across the genes or regions of interest in a study. A metric LD map, expressed in LD units (LDUs), places SNPs on a coordinate system where distances between SNPs are directly related to the degree of LD between them. For example, SNPs in perfect LD (completely correlated) have zero distance between them, whereas SNPs with no significant correlation are separated by over three LDUs in this map. Analogous to the genetic map expressed in centimorgans used for selecting markers for linkage studies in families, the LD map can be used to efficiently select markers for population-based disease association studies with high statistical power.

Details of SNP allele frequency in the four populations, and other annotations, can be easily visualized. A number of filters are available to select subsets of SNPs of interest, including filtering by minor allele frequency in one or more of the populations. SNPs and genes can be located by searching on a number of keyword terms, including SNP, gene, and transcript identifiers, as well as assembly coordinates. A batch search feature is useful when large candidate gene lists are being considered for SNP selection: a click on the search result window immediately pans and zooms the viewer to the region of interest. Previously visited locations can also be saved as bookmarks for easy access in subsequent sessions.

Figure 1. SNPbrowser™ Software main panel, navigation and simple query tools

Optimizing Marker Coverage

To supplement the validated assays, we introduced two new assay collections, the TaqMan® Coding SNP Genotyping Assays and the TaqMan® Pre-Designed SNP Genotyping Assays. A SNP Density Selection function in the SNP Wizard allows supplementing validated assays with additional SNPs when their density is not sufficient for the coverage requirements, or selecting SNPs in a picket-fence pattern. The SNP density selection can be performed on the physical map (kb) or the LD map (LDUs). Additionally, the SNP Wizard allows high-confidence supplemental SNPs to be prioritized through user-defined criteria, including minor allele frequency as determined in the HapMap project or during the Applera Genomics Initiative resequencing efforts, and independent evidence of discovery of a SNP from multiple data sources (so-called double-hit SNPs).
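A picket-fence selection can be sketched as a single pass along the chosen coordinate system. The code below is an illustrative assumption about how such a selection might work, not the SNP Wizard's actual logic; the function name, SNP names, and coordinates are hypothetical, while the 20 kb and 0.5 LDU spacings match the values shown in the density selection figure.

```python
def select_by_spacing(snps, spacing):
    """Keep the first SNP, then every next SNP lying at least `spacing`
    beyond the last one kept.

    snps: iterable of (name, location) pairs; location may be in kb or LDUs
    """
    selected = []
    last_location = None
    for name, location in sorted(snps, key=lambda item: item[1]):
        if last_location is None or location - last_location >= spacing:
            selected.append(name)
            last_location = location
    return selected

# The same six hypothetical SNPs placed on the two maps.
physical_kb = [("snp_a", 0), ("snp_b", 12), ("snp_c", 21),
               ("snp_d", 40), ("snp_e", 44), ("snp_f", 65)]
ld_map_ldu = [("snp_a", 0.0), ("snp_b", 0.1), ("snp_c", 0.15),
              ("snp_d", 0.9), ("snp_e", 1.6), ("snp_f", 1.7)]

print(select_by_spacing(physical_kb, 20))   # spacing in kb
print(select_by_spacing(ld_map_ldu, 0.5))   # spacing in LDUs
```

Selecting on the LD map skips SNPs that are physically distant but in strong LD with one already chosen, so fewer assays may cover the same region.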

Figure 2. SNP Wizard – Density Selection

Selection of Haplotype Tagging SNPs

SNPbrowser™ Software incorporates three algorithms to facilitate the selection of tagging SNPs and to eliminate redundant SNPs (see Methods). The software includes an easy-to-use “SNP Wizard” that displays algorithm results and allows easy refinement of the tagging SNP method and parameter selections.

Figure 3. SNP Wizard – Tagging SNPs

Finally, once the SNPbrowser™ Software wizards identify the ideal set of SNPs for an association study, clicking on the results bar adds the SNP assays to the researcher’s order list, and one additional click calls the Applied Biosystems myScience™ Research Environment for easy ordering of ready-to-use TaqMan® SNP Genotyping Assays. The selected SNP list can also be imported into the assay design submission tool of the SNPlex™ Genotyping System for the ordering of multiplexed oligonucleotide ligation-based assays.

Figure 4. TaqMan® Assay Ordering

Conclusions

We developed SNPbrowser™ Software, a visualization suite to facilitate the selection of assays for genetic association studies utilizing prior empirical knowledge of the profile of linkage disequilibrium across genomic regions. It includes a library of almost two million human SNP assays, facilitating the knowledge-driven design of association and fine mapping studies through the selection of the most informative SNPs for meaningful and powerful study results. The new easy-to-use SNP Wizard simplifies the selection of SNP marker sets with either a “tagging” SNP or SNP density selection workflow. Finally, the software provides an easy interface for ordering ready-to-use TaqMan® SNP Genotyping Assays or submitting a list of SNPs for the design of custom SNPlex™ Genotyping System multiplex OLA assays.

Acknowledgements

We are very grateful to Andrew Collins (U. Southampton) for providing the LDMAP software to estimate LDUs, and to Derek Gordon (Rockefeller U.), who provided custom software to calculate power. We acknowledge the valuable feedback and support provided by Katherine Rogers, Anthony Dodge, Deidre Reardon, Fenton Williams, Michael Rhodes, Toinette Hartshorne, Trevor Woodage, Kelly McDonald, Raymond Fellers and the SQA team.

For Research Use Only. Not for use in diagnostic procedures.

Applied Biosystems is a registered trademark. AB (Design), Applera, SNPbrowser, and SNPlex are trademarks of Applera Corporation or its subsidiaries in the US and/or certain other countries.

TaqMan is a registered trademark of Roche Molecular Systems, Inc.

© 2004 Applied Biosystems. All rights reserved.

Figure panel labels: Selection on the physical map (20 kb spacing); Selection on the metric LD map (0.5 LDU spacing).

Figure callouts: Physical Distance Scale; LDU Scale; Gene Details; SNP Locations and Information; Exon Locations; Haplotype Blocks.

Francisco M. De La Vega, Hadar Isaac, Charles R. Scafe, Bjarni V. Halldórsson, and Eugene Spier

Applied Biosystems, 850 Lincoln Centre Dr., Foster City, CA 94404, USA

Selection of SNP Markers for Genetic Association Studies of Complex Disease Based on Empirical Knowledge of the Genome-wide Patterns of Linkage Disequilibrium

References

Carlson CS, Eberle MA, Rieder MJ, Yi Q, Kruglyak L, Nickerson DA (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet 74:106-120

De La Vega FM, Dailey D, Ziegle J, Williams J, Madden D, Gilbert DA (2002) New generation pharmacogenomic tools: a SNP linkage disequilibrium map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies. Biotechniques Suppl:48-50, 52, 54

De La Vega FM, Gordon D, Su X, Scafe C, Isaac H, Gilbert DA, Spier EG (2004) Power and sample size calculations for genetic case/control studies using gene-centric data: application to Human Chromosomes 6, 21, and 22 in three populations. Submitted

Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296:2225-2229

Halldórsson BV, Bafna V, Lippert R, Schwartz R, De La Vega FM, Clark AG, Istrail S (2004) Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies. Genome Res 14:1633-1640

Maniatis N, Collins A, Xu CF, McCarthy LC, Hewett DR, Tapper W, Ennis S, Ke X, Morton NE (2002) The first linkage disequilibrium (LD) maps: delineation of hot and cold blocks by diplotype analysis. Proc Natl Acad Sci U S A 99:2228-2233

Schwartz R, Halldórsson BV, Bafna V, Clark AG, Istrail S (2003) Robustness of inference of haplotype block structure. J Comput Biol 10:13-19

Weale ME, Depondt C, Macdonald SJ, Smith A, Lai PS, Shorvon SD, Wood NW, Goldstein DB (2003) Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am J Hum Genet 73:551-565
