

Genetic Identification by Mass Spectrometric Analysis of Single-Nucleotide Polymorphisms: Ternary Encoding of Genotypes

Timothy J. Griffin† and Lloyd M. Smith*

Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706-1396

An approach to genetic identification using biallelic single-nucleotide polymorphism (SNP) genetic markers is described in which the three possible genotypes, AA, Aa, or aa, where “A” and “a” represent the two SNP alleles, are assigned a ternary (base 3) digit of 0, 1, or 2, respectively. Genotyping an individual over a panel of separate SNP markers produces a composite ternary genetic code that can be converted to an easily stored, decimal (base 10) genetic identification number. The unambiguous identification of 11 individuals is demonstrated using ternary genetic codes generated from MALDI-TOF mass spectrometric genotyping data from 7 different SNP markers.

Single-nucleotide polymorphisms (SNPs) [1-5] have the potential to revolutionize human genetic studies, with utility as genetic markers in linkage and association studies of complex genetic traits [1,4,5], pharmacogenomic research [6-8], and evolutionary genetic studies [9]. SNPs offer several attractive features as genetic markers, including the following: (1) occurrence at a high frequency throughout the human genome, with estimates of one or more SNPs per 1000 nucleotides when comparing any two human chromosomes [1-3,10]; (2) highly stable inheritance, unlike repeat polymorphisms, which are susceptible to germ line mutations that are detrimental to inheritance analysis [11]; and (3) elimination of the need for comparative allele sizing of repeat polymorphisms, which, despite being amenable to MALDI-TOF MS analysis [12-15], are still most commonly analyzed by labor-intensive gel electrophoresis-based techniques [16-19]. Large-scale publicly [20] and privately [21] funded efforts to produce maps of SNPs are currently underway, with an ultimate goal of producing a dense genetic map consisting of at least 1 informative SNP per 30 kilobases (= $1 \times 10^{5}$ total SNPs), to enable genome-wide genetic studies using these markers.

We describe a ternary-encoding approach to SNP genotyping, in which the three possible genotypes for any biallelic SNP marker, AA, Aa, or aa (where “A” and “a” denote the two possible SNP alleles), are assigned a ternary digit of 0, 1, or 2, respectively. Genotyping a set of SNP markers using this encoding scheme produces a unique ternary code for an individual, with each digit of the code representing the genotype at a specific marker. If desired, this ternary (base 3) number can be converted to an easily archived, decimal (base 10) number that is a simple representation of an individual’s genotyping data over the panel of SNPs analyzed. We demonstrate the potential of ternary encoding as a straightforward approach to genetic identification by unambiguously identifying 11 individuals using ternary genetic codes obtained from genotyping a panel of 7 SNP markers using a recently developed MALDI-TOF MS analysis method.

EXPERIMENTAL SECTION

Human Genomic DNA Samples. Pure genomic DNA from 10 unrelated parents of CEPH reference families (identified here as samples 1-10) was obtained from Coriell Cell Repositories (Camden, NJ). These DNA samples had the following Coriell repository numbers: individual 1, NA12663; individual 2, NA12626; individual 3, NA12625; individual 4, NA13114; individual 5, NA12664; individual 6, NA13113; individual 7, NA04479; individual

* Corresponding author: (e-mail) [email protected]; (fax) (608) 265-6780.
† Current address: Department of Molecular Biotechnology, University of Washington, Box 357730, Seattle, WA 98195-7730.
(1) Brookes, A. J. Gene 1999, 234, 177-186.
(2) Wang, D. G.; Fan, J. B.; Siao, C. J.; Berno, A.; Young, P.; Sapolsky, R.; Ghandour, G.; Perkins, N.; Winchester, E.; Spencer, J.; et al. Science 1998, 280, 1077-1082.
(3) Taillon-Miller, P.; Piernot, E. E.; Kwok, P.-Y. Genome Res. 1999, 9, 499-505.
(4) Schafer, A. J.; Hawkins, J. R. Nat. Biotechnol. 1998, 16, 33-39.
(5) Landegren, U.; Nilsson, M.; Kwok, P.-Y. Genome Res. 1998, 8, 769-776.
(6) Bailey, D. S.; Bondar, A.; Furness, L. M. Curr. Opin. Biotechnol. 1998, 9, 595-601.
(7) Collins, F. S. Geriatrics 1999, 54, 41-47.
(8) Persidis, A. Nat. Biotechnol. 1998, 16, 791-792.
(9) Weiss, K. M. Genome Res. 1998, 8, 691-697.
(10) Nickerson, D. A.; Taylor, S. L.; Weiss, K. M.; Clark, A. G.; Hutchinson, R. G.; Stengard, J.; Salomaa, V.; Vartiainen, E.; Boerwinkle, E.; Sing, C. F. Nat. Genet. 1998, 19, 233-240.
(11) Weber, J. L.; Wong, C. Hum. Mol. Genet. 1993, 2, 1123-1128.
(12) Ross, P. L.; Belgrader, P. Anal. Chem. 1997, 69, 3966-3972.
(13) Taranenko, N. I.; Golovlev, V. V.; Allman, S. L.; Taranenko, N. V.; Chen, C. H.; Hong, J.; Chang, L. Y. Rapid Commun. Mass Spectrom. 1998, 12, 413-418.
(14) Butler, J. M.; Li, J.; Shaler, T. A.; Monforte, J. A.; Becker, C. H. Int. J. Legal Med. 1999, 112, 45-49.
(15) Braun, A.; Little, D. P.; Reuter, D.; Muller-Mysok, B.; Koster, H. Genomics 1997, 46, 18-23.
(16) Jeffreys, A. J.; Wilson, V.; Thein, S. L. Nature 1985, 314, 67-73.
(17) Jeffreys, A. J.; Wilson, V.; Thein, S. L. Nature 1985, 316, 76-79.
(18) Nakamura, Y.; Leppert, M.; O’Connell, P.; Wolff, R.; Holm, T.; Culver, M.; Martin, C.; Fujimoto, E.; Hoff, M.; Kumlin, E.; et al. Science 1987, 235, 1616-1622.
(19) Weber, J. L.; May, P. E. Am. J. Hum. Genet. 1989, 44, 388-396.
(20) Collins, F. S.; Patrinos, A.; Jordan, E.; Chakravarti, A.; Gesteland, R.; Walters, L. Science 1998, 282, 682-689.
(21) Marshall, E. Science 1999, 284, 406-407.

Anal. Chem. 2000, 72, 3298-3302

Analytical Chemistry, Vol. 72, No. 14, July 15, 2000. 10.1021/ac991390e CCC: $19.00 © 2000 American Chemical Society. Published on Web 05/24/2000


8, NA12817; individual 9, NA12818; individual 10, NA04477A. The DNA sample identified as number 11 was not from the CEPH reference families.

MALDI-TOF MS SNP Analysis. Reaction conditions and the procedure for the MALDI-TOF MS analysis of SNPs using the Invader assay have been previously described [22]. The Invader reaction buffers and the Cleavase VIII enzyme used in these reactions were supplied by Third Wave Technologies (Madison, WI). All oligonucleotides used were synthesized by the University of Wisconsin Biotechnology Center or Integrated DNA Technologies (Coralville, IA). For the MALDI-MS analysis, 1 µL of MALDI matrix (0.5% α-cyano-4-hydroxycinnamic acid (αCHCA) in 1:1 CH3CN/ultrapure water) was spotted on the MALDI sample plate and allowed to air-dry. After solid-phase purification of the Invader signal molecules [22], the samples were resuspended in 1 µL of 1:1 CH3CN/ultrapure water, spotted onto the dried matrix, and allowed to air-dry. In some cases, signal suppression was observed in the MALDI-MS analysis, possibly due to residual contaminants from the solid-phase sample preparation procedure [22]. Adding another 1 µL of αCHCA matrix to each sample spot, after the resuspended sample had been spotted and dried onto the first layer of matrix, was found to increase signal intensities. The MALDI-TOF MS analysis was done on a Perseptive Biosystems (Framingham, MA) Voyager DE-STR mass spectrometer using a nitrogen laser at 337 nm with an initial accelerating voltage of 20 kV and a delay time of 100 ns. The instrument was run in reflector mode using negative ion detection with external instrument calibration. All spectra acquired consisted of averaged signal from 50 to 100 laser shots, and the data were processed using the accompanying Perseptive Biosystems mass spectrometry software.

RESULTS AND DISCUSSION

MALDI-TOF MS Genotyping Approach. Of the 11 individuals analyzed, 10 were parents from CEPH reference families (identified as individuals 1-10) [23], while individual 11 was not from a CEPH family. Each of the seven SNP markers analyzed has been previously characterized and its allele frequencies estimated [24]. For each marker, the more frequently occurring allele was designated as allele A, while the less frequently occurring allele was designated as allele a. Table 1 gives a description of each of these SNP markers, including the genotype frequencies for each SNP predicted by Hardy-Weinberg (H-W) calculations using the allele frequencies shown. These SNPs were genotyped by MALDI-TOF MS analysis of reaction products from the Invader assay [22,25], which involves the hybridization of two sequence-specific oligonucleotides, one termed the Invader oligonucleotide and the other termed the probe oligonucleotide, to a nucleic acid target of interest (Figure 1A). These two oligonucleotides are designed so that the nucleotide on the 3′ end of the Invader oligonucleotide invades at least one nucleotide into the downstream duplex formed by the probe oligonucleotide and the target strand, forming a sequence overlap at that position. The unpaired “flap” region on the 5′ end of the probe oligonucleotide is then enzymatically cleaved, resulting in a 3′-hydroxyl-terminating DNA cleavage product. The reaction is run near the melting temperature of the duplex formed between the probe oligonucleotide and target such that, in the presence of excess probe oligonucleotide, cleaved and uncleaved probe oligonucleotides will cycle off and on the target strand, resulting in a linear accumulation of cleavage product with respect to both time and target strand concentration.

To produce cleavage products at levels necessary for robust detection by MALDI-TOF mass spectrometry, a two-step reaction was used [22] in which the primary cleavage product serves as an Invader oligonucleotide in a secondary invasive cleavage reaction directed against a target oligonucleotide and probe oligonucleotide that are externally introduced into the reaction mix, producing secondary cleavage products which are then detected (Figure 1B). This two-step reaction design approximately squares the amount of amplification of cleavage products compared to a single-stage Invader reaction. For each SNP position analyzed, two allele-specific primary probe oligonucleotides and one Invader oligonucleotide were designed. The probe oligonucleotides each contain an allele-specific hybridization portion (solid region of probes in Figure 1A), differing in sequence by only one nucleotide, designated as X or Y, where X and Y are complementary to one of the two possible nucleotides occurring at the SNP position. The 3′-nucleotide of the Invader oligonucleotide (nucleotide N in Figure 1A) is noncomplementary to the target, overlapping the probe oligonucleotide sequence at the SNP position and facilitating

(22) Griffin, T. J.; Hall, J. G.; Prudent, J. R.; Smith, L. M. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 6301-6306.
(23) Dausset, J.; Cann, H.; Cohen, D.; Lathrop, M.; Lalouel, J. M.; White, R. Genomics 1990, 6, 575-577.
(24) Kwok, P.-Y.; Deng, Q.; Zakeri, H.; Taylor, S. L.; Nickerson, D. A. Genomics 1996, 31, 123-126.
(25) Lyamichev, V.; Mast, A. L.; Hall, J. G.; Prudent, J. R.; Kaiser, M. W.; Takova, T.; Kwiatkowski, R. W.; Sander, T. J.; de Arruda, M.; Arco, D. A.; Neri, B. P.; Brow, M. A. Nat. Biotechnol. 1999, 17, 292-296.

Table 1. SNP Marker Panel^a

marker    chromosome  allele A  allele a  freq(A)  freq(a)  freq(AA)^b  freq(Aa)^b  freq(aa)^b
WI-867    7           dG        dA        0.55     0.45     0.30        0.50        0.20
WI-921    11          dG        dA        0.84     0.16     0.70        0.27        0.03
WI-1126   7           dT        dC        0.64     0.36     0.41        0.46        0.13
WI-1325   20          dT        dC        0.64     0.36     0.41        0.46        0.13
WI-1803   4           dG        dA        0.70     0.30     0.49        0.42        0.09
WI-2026   6           dA        dG        0.80     0.20     0.64        0.32        0.04
WI-2032   9           dC        dG        0.60     0.40     0.36        0.48        0.16

^a These SNPs are identified by the specific sequence-tagged site (STS) in which each is located. Sequences of these STSs are available on-line at http://www.genome.wi.mit.edu/SNP/human/index.html. The two nucleotides corresponding to each of the possible alleles at each SNP marker are shown, along with the previously estimated allele frequencies and resulting Hardy-Weinberg genotype frequencies for each marker. Using the allele frequency values at each marker, the more frequently occurring allele was identified as “A”, and the less frequent allele was identified as “a”.
^b If $p$ = frequency of allele A and $q$ = frequency of allele a, then the genotype frequencies are given by the following: freq(AA) = $p^2$; freq(Aa) = $2pq$; freq(aa) = $q^2$.
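The Hardy-Weinberg relations in footnote b can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `hardy_weinberg` and the dictionary `MAJOR_ALLELE_FREQ` are illustrative names, and the allele frequencies are those listed in Table 1.

```python
def hardy_weinberg(p):
    """Return (freq_AA, freq_Aa, freq_aa) for a major-allele frequency p.

    Implements freq(AA) = p^2, freq(Aa) = 2pq, freq(aa) = q^2 with q = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Frequency of allele A at each marker, as listed in Table 1.
MAJOR_ALLELE_FREQ = {
    "WI-867": 0.55,
    "WI-921": 0.84,
    "WI-1126": 0.64,
    "WI-1325": 0.64,
    "WI-1803": 0.70,
    "WI-2026": 0.80,
    "WI-2032": 0.60,
}

if __name__ == "__main__":
    for marker, p in MAJOR_ALLELE_FREQ.items():
        f_AA, f_Aa, f_aa = hardy_weinberg(p)
        print(f"{marker}: AA={f_AA:.4f} Aa={f_Aa:.4f} aa={f_aa:.4f}")
```

Rounded to two decimals, these values agree with the genotype frequencies in Table 1; the three frequencies for a marker always sum to 1 because $(p + q)^2 = 1$.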


enzymatic cleavage immediately 3′ of nucleotide X or Y. The primary reaction was run for 2 h at 63 °C, resulting in the accumulation of either or both allele-specific primary cleavage products. The reaction was then brought to 50 °C and a secondary reaction mix was added that included two distinct secondary target oligonucleotides and two secondary probe oligonucleotides. The sequences of the secondary target and probe oligonucleotides were designed so that one of the cleavage products from the primary Invader reaction would anneal specifically to one of the two secondary targets and act as the Invader oligonucleotide in the secondary reaction (Figure 1B); the cleavage product from the other allele would anneal specifically to the other secondary target. The same secondary oligonucleotides and signal outputs were used for each SNP analyzed. To prevent nonspecific background signal from accumulating in the secondary reaction, a 2′-O-methyl RNA arrestor oligonucleotide was added to the secondary reaction mix, which anneals to any uncleaved primary probe oligonucleotides that are present [22]; additionally, the last five nucleotides of the secondary target oligonucleotides were designed to be 2′-O-methyl RNA, and to have a 3′-amino group, to further prevent nonspecific signal accumulation [22]. For each of the seven SNP markers analyzed here, the signal molecule corresponding to allele A (dT4-biotin) had a deprotonated, singly negatively charged molecular ion mass-to-charge [(M - H)⁻] value of 1538, and the signal molecule corresponding to allele a (dT3-biotin) had an (M - H)⁻ value of 1234. The signal molecules contained a biotin group to enable their solid-phase purification, as previously described [22].

Figure 2A shows the scheme used for ternary encoding of the MALDI-TOF MS genotyping data from the Invader reaction. MALDI-TOF mass spectra containing a single peak at m/z 1538 (genotype AA) were assigned a 0 ternary digit, those with peaks at both m/z 1538 and 1234 (genotype Aa) were assigned a 1, and those with a single peak at m/z 1234 (genotype aa) were assigned a 2. Figure 2B shows representative ternary-encoded MALDI-TOF MS genotyping results for 3 of the 11 individuals analyzed in this study.
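The scoring rule above amounts to a presence/absence test on two peaks. The sketch below illustrates it under stated assumptions: the input is a list of m/z values for peaks that have already been judged to be above background, and the matching window `tolerance` (in m/z units) is a hypothetical parameter, not a value taken from the text.

```python
# (M - H)- values of the two allele-specific signal molecules, from the text.
SIGNAL_MZ = {"A": 1538.0, "a": 1234.0}


def call_ternary_digit(peak_mzs, tolerance=2.0):
    """Map detected peaks to a ternary genotype digit.

    peak_mzs  -- m/z values of peaks already accepted as real signal
    tolerance -- hypothetical +/- window used to match a peak to a signal mass

    Returns 0 (AA), 1 (Aa), or 2 (aa); raises ValueError if neither
    signal molecule is detected (a failed reaction is not encoded).
    """
    has_A = any(abs(mz - SIGNAL_MZ["A"]) <= tolerance for mz in peak_mzs)
    has_a = any(abs(mz - SIGNAL_MZ["a"]) <= tolerance for mz in peak_mzs)
    if has_A and has_a:
        return 1  # both signal molecules: heterozygote Aa
    if has_A:
        return 0  # only m/z 1538: homozygote AA
    if has_a:
        return 2  # only m/z 1234: homozygote aa
    raise ValueError("no signal peak found; genotype cannot be called")


if __name__ == "__main__":
    print(call_ternary_digit([1538.3]))          # AA
    print(call_ternary_digit([1233.8, 1538.1]))  # Aa
    print(call_ternary_digit([1234.2]))          # aa
```

Raising an error for a spectrum with no signal keeps failed reactions out of the code entirely, so every digit that is stored carries genotype information.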

Genetic Identification of 11 Individuals. The results for all 11 individuals genotyped by this method over this panel of 7 SNPs are presented in Table 2. For each individual, both the ternary genetic code (ID3) and the corresponding decimal (ID10) identification number are shown for the panel of seven SNP markers. The decimal identification number is the sum of the ternary digits at each SNP marker, each multiplied by a power of 3 specific to that marker. For a panel of M markers, the highest power of 3 used to obtain a decimal identification number would be $3^{M-1}$; one marker in the panel is chosen so that its ternary digit always multiplies this power, and the digits for the remaining markers multiply powers of 3 whose exponents decrease successively by 1 down to $3^{0}$. For the panel of seven markers shown in Table 2, digits in the WI-867 column were multiplied by $3^{6}$, those in the WI-921 column by $3^{5}$, those in the WI-1126 column by $3^{4}$, and so on down to those in the WI-2032 column, which were multiplied by $3^{0}$. For example, the ternary code for individual 1 in Table 2 is (2110112), which is converted to a decimal number as follows:

Figure 1. SNP genotyping using the Invader assay and MALDI-TOF mass spectrometry.

$$2 \times 3^{6} + 1 \times 3^{5} + 1 \times 3^{4} + 0 \times 3^{3} + 1 \times 3^{2} + 1 \times 3^{1} + 2 \times 3^{0} = 1796$$
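The conversion shown for individual 1 is an ordinary base-3 to base-10 change of radix. As a minimal sketch (the function names are illustrative, and the fixed marker order WI-867 through WI-2032 is taken from Table 2), the code can be converted in both directions:

```python
def ternary_to_decimal(code):
    """Convert a ternary genotype code (string of 0/1/2) to its decimal ID.

    The leftmost digit multiplies 3**(M-1) and the rightmost multiplies 3**0,
    where M is the number of markers in the panel.
    """
    value = 0
    for digit in code:
        if digit not in ("0", "1", "2"):
            raise ValueError(f"invalid ternary digit: {digit!r}")
        value = value * 3 + int(digit)  # Horner's rule for base 3
    return value


def decimal_to_ternary(value, n_markers):
    """Recover the ternary code, zero-padded to one digit per marker."""
    if value < 0:
        raise ValueError("identification number must be non-negative")
    digits = []
    for _ in range(n_markers):
        value, remainder = divmod(value, 3)
        digits.append(str(remainder))
    if value:
        raise ValueError("value does not fit in n_markers ternary digits")
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(ternary_to_decimal("2110112"))  # individual 1 in Table 2
    print(decimal_to_ternary(63, 7))      # individual 7 in Table 2
```

Padding to the panel size matters when decoding: a code such as 0002100 has leading zeros that the decimal number alone does not record, so the number of markers must be stored or known alongside the identification number.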


The probability of a match for an individual’s specific identification number can be easily calculated using the product rule [26], where the known H-W frequencies of each genotype displayed in the corresponding ternary code are multiplied together to give an estimated overall frequency of the genotypic composition over the panel of polymorphic loci. Although use of the product rule may not always be accurate due to population substructure effects that cause deviations from linkage and H-W equilibrium [26-28], we determined a probability of match for each individual in Table 2 using the genotype frequencies for each SNP marker shown in Table 1 under the assumption of validity of the product rule, to illustrate how these calculations can be done using ternary-encoded genotyping data.
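Because each ternary digit indexes one of the three genotype frequencies at its marker, the product-rule calculation can be written as a short loop. The sketch below is illustrative only: the names `GENOTYPE_FREQS` and `match_probability` are not from the paper, and the frequencies are the rounded two-decimal values listed in Table 1.

```python
# (freq_AA, freq_Aa, freq_aa) per marker, in panel order, from Table 1.
GENOTYPE_FREQS = [
    (0.30, 0.50, 0.20),  # WI-867
    (0.70, 0.27, 0.03),  # WI-921
    (0.41, 0.46, 0.13),  # WI-1126
    (0.41, 0.46, 0.13),  # WI-1325
    (0.49, 0.42, 0.09),  # WI-1803
    (0.64, 0.32, 0.04),  # WI-2026
    (0.36, 0.48, 0.16),  # WI-2032
]


def match_probability(code, genotype_freqs=GENOTYPE_FREQS):
    """Product-rule probability of a match for a ternary genotype code.

    Digit 0, 1, or 2 selects freq(AA), freq(Aa), or freq(aa) at each marker.
    Assumes independent loci in Hardy-Weinberg equilibrium.
    """
    if len(code) != len(genotype_freqs):
        raise ValueError("code length must equal the number of markers")
    probability = 1.0
    for digit, freqs in zip(code, genotype_freqs):
        probability *= freqs[int(digit)]
    return probability


if __name__ == "__main__":
    for individual, code in [(1, "2110112"), (2, "1010100"), (9, "2002120")]:
        print(individual, code, f"{match_probability(code):.1e}")
```

With the rounded Table 1 frequencies, this reproduces the probabilities of match listed in Table 2 for the individuals shown.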

The unique ternary code and corresponding decimal number produced for each of the 11 individuals over this panel of 7 SNPs

(26) National Research Council (U.S.), Committee on DNA Forensic Science. The Evaluation of Forensic DNA Evidence: An Update; National Academy Press: Washington, DC, 1996.
(27) Lange, K. Am. J. Hum. Genet. 1993, 52, 305-311.
(28) Budowle, B.; Monson, K. L. Clin. Chim. Acta 1994, 228, 3-18.

Figure 2. (A) Ternary-encoding scheme for MALDI-TOF MS genotyping data. Each genomic DNA sample, containing two copies of each SNP target sequence, with each copy having two possible nucleotides at the polymorphic position, Y or X (identified as alleles A and a, respectively), was analyzed directly using the Invader assay and MALDI-TOF mass spectrometry. A single peak at m/z 1538 corresponded to an AA genotype and was assigned a ternary digit of 0; detection of both signal molecules corresponded to an Aa genotype and was assigned a 1; a single peak at m/z 1234 corresponded to the aa genotype and was assigned a 2. (B) Representative ternary-encoded MALDI-TOF MS SNP genotyping results for individuals 1, 5, and 10 at the seven SNP markers. The assigned ternary code digit for each individual at each marker is shown, as well as the composite ternary identification code (ID3) and the corresponding decimal identification number (ID10).

Table 2. Ternary-Encoded Genotyping Results^a

individual  WI-867  WI-921  WI-1126  WI-1325  WI-1803  WI-2026  WI-2032  ID3      ID10  prob of match
1           2       1       1        0        1        1        2        2110112  1796  2.2 × 10^-4
2           1       0       1        0        1        0        0        1010100  819   6.4 × 10^-3
3           1       1       0        0        2        0        0        1100200  990   4.7 × 10^-4
4           2       1       1        0        1        0        1        2110101  1792  1.3 × 10^-3
5           1       1       0        0        2        1        0        1100210  993   2.4 × 10^-4
6           2       0       0        0        1        0        0        2000100  1467  2.3 × 10^-3
7           0       0       0        2        1        0        0        0002100  63    1.1 × 10^-3
8           1       0       1        0        1        1        1        1010111  823   4.3 × 10^-3
9           2       0       0        2        1        2        0        2002120  1527  4.5 × 10^-5
10          0       0       0        2        1        1        1        0002111  67    7.2 × 10^-4
11          0       1       0        0        1        0        1        0100101  253   1.8 × 10^-3

^a Ternary-encoding results for 11 individuals at each SNP marker are shown, along with the composite ternary identification code (ID3), the corresponding decimal identification number (ID10), and the probability of a match for the composite ternary code generated for each individual, calculated as described in the text.


(Table 2) demonstrate the effectiveness of ternary encoding of SNP data for straightforward genetic identification. Comparison of any two of the ternary codes shown in Table 2 reveals variations at an average of almost four digits of the seven total (i.e., an average of approximately four SNP genotype differences between any two individuals over this panel of seven markers). Pairwise comparisons between individuals 1 and 7 and also individuals 3 and 10 showed differences at six digits, which was the highest number of code differences, while individuals 3 and 5 showed the least variation, having only one code difference, at SNP WI-2026.
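Comparing two ternary codes digit by digit is a Hamming-distance calculation. The following sketch, with illustrative names and the codes copied from Table 2, counts the differing markers for every pair of individuals; it is one possible way to tabulate the comparisons, not the procedure used by the authors.

```python
from itertools import combinations

MARKERS = ["WI-867", "WI-921", "WI-1126", "WI-1325",
           "WI-1803", "WI-2026", "WI-2032"]

# Ternary identification codes (ID3) from Table 2, keyed by individual.
CODES = {
    1: "2110112", 2: "1010100", 3: "1100200", 4: "2110101",
    5: "1100210", 6: "2000100", 7: "0002100", 8: "1010111",
    9: "2002120", 10: "0002111", 11: "0100101",
}


def differing_markers(code_a, code_b, markers=MARKERS):
    """Return the names of the markers at which two codes differ."""
    if not len(code_a) == len(code_b) == len(markers):
        raise ValueError("codes must have one digit per marker")
    return [m for m, x, y in zip(markers, code_a, code_b) if x != y]


def pairwise_difference_counts(codes):
    """Map each pair of individuals to its number of differing digits."""
    return {
        (i, j): len(differing_markers(codes[i], codes[j]))
        for i, j in combinations(sorted(codes), 2)
    }


if __name__ == "__main__":
    counts = pairwise_difference_counts(CODES)
    mean = sum(counts.values()) / len(counts)
    print(f"{len(counts)} pairs, mean differences = {mean:.2f}")
    print("most different:", [p for p, n in counts.items() if n == max(counts.values())])
    print("least different:", [p for p, n in counts.items() if n == min(counts.values())])
```

A pair with zero differences would indicate two individuals that the panel cannot distinguish, so the minimum of these counts is a quick check on whether a given panel resolves a given set of samples.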

The advantages of using SNPs as markers for genetic identification have been previously described [29-33]. One study that used a PCR-based SNP genotyping method for genetic identification assigned values of 1 (AA), 2 (aa), and 3 (Aa) to genotyping data and assigned a 0 to those samples where the genotyping reaction failed [33]. This strategy is limited in that it is not transferable to other SNP analysis platforms, and although the data could be base-four encoded, it would be inefficient because any genotype assigned a 0 would not be informative. In contrast, ternary encoding of SNP genotyping data is compatible with any SNP analysis platform, and all of the encoded data are informative, enabling simple genetic identification by representing every individual by a unique number. An identification number derived in this manner is easily archived and provides a basis for identity comparisons that is much more straightforward than conventional genetic identification methods that rely on comparative sizing of repeat polymorphisms by gel electrophoresis [16-19] or MALDI-TOF MS [12-15].

Ternary encoding is also compatible with a conservative approach that has been proposed for the use of biallelic markers in genetic identification [34], in which a large panel of markers is genotyped and the subset of loci at which an individual has a heterozygous genotype (i.e., those markers coded as a 1 when using ternary encoding) is selected for comparison with genotyping data from an unknown individual at these same markers. Because the maximum probability for any heterozygous genotype is 0.5, a panel of h heterozygous loci from an individual has a maximum probability of match of $(0.5)^{h}$. This approach assumes only that the subpopulation to which each compared sample belongs is in approximate H-W equilibrium, eliminating the need for probability of match calculations that require estimates of allele frequencies, which may not be accurate when comparing individuals from admixed populations [26-28]. Using this conservative method, a genetic identification panel can be envisioned consisting of 44 (2 per autosome) highly polymorphic SNP markers, to ensure a sufficient number of heterozygous genotypes for identification at a high discriminating power. Assuming an average allele frequency ratio at each of these markers of 60:40, the average H-W heterozygosity at any locus would be 0.48, giving on average ∼21 heterozygous loci in a given individual from a panel consisting of 44 total markers. The maximum probability of a match at 21 heterozygous loci between two individuals would be $(0.5)^{21}$, or 1 in $2.1 \times 10^{6}$. More loci could be genotyped and added to this panel if higher degrees of discrimination were required.
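The arithmetic behind the 44-marker panel can be reproduced in a few lines. This is a sketch under the assumptions stated in the text (a uniform 60:40 allele frequency ratio and H-W equilibrium); the function names are illustrative.

```python
def expected_heterozygous_loci(n_markers, major_allele_freq):
    """Expected number of heterozygous loci in one individual.

    Each locus is heterozygous with H-W probability 2pq, where
    p = major_allele_freq and q = 1 - p.
    """
    p = major_allele_freq
    q = 1.0 - p
    return n_markers * 2.0 * p * q


def max_match_probability(n_heterozygous):
    """Upper bound on the match probability over h heterozygous loci.

    2pq is at most 0.5 (reached at p = q = 0.5), so the bound is 0.5**h
    regardless of the true allele frequencies.
    """
    return 0.5 ** n_heterozygous


if __name__ == "__main__":
    h_expected = expected_heterozygous_loci(44, 0.60)
    print(f"expected heterozygous loci: {h_expected:.2f}")
    bound = max_match_probability(21)
    print(f"maximum match probability: {bound:.2e} (1 in {1 / bound:.3g})")
```

The bound depends only on the number of heterozygous loci, which is why this approach does not need population-specific allele frequency estimates.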

CONCLUSIONS

The results presented here demonstrate the effectiveness of ternary encoding of SNP genotyping data for the straightforward identification of individuals. Although ternary encoding of genotyping data should be compatible with any SNP analysis platform, the MALDI-TOF MS-based SNP analysis approach used here is ideally suited for ternary-encoded genotyping. First, the mass spectra are easily interpreted and assigned a ternary digit by a simple assessment of the presence or absence of a peak at the distinct (M - H)⁻ values of the two possible allele-specific signal molecules produced from the Invader assay. Minimal background signal is observed by this method, which is clearly evident in the spectra shown in Figure 2B, where a single peak is detected for homozygotes, indicating the presence of one allele on both chromosomes, while minimal or in some cases undetectable signal at the (M - H)⁻ value of the signal molecule corresponding to the absent allele is observed. Conversely, heterozygous genotypes display peaks of roughly equal signal intensities at the (M - H)⁻ values of both signal molecules. Second, the unambiguous MALDI-TOF MS genotyping results are easily scored and could be integrated into a completely automated analysis system, as has been demonstrated for other MALDI-TOF MS-based nucleic acid analyses [35-37]. Finally, the Invader assay permits SNP analysis directly from genomic DNA targets, using identical reaction conditions for all SNP markers analyzed, eliminating the need for a PCR amplification step, and making it well-suited for the high-throughput genotyping of large numbers of SNP markers in many different individuals. Therefore, ternary encoding of genotyping data obtained using the Invader assay and MALDI-TOF MS analysis has great potential as a straightforward and general approach to the rapid genetic identification of individuals.

ACKNOWLEDGMENT

T.J.G. was supported in part by a Procter and Gamble predoctoral fellowship. This work was supported by Department of Energy Grant DE-FG02-91ER61130.

Received for review December 6, 1999. Accepted April 13, 2000.

AC991390E

(29) Syvanen, A. C.; Sajantila, A.; Lukka, M. Am. J. Hum. Genet. 1993, 52, 46-59.
(30) Boyce-Jacino, M. T.; Reynolds, J.; Nikiforov, T.; Rogers, Y.-H.; Saville, C.; McIntosh, T.; Goelet, P.; Knapp, M. R. Am. J. Hum. Genet. 1994, 55, A2088.
(31) Reynolds, J. E.; Vrolijk, L. A.; Hyche, R.; Bias, W.; Kwok, P.-Y.; Nickerson, D. A.; Boyce-Jacino, M. T. Am. J. Hum. Genet. 1995, 57, A208.
(32) Nikiforov, T. T.; Rendle, R. B.; Goelet, P.; Rogers, Y. H.; Kotewicz, M. L.; Anderson, S.; Trainor, G. L.; Knapp, M. R. Nucleic Acids Res. 1994, 22, 4167-4175.
(33) Delahunty, C.; Ankener, W.; Deng, Q.; Eng, J.; Nickerson, D. A. Am. J. Hum. Genet. 1996, 58, 1239-1246.
(34) Green, P. Am. J. Hum. Genet. 1992, 50, 440-443.
(35) Van Ausdall, D. A.; Marshall, W. S. Anal. Biochem. 1998, 256, 220-228.
(36) Little, D. P.; Cornish, T. J.; O’Donnell, M. J.; Braun, A.; Cotter, R. J.; Koster, H. Anal. Chem. 1997, 69, 4540-4546.
(37) Tang, K.; Fu, D. J.; Julien, D.; Braun, A.; Cantor, C. R.; Koster, H. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 10016-10020.
