Topic #3: Linkage Disequilibrium, Haplotypes & Tagging
University of Wisconsin
Genetic Analysis Workshop
June 2011
Overview
• Fate of a new mutation
• Linkage Disequilibrium (LD)
  – Measurement
  – Indirect association
• SNP selection based on LD
  – Haplotypes
  – SNP selection by tagging
• Practical – SNP selection using Haploview
Introduction of a Mutation into a Population
[Figure: chromosomes in the population drawn as rows of alleles (coded 1 or 2) at a series of consecutive marker loci]
Haplotype Concept
• The sequence 111212 in this location becomes a signature for the chromosome carrying the mutation
• Haplotype – alleles inherited together at linked loci on the same chromosome
• The 111212 haplotype will not be a perfect marker of disease
  – At the time the mutation arose, there may have been other chromosomes with 111212
  – New mutations
  – Recombination
Indirect Association
• Each of the alleles in the 111212 haplotype is also expected to be indirectly associated with carrying the mutation.
• Indirect association is an association of a marker with phenotype that is non-causal, being based on linkage disequilibrium (LD)
Linkage Disequilibrium (LD)
• Mendel’s Second Law: alleles at different loci assort independently
• Linkage Disequilibrium (LD): population-level association of alleles at linked loci
How LD is Measured
LD – population-level association between linked loci
A locus: A1 or A2
B locus: B1 or B2
Let P(A1) = pA1
Let P(B1) = pB1
Let P(A1B1) = pA1B1
D = pA1B1 - pA1 pB1, which equals 0 if the two loci are independent
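As a minimal sketch of the definition above, D can be computed from a table of haplotype counts. The counts below are hypothetical values chosen only for illustration, and the function name `ld_coefficient` is not from the workshop materials.

```python
def ld_coefficient(p_a1b1, p_a1, p_b1):
    """Return D = pA1B1 - pA1 * pB1 for two biallelic loci."""
    return p_a1b1 - p_a1 * p_b1


# Hypothetical haplotype counts among 100 chromosomes (illustrative only).
counts = {"A1B1": 40, "A1B2": 10, "A2B1": 10, "A2B2": 40}
n = sum(counts.values())

p_a1b1 = counts["A1B1"] / n                     # frequency of the A1B1 haplotype
p_a1 = (counts["A1B1"] + counts["A1B2"]) / n    # frequency of allele A1
p_b1 = (counts["A1B1"] + counts["A2B1"]) / n    # frequency of allele B1

d = ld_coefficient(p_a1b1, p_a1, p_b1)
print(round(d, 4))  # 0.15
```

With these counts the A1B1 haplotype is more common (0.40) than the 0.25 expected under independence, so D is positive.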
Common LD Measures
• D = |d|
  – Preferred measure for population geneticists
  – Maximum value is bounded by the marginals
• D’ = |d|/dmax
  – D’ varies between 0 and 1
  – Does not have an easy interpretation, and 1.0 is achieved if one off-diagonal is zero
• r² (Δ²) = D²/[p(1-p)q(1-q)], where p and q are the allele frequencies at the two loci
  – Has several interpretations:
    • = squared (phi) correlation, so lies in [0,1]
    • = χ²/N
  – Directly related to power for indirect association
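The measures above can be sketched from a 2x2 table of haplotype counts. This is an illustrative sketch, not code from the workshop: the function names and example counts are assumptions, dmax is taken as the usual bound set by the marginal allele frequencies, N is taken to be the number of chromosomes in the table, and both loci are assumed to be polymorphic (otherwise the denominators are zero).

```python
def ld_measures(n11, n12, n21, n22):
    """Return (D, D', r^2) from haplotype counts.

    n11 = A1B1, n12 = A1B2, n21 = A2B1, n22 = A2B2.
    Assumes both loci are polymorphic (0 < p < 1 and 0 < q < 1).
    """
    n = n11 + n12 + n21 + n22
    p = (n11 + n12) / n          # frequency of A1
    q = (n11 + n21) / n          # frequency of B1
    d = n11 / n - p * q          # signed disequilibrium coefficient

    # The largest |D| the marginals allow depends on the sign of D.
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))

    d_prime = abs(d) / d_max
    r2 = d * d / (p * (1 - p) * q * (1 - q))
    return d, d_prime, r2


def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic for the same 2x2 table."""
    n = n11 + n12 + n21 + n22
    numerator = n * (n11 * n22 - n12 * n21) ** 2
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return numerator / denominator


# Hypothetical counts with one empty off-diagonal cell (A1B2 never observed).
d, d_prime, r2 = ld_measures(50, 0, 30, 20)
print(round(d_prime, 3), round(r2, 3))                  # 1.0 0.25
print(round(chi_square_2x2(50, 0, 30, 20) / 100, 3))    # 0.25, i.e. chi-square / N
```

The example illustrates why D’ is hard to interpret: it reaches 1.0 as soon as one haplotype is missing, while r² for the same table is only 0.25.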
Allelic Association
• Direct Association
  – Initially it was thought that we could pick the genes and the (single) genetic variant within each gene that was relevant for disease
• Indirect Association
  – The existence of LD opens up the possibility of tests by indirect association: we don’t need to test the causal variant itself, but need only genotype a marker that is in high LD with the causal variant
Indirect and Direct Allelic Association
Direct Association (diagram: disease locus D)
Assess relationship of D locus to phenotype directly – expect D to be a functional polymorphism in a candidate gene
Indirect Association (diagram: disease locus D with nearby markers M1, M2, M3)
Assess relationship of D locus indirectly by determining whether markers (Mi) are associated with disease – Mi don’t need to be functional
Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
Dawson, E. et al. (2002). A first-generation LD map of 22. Nature 418: 544-547
Population Differences
Weiss, K.M & Clark, A.G. (2002). Trends in Genetics, 18(1):19-24.
Recombination Hotspots
Kauppi, L., Jeffreys, A. J., & Keeney, S. (2004). Where the crossovers are: Recombination distributions in mammals. Nature Reviews Genetics, 5, 413-424
Hotspots typically span 1-2 kb
Haplotype Blocks
Two- and Three-locus Haplotypes
Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
APOE locus and haplotypes containing APOE
Two- and Three-locus Haplotypes
Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
3-locus haplotype gives a stronger signal than individual markers
SNP Selection by Tagging
• Basic rationale:
  – The power for a causal SNP in a sample of size N is equivalent to the power of a tagging SNP in a sample of size N/r²
• Tagging SNP selection:
  – Based on some reference sample (HapMap)
  – Two overarching strategies
    • Pairwise tagging
    • Multimarker tagging
de Bakker, P. I. W., et al. (2005). Efficiency and power in genetic association studies. Nature Genetics, 37(11), 1217-1223.
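The N/r² rationale on this slide can be turned into a quick calculation. The helper below is a hypothetical illustration (its name and the example numbers are assumptions): given the sample size that would suffice if the causal SNP were genotyped directly, it returns the sample size needed when only a tag SNP with a given r² to the causal SNP is genotyped.

```python
import math


def required_sample_size(n_causal, r2):
    """Sample size at a tag SNP giving power equivalent to n_causal at the causal SNP.

    Applies the N / r^2 rule of thumb; r2 is the squared correlation
    between the tag SNP and the causal SNP.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return math.ceil(n_causal / r2)


# If 1000 subjects suffice at the causal SNP, a tag with r^2 = 0.8 needs 25% more.
print(required_sample_size(1000, 0.8))  # 1250
print(required_sample_size(1000, 0.5))  # 2000
```

At the commonly used threshold of r² = 0.8 the cost of indirect association is a 25% larger sample, which is why tag sets are usually chosen to keep r² high.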
Reference Sample: HapMap (www.hapmap.org)
• HapMap Phase 1:
  – SNP Selection Strategy (yield ~ 1 million):
    • >1 common SNP every 5 kb, total of 1.3 million before QC
    • MAF > .05
    • Some priority for non-synonymous cSNPs
  – Sample: N=270 (269) individuals from 4 populations
    • 30 trios of Europeans from Utah (CEU)
    • 45 unrelated Han Chinese (CHB)
    • 45 unrelated Japanese (JPT)
    • 30 Yoruban trios from Nigeria (YRI)
Reference Sample: HapMap (www.hapmap.org)
• Phase 2:
  – 2.1 million additional SNPs
    • Total now averages ~1 per kb; >98% of common variants within 5 kb
    • Focus still on MAF > .05
    • Average max r² of untyped common SNPs to a typed SNP:

Population   HapMap I   HapMap II
YRI          .67        .90
CEU          .85        .96
CHB+JPT      .83        .95
Reference Sample: HapMap (www.hapmap.org)
• Phase 3:
  – Expand to N=1115 in 11 ancestral groups
  – 2.1 million additional SNPs

Label   Population sample                                                                          # samples   QC+ Draft 1
ASW*    African ancestry in Southwest USA                                                          90          71
CEU*    Utah residents with Northern and Western European ancestry from the CEPH collection       180         162
CHB     Han Chinese in Beijing, China                                                              90          82
CHD     Chinese in Metropolitan Denver, Colorado                                                   100         70
GIH     Gujarati Indians in Houston, Texas                                                         100         83
JPT     Japanese in Tokyo, Japan                                                                   91          82
LWK     Luhya in Webuye, Kenya                                                                     100         83
MEX*    Mexican ancestry in Los Angeles, California                                                90          71
MKK*    Maasai in Kinyawa, Kenya                                                                   180         171
TSI     Toscans in Italy                                                                           100         77
YRI*    Yoruba in Ibadan, Nigeria                                                                  180         163
Total                                                                                              1,301       1,115
* Sample consists of family trios
HAPMAP3, Release 2
[Screenshot: HapMap browser view of COMT, with callouts marking the phase, release and build and the region in NCBI B36 coordinates]
HapMap Genotyped SNPs in COMT
Using Haploview to Identify Tagging SNPs for COMT
• Download Data from HapMap
  – Choose HapMap Download, Phase 3, and Release 2
  – Choose population
  – Choose chromosome (22) and region (NCBI B36/hg18), positions in kb
    • Transcription starts at 18309; I will start at 18304
    • Transcription ends at 18337; I will end at 18340
• Haploview Analysis
  – Get LD plot
  – Run Tagger (pairwise)
  – Force include/exclude
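To make the idea of pairwise tagging with forced inclusion and exclusion concrete, here is a generic greedy sketch. It is not Haploview's Tagger implementation; it is a simple set-cover style heuristic written for illustration, and the SNP names and r² values are hypothetical.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8, force_include=(), force_exclude=()):
    """Pick tag SNPs so every SNP has pairwise r^2 >= threshold with some tag.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({a, b}) to the pairwise r^2 (missing pairs count as 0).
    Returns (tags, untagged), where untagged holds SNPs no allowed candidate captures.
    """
    def captures(tag, snp):
        return tag == snp or r2.get(frozenset((tag, snp)), 0.0) >= threshold

    untagged = set(snps)
    tags = []

    # Forced tags are accepted first, whatever they capture.
    for tag in force_include:
        tags.append(tag)
        untagged -= {s for s in untagged if captures(tag, s)}

    candidates = [s for s in snps if s not in force_exclude and s not in tags]
    while untagged:
        # Choose the candidate that captures the most SNPs still untagged.
        best = max(candidates,
                   key=lambda c: sum(captures(c, s) for s in untagged),
                   default=None)
        if best is None or not any(captures(best, s) for s in untagged):
            break  # nothing left that can capture the remaining SNPs
        tags.append(best)
        candidates.remove(best)
        untagged -= {s for s in untagged if captures(best, s)}
    return tags, untagged


# Hypothetical SNPs and pairwise r^2 values (illustrative only).
snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {
    frozenset(("s1", "s2")): 0.95,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s1", "s3")): 0.60,
    frozenset(("s4", "s5")): 0.30,
}
tags, untagged = greedy_pairwise_tags(snps, r2)
print(tags)  # ['s2', 's4', 's5']
```

In this toy example s2 captures s1 and s3, while s4 and s5 are in low LD with everything and must tag themselves. Excluding s2 (for example, because it cannot be assayed) forces a larger tag set.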
COMT LD Plot (D’)
COMT LD Plot (r²)
COMT Tagging SNPs (15 tag 24 at avg r² = .996)

Tag SNP     bp         Location    MAF   Other SNPs Tagged
rs5748489   18307146   5’          .37   rs1544325
rs4646310   18308806   5’          .21   rs6518591
rs737865    18310121   Intron #1   .32   rs737866, rs2020917, rs8185002
rs174675    18314051   Intron #1   .31   rs933271
rs5993882   18317533   Intron #1   .26
rs5993883   18317638   Intron #1   .41
rs740601    18330763   Intron #2   .45   rs2239393, rs4646312
rs4680      18331271   Exon #4     .47   rs4633
rs4646316   18332132   Intron #5   .25
rs174696    18333176   Intron #5   .18
rs9306235   18335157   Intron #5   .11
rs9332377   18335692   Intron #5   .17   rs165728
rs165824    18339366   Intron #5   .06
rs165815    18339473   3’          .12
rs5993891   18339746   3’          .06
LD Plot Available from SNPInfo (http://manticore.niehs.nih.gov/)
Conclusions
• Alleles at linked loci tend to be inherited together and so are associated at the population level, a phenomenon known as linkage disequilibrium (LD)
• Because recombination is not uniform, the genome has a “block-like” structure (haplotype blocks)
• You do not need to have the “causal variant” in your genotyped set if it is adequately tagged
• A major strategy for SNP selection is to ensure adequate coverage (r² > .8) of common genetic variants in a gene, which can be done with Haploview