Genetics for Epidemiologists
Lecture 5: Analysis of Genetic Association Studies
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics and Senior Advisor to the Director, NHGRI, for Population Genomics
National Human Genome Research Institute
National Institutes of Health
U.S. Department of Health and Human Services
Topics to be Covered
• Discrete traits and quantitative traits
• Measures of association
• Detecting/correcting for false positives
• Genotyping quality control
• Quantile-quantile (Q-Q) plots
• Odds ratios: allelic and genotypic
• Models of genetic transmission
• Interactions: gene-gene, gene-environment
Larson, G. The Complete Far Side. 2003.
Quantitative Genetics
“…concerned with the inheritance of those differences between individuals that are of degree rather than of kind…”
Quantitative
• Continuous gradation among individuals from one extreme to other
• Effects of genes are small
• Usually many genes
Qualitative
• Sharply demarcated types with little connection by intermediates
• Effects of genes are large
• Single genes inherited in Mendelian ratios?
Falconer and Mackay, Quantitative Genetics 1996.
Inheritance Models in Single Gene Trait
Alleles: A and a
Genotype groups: AA, Aa, aa
Models:
• A is Dominant
• A is Recessive
• A is Co-Dominant
[Figure: phenotype of each genotype group under each model]
Inheritance Models in Quantitative Trait
A: x increase in height
a: x decrease in height
Scale: -x, 0 (Population Mean), +x
Model                       Genotype order along the scale (low to high)
A is Completely Dominant    aa, then Aa and AA together
A is Partially Dominant     aa, Aa, AA
A is Not (Co-) Dominant     aa, Aa, AA
A is Over-Dominant          aa, AA, Aa
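The quantitative-trait models above differ only in where the heterozygote sits relative to the two homozygotes. The sketch below is one way to express that, assuming aa sits at -x and AA at +x on the slide's scale. The heterozygote position d and both function names are hypothetical and not taken from the slides, and the sketch only covers d >= 0 (A is the allele that shows in the heterozygote).

```python
def genotypic_values(x, d):
    """Trait deviations for genotypes aa, Aa, AA.

    x: shift of each homozygote from the zero point (aa at -x, AA at +x).
    d: position of the heterozygote Aa (hypothetical parameter, d >= 0).
    """
    return {"aa": -x, "Aa": d, "AA": x}


def dominance_model(x, d):
    """Name the model implied by where the heterozygote sits (x > 0, d >= 0)."""
    if d == 0:
        return "not dominant (additive)"
    if d < x:
        return "partially dominant"
    if d == x:
        return "completely dominant"
    return "over-dominant"


# Walk the heterozygote from the midpoint to beyond the AA homozygote.
for d in (0.0, 1.0, 2.0, 3.0):
    print(dominance_model(2.0, d), genotypic_values(2.0, d))
```

With d = 0 the heterozygote lies exactly halfway between the homozygotes; the other three models move it toward, onto, or past AA.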
Quantitative Traits with Published GWA Studies (16 - 34)
• QT interval
• Lipids and lipoproteins
• Memory
• Nicotine dependence
• ORMDL3 expression
• YKL-40 levels
• Obesity, BMI, waist
• Insulin resistance
• Height
• Bone mineral density
• F-cell distribution
• Fetal hemoglobin levels
• C-Reactive protein
• 18 groups of Framingham traits
• Pigmentation
• Uric Acid Levels
• Recombination Rate
Association of Alleles and Genotypes of rs1333049 (‘3049) with Myocardial Infarction
Alleles      C, N (%)       G, N (%)       χ² (1df)   P-value
Cases        2,132 (55.4)   1,716 (44.6)   55.1       1.2 x 10^-13
Controls     2,783 (47.4)   3,089 (52.6)
Allelic Odds Ratio = 1.38
Genotypes    CC, N (%)    CG, N (%)      GG, N (%)    χ² (2df)   P-value
Cases        586 (30.5)   960 (49.9)     378 (19.6)   59.7       1.1 x 10^-14
Controls     676 (23.0)   1,431 (48.7)   829 (28.2)
Heterozygote Odds Ratio = 1.47
Homozygote Odds Ratio = 1.90
Samani N et al, N Engl J Med 2007; 357:443-453.
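The three odds ratios for rs1333049 can be recomputed from the counts on the slide. The sketch below does so as cross-product ratios, taking G (alleles) and GG (genotypes) as the reference category. The function name and the Woolf log-scale 95% confidence interval are additions for illustration and do not come from the slides.

```python
import math


def odds_ratio(case_exp, case_ref, ctrl_exp, ctrl_ref):
    """Cross-product odds ratio with a Woolf (log-scale) 95% CI."""
    or_ = (case_exp * ctrl_ref) / (case_ref * ctrl_exp)
    se = math.sqrt(1 / case_exp + 1 / case_ref + 1 / ctrl_exp + 1 / ctrl_ref)
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper


# Counts from the rs1333049 slide (cases first, then controls).
allelic = odds_ratio(2132, 1716, 2783, 3089)      # C vs G alleles
heterozygote = odds_ratio(960, 378, 1431, 829)    # CG vs GG
homozygote = odds_ratio(586, 378, 676, 829)       # CC vs GG

for name, (or_, lo, hi) in [("allelic", allelic),
                            ("heterozygote", heterozygote),
                            ("homozygote", homozygote)]:
    print(f"{name}: OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Rounded to two decimals, the point estimates agree with the slide's 1.38, 1.47, and 1.90.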
-Log10 P Values for SNP Associations with Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
http://www.broad.mit.edu/diabetes/scandinavs/type2.html
Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort
GWA Study of Serum Uric Acid Levels
• Linear regression of inverse normalized levels against number of alleles
• Additive model
• Sex, age, age² as covariates
Li S et al, PLoS Genet 2007; 3:e194.
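The analysis outlined for uric acid (inverse-normalized trait, additive allele count, sex, age, and age² as covariates) can be sketched with the standard library alone. Everything below is illustrative: the data are simulated, the rank offset of 0.5 is one of several conventions, age is centered and rescaled purely for numerical stability, and the helper names are hypothetical rather than the authors' code.

```python
import random
from statistics import NormalDist


def inverse_normal(values):
    """Rank-based inverse normal transform: rank r -> Phi^-1((r - 0.5) / n)."""
    n = len(values)
    std_normal = NormalDist()
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return out


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
                b[r] -= f * b[c]
    return [b[i] / A[i][i] for i in range(k)]


# Simulated cohort (hypothetical values): each allele lowers the trait by 0.3.
rng = random.Random(0)
rows, trait = [], []
for _ in range(2000):
    g = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1, or 2
    sex = rng.randint(0, 1)
    age = rng.uniform(40, 80)
    a = (age - 60) / 10  # centered age in decades
    rows.append([1.0, g, sex, a, a * a])
    trait.append(5.0 - 0.3 * g + 0.5 * sex + 0.02 * age + rng.gauss(0, 1))

beta = ols(rows, inverse_normal(trait))
beta_per_allele = beta[1]  # additive effect in standard-deviation units
print(f"per-allele effect: {beta_per_allele:.3f}")
```

Because the trait is transformed to a standard normal scale, the per-allele coefficient is in standard-deviation units rather than mg/dl.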
Association of rs6855911 and Uric Acid Levels
                                Genotype Means (mg/dl)
Cohort      Additive Effect    AA            AG            GG
SardiNIA    -0.317             4.66 (1.51)   4.48 (1.59)   4.02 (1.63)
InCHIANTI   -0.397             5.27 (1.44)   4.94 (1.31)   4.33 (1.37)
Li S et al, PLoS Genet 2007; 3:e194.
Association Methods for Quantitative Traits
• Linear regression of multivariable adjusted residual against number of alleles (Kathiresan, Nat Genet 2008; 40:189-97)
• Linear regression of log transformed or centralized BMI against genotype (Frayling, Science 2007; 316:889-94)
• Variance components based Z-score analysis of quantile normalized height (Sanna, Nat Genet 2008; 40:198-203)
Ways of Dealing with Multiple Testing
• Control family-wise error rate (FWER): Bonferroni (α’ = α/n) or Šidák (α’ = 1 - [1 - α]^(1/n))
• False discovery rate: proportion of significant associations that are actually false positives
• False positive report probability: probability that the null hypothesis is true, given a statistically significant finding
• Bayes factors analysis: avoids need for assessing genome-wide error rates but must identify reasonable alternative model
Hoggart CJ et al, Genet Epidemiol 2008; 32:179-85.
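The two FWER corrections above are direct formulas, and the false discovery rate is commonly controlled with the Benjamini-Hochberg step-up procedure. The slides do not name a specific FDR method, so Benjamini-Hochberg is an assumption here, as are the function names and the example P values.

```python
import math


def bonferroni(alpha, n):
    """Per-test threshold alpha' = alpha / n."""
    return alpha / n


def sidak(alpha, n):
    """Per-test threshold alpha' = 1 - (1 - alpha)^(1/n), computed stably."""
    return -math.expm1(math.log1p(-alpha) / n)


def benjamini_hochberg(pvals, q):
    """Indices of tests declared significant at FDR level q (BH step-up)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= q * k / m:
            cutoff = k  # largest rank whose P value is under its threshold
    return sorted(order[:cutoff])


n_tests = 1_000_000  # hypothetical number of SNPs tested
print(f"Bonferroni: {bonferroni(0.05, n_tests):.3e}")
print(f"Sidak:      {sidak(0.05, n_tests):.3e}")

example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, 0.05))
```

For a million tests the two FWER thresholds are nearly identical, with Šidák marginally less strict.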
Larson, G. The Complete Far Side. 2003.
Quality Control of SNP Genotyping: Samples
• Identity with forensic markers (Identifiler)
• Blind duplicates
• Gender checks
• Cryptic relatedness or unsuspected twinning
• Degradation/fragmentation
• Call rate (> 80-90%)
• Heterozygosity: outliers
• Plate/batch calling effects
Chanock et al, Nature 2007; Manolio et al Nat Genet 2007
Quality Control of SNP Genotyping: SNPs
• Duplicate concordance (CEPH samples)
• Mendelian errors (typically < 1)
• Hardy-Weinberg errors (often P > 10^-5)
• Heterozygosity (outliers)
• Call rate (typically > 98%)
• Minor allele frequency (often > 1%)
• Validation of most critical results on independent genotyping platform
Chanock et al, Nature 2007; Manolio et al Nat Genet 2007
Hardy-Weinberg Equilibrium
• The occurrences of the two alleles of a SNP in the same individual are two independent events
• Ideal conditions:
  – random mating
  – no selection (equal survival)
  – no migration
  – no mutation
  – no inbreeding
  – large population sizes
  – gene frequencies equal in males and females…
• If alleles A and a of SNP rs1234 have frequencies p and 1-p, expected frequencies of the three genotypes are:
Freq AA = p²    Freq Aa = 2p(1-p)    Freq aa = (1-p)²
After G. Thomas, NCI
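The expected genotype frequencies lead directly to a goodness-of-fit check. The sketch below applies a 1-df chi-square test to the control genotype counts for rs1333049 shown earlier in the lecture. The chi-square form is only one common choice (exact tests are also widely used), and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Expected counts under Hardy-Weinberg, chi-square statistic, P value."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df, via the error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Control genotype counts for rs1333049 (CC, CG, GG) from the earlier slide.
expected, chi2, p_value = hwe_chi_square(676, 1431, 829)
print([round(e, 1) for e in expected], round(chi2, 2), round(p_value, 3))
```

A P value far above the 10^-5 to 10^-6 filters mentioned in the QC slides would leave this SNP in the analysis.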
Coverage, Call Rates, and Concordance of Perlegen and Affymetrix Platforms on HapMap Phase II
Metric                     Perlegen                        Affymetrix/Broad
Number of SNPs             480,744                         439,249
Coverage                   Single Marker   Multi-Marker    Single Marker   Multi-Marker
  CEU                      0.90            0.96            0.78            0.87
  CHB + JPT                0.87            0.93            0.78            0.86
  YRI                      0.64            0.78            0.63            0.75
Average call rate          98.9%                           99.3%
Concordance
  Homozygous genotypes     99.8%                           99.9%
  Heterozygous genotypes   99.8%                           99.8%
GAIN Collaborative Group, Nat Genet 2007; 39:1045-51.
Sample and SNP QC Metrics for Affymetrix 5.0 and 6.0 Platforms in GAIN
Metric                 5.0       % fail   6.0       % fail
Total Samples          1,829     --       2,289     --
Passing QC             1,817     0.44     2,192     4.24
> 98% call rate        1,815     0.55     2,257     1.40
Total SNPs             457,645   --       906,660   --
Passing QC             429,309   6.19     845,814   6.70
MAF > 1%               457,466   0.04     888,234   2.03
> 98% call rate        419,810   8.27     821,942   9.34
> 95% call rate        439,272   4.01     873,856   3.61
HWE < 10^-6            455,899   0.38     904,275   0.26
< 1 Mendel error       417,722   8.72     899,721   0.01
< 1 Duplicate error    454,820   0.01     892,103   0.02
Courtesy, J Paschall, NCBI
Sample Heterozygosity in GAIN
[Histogram of sample heterozygosity, x-axis 0.20 to 0.40; y-axis: Frequency, 0 to 2,500]
Courtesy, J Paschall, NCBI
Sample Heterozygosity in GAIN
[Histogram of sample heterozygosity, x-axis 0.20 to 0.40; y-axis: Frequency, 0 to 100]
Courtesy, J Paschall, NCBI
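Heterozygosity outliers like those sought in the histograms above can be flagged programmatically. The sketch below computes per-sample heterozygosity as the fraction of called genotypes that are heterozygous, then flags samples more than three standard deviations from the mean. The three-SD cutoff, the genotype coding (0, 1, 2, None), the function names, and the sample values are assumptions, not values from the slides.

```python
from statistics import mean, stdev


def heterozygosity(genotypes):
    """Fraction of called genotypes (0, 1, 2; None = no call) that are 1."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


def flag_outliers(het_by_sample, n_sd=3.0):
    """Sample IDs whose heterozygosity is more than n_sd SDs from the mean."""
    values = list(het_by_sample.values())
    centre, spread = mean(values), stdev(values)
    return [sid for sid, h in het_by_sample.items()
            if abs(h - centre) > n_sd * spread]


# Hypothetical cohort: 20 typical samples and one with excess heterozygosity,
# a pattern that can indicate sample contamination.
cohort = {f"s{i}": (0.29 if i % 2 else 0.31) for i in range(20)}
cohort["s_out"] = 0.40
print(flag_outliers(cohort))
```

Because extreme samples inflate the standard deviation they are judged against, visual inspection of the distribution remains a useful complement.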
Signal Intensity Plots for rs10801532 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for rs4639796 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for rs534399 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for rs572515 in AREDS
http://www.ncbi.nlm.nih.gov/sites/entrez
Signal Intensity Plots for CD44 SNP rs9666607
Clayton DG et al, Nat Genet 2005; 37:1243-1246.
Courtesy, G. Thomas, NCI
Principal Component Analysis of Structured Population: First to Third Components
Courtesy, G. Thomas, NCI
Principal Component Analysis of Structured Population: Fourth and Fifth Components
Courtesy, G. Thomas, NCI
Influence of Relatedness on Principal Component Analysis
Courtesy, G. Thomas, NCI
Principal Component Analysis of Structured Population: Fourth and Fifth Components
Courtesy, G. Thomas, NCI
Principal Component Analysis of Structured Population: Fourth and Fifth Components
Summary Points: Genotyping Quality Control
• Sample checks for identity, gender error, cryptic relatedness
• Sample handling differences can introduce artifacts but probably can be adjusted for
• Association analysis is often quickest way to find genotyping errors
• Low MAF SNPs are most difficult to call
• Inspection of genotyping cluster plots is crucial!
Easton D et al, Nature 2007; 447:1087-1093.
Quantile-Quantile Plot for Test Statistics, 390 Breast Cancer Cases, 364 Controls
205,586 SNPs; λ = 1.03
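Assuming λ on this slide is the genomic-control inflation factor, it can be estimated as the median observed 1-df chi-square statistic divided by the median expected under the null (about 0.455). The sketch below uses simulated null statistics. The function name and the simulation are illustrative, and dividing each statistic by λ is shown as one simple adjustment, not necessarily the one used in the study.

```python
import random
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (about 0.455), obtained as
# the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Lambda: median observed statistic over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Simulated null test statistics (squares of standard normal draws).
rng = random.Random(1)
null_stats = [rng.gauss(0, 1) ** 2 for _ in range(20000)]

lam = genomic_inflation(null_stats)
adjusted = [s / lam for s in null_stats]  # rescale so that lambda becomes 1
print(f"lambda = {lam:.3f}")
```

Values of λ close to 1, such as the 1.03 reported here, suggest little overall inflation from population structure or other systematic bias.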
Easton D et al, Nature 2007; 447:1087-93.
Observed and Expected Associations after Stage 2 of Breast Cancer GWA
Significance      Observed   Observed Adjusted   Expected   Ratio
0.01 - 0.05       1,239      1,162               934        1.24
10^-3 - 10^-2     574        517                 348        1.49
10^-4 - 10^-3     112        88                  53         1.65
10^-5 - 10^-4     16         12                  7          1.71
< 10^-5           15         13                  1          13.5
All p < 0.05      1,956      1,792               1,343      1.33
Q-Q Plot for Multiple Sclerosis; Effect of MHC
Hafler D et al, N Engl J Med 2007; 357:851-862.
Q-Q Plot for Prostate Cancer, all SNPs
Gudmundsson J et al, Nat Genet 2007; 39:977-983.
Q-Q Plot for Prostate Cancer, excluding Chromosome 8
Gudmundsson J et al, Nat Genet 2007; 39:977-983.
Q-Q Plot for Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
[Q-Q plot: x-axis, Expected chi-squared statistic (0 to 25); y-axis, Observed chi-squared statistic (0 to 60)]
-Log10 P Values for SNP Associations with Myocardial Infarction
Samani N et al, N Engl J Med 2007; 357:443-453.
SNP Associations with 1,928 MI Cases and 2,938 Controls from UK
Samani N et al, N Engl J Med 2007; 357:443-453.
Association Signal for Coronary Artery Disease on Chromosome 9
’3049
Samani N et al, N Engl J Med 2007; 357:443-453.
Winner’s Curse: Odds Ratios for CHD Associated with LTA Genotypes in Multiple Studies
Clarke et al, PLoS Genet 2006; 2:e107.
Genome-Wide Scan for Alzheimer’s Disease in 861 Cases and 550 Controls
Reiman E et al, Neuron 2007; 54:713-20.
Genome-Wide Scan for Alzheimer’s Disease in ApoE*e4 Carriers
Reiman E et al, Neuron 2007; 54:713-20.
LOAD Odds Ratios Associated with rs2373115 GG by APOE*e4 Status
APOE*e4 Group   APOE*e4 OR [95% CI]   rs2373115 OR [95% CI]
APOE*e4 -                             1.12 [0.82, 1.53]
APOE*e4 +                             2.88 [1.90, 4.36]
All             6.07 [4.63, 7.95]     1.34 [1.06, 1.70]
Reiman et al, Neuron 2007; 54:713-720.
Klein et al, Science 2005; 308:385-389.
P Values of GWA Scan for Age-Related Macular Degeneration
Klein et al, Science 2005; 308:385-389.
Odds Ratios and Population Attributable Risks for AMD
Attribute (SNP)                   rs380390 (C/G)   rs1329428 (C/T)
Risk allele                       C                C
Allelic association χ² P value    4.1 x 10^-8      1.4 x 10^-6
Odds ratio (dominant)             4.6 [2.0-11]     4.7 [1.0-22]
Frequency in HapMap CEU           0.70             0.82
Population Attributable Risk      70% [42-84%]     80% [0-96%]
Odds ratio (recessive)            7.4 [2.9-19]     6.2 [2.9-13]
Frequency in HapMap CEU           0.23             0.41
Population Attributable Risk      46% [31-57%]     61% [43-73%]
Risk of Developing AMD by CFH Y402H and Modifiable Risk Factors
                  CFH Y402H Genotype
Risk Factor       YY                 YH                 HH
BMI < 30 kg/m²    1.00               1.95 [1.42-2.67]   3.96 [2.69-5.82]
BMI > 30 kg/m²    1.98 [0.91-4.31]   2.19 [1.11-4.30]   12.28 [4.88-30.90]
Non-smoker        1.00               1.95 [1.41-2.71]   4.23 [2.86-6.27]
Current smoker    2.34 [1.20-4.55]   3.20 [1.85-5.55]   8.69 [3.86-19.57]
Schaumberg DA et al, Arch Ophthalmol 2007; 125:55-62.
Interaction: Is LIPC Genotype Related to HDL-C?
[Figure: series labeled by LIPC genotype (TT, CT, CC)]
Ordovas et al, Circulation 2002; 106:2315-2321.
Inverse Relation between Endotoxin Exposure and Allergic Sensitization by CD14 Genotype
Simpson A et al, Am J Respir Crit Care Med 2006;174:386-392.
Challenges in Studying Gene-Environment Interactions
Challenge                      Genes         Environment
Ease of measure                Pretty easy   Often hard
Variability over time          Low/none      High
Recall bias                    None          Possible
Temporal relation to disease   Easy          Hard
Larson, G. The Complete Far Side. 2003.