Genetics for Epidemiologists

Lecture 5: Analysis of Genetic Association Studies

Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics and Senior Advisor to the Director, NHGRI, for Population Genomics

U.S. Department of Health and Human Services
National Institutes of Health
National Human Genome Research Institute


Topics to be Covered

• Discrete traits and quantitative traits

• Measures of association

• Detecting/correcting for false positives

• Genotyping quality control

• Quantile-quantile (Q-Q) plots

• Odds ratios: allelic and genotypic

• Models of genetic transmission

• Interactions: gene-gene, gene-environment


Larson, G. The Complete Far Side. 2003.


Quantitative Genetics

“…concerned with the inheritance of those differences between individuals that are of degree rather than of kind…”

Quantitative:
• Continuous gradation among individuals from one extreme to other
• Effects of genes are small
• Usually many genes

Qualitative:
• Sharply demarcated types with little connection by intermediates
• Effects of genes are large
• Single genes inherited in Mendelian ratios?

Falconer and Mackay, Quantitative Genetics 1996.

Inheritance Models in Single Gene Trait

Alleles: A and a


Inheritance Models in Single Gene Trait

Genotype groups: AA, Aa, aa

Models:
• A is Dominant
• A is Recessive
• A is Co-Dominant
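The three models differ only in how the genotypes AA, Aa, and aa are grouped. A minimal sketch of that grouping follows, assuming the usual definitions (dominant: one copy of A is enough; recessive: two copies of A are needed; co-dominant: each genotype forms its own group). The function names and numeric codes are illustrative and do not come from the lecture.

```python
# Illustrative genotype coding; function names and codes are hypothetical.

def count_A(genotype: str) -> int:
    """Number of copies of allele A in a two-letter genotype such as 'Aa'."""
    return genotype.count("A")


def code_genotype(genotype: str, model: str) -> int:
    """Return a numeric group code for a genotype under an inheritance model."""
    n = count_A(genotype)
    if model == "dominant":    # one copy of A is enough: AA and Aa are grouped
        return 1 if n >= 1 else 0
    if model == "recessive":   # two copies of A are needed: only AA differs
        return 1 if n == 2 else 0
    if model == "codominant":  # each genotype is its own group
        return n
    raise ValueError(f"unknown model: {model}")


for model in ("dominant", "recessive", "codominant"):
    codes = {g: code_genotype(g, model) for g in ("AA", "Aa", "aa")}
    print(model, codes)
```

Under this coding, AA and Aa share a group in the dominant model, Aa and aa share a group in the recessive model, and all three genotypes stay distinct in the co-dominant model.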

Inheritance Models in Quantitative Trait

• A: x increase in height
• a: x decrease in height


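Inheritance Models in Quantitative Trait

Genotype positions on the trait scale, from -x through 0 (population mean) to +x:

• A is Completely Dominant: aa at -x; Aa and AA together at +x
• A is Partially Dominant: aa at -x; Aa between 0 and +x; AA at +x
• A is Not (Co-) Dominant: aa at -x; Aa at 0; AA at +x
• A is Over-Dominant: aa at -x; AA at +x; Aa beyond +x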
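The four quantitative-trait models can be told apart by where the heterozygote Aa sits between the two homozygotes. The sketch below assumes aa is at -x, AA is at +x, and Aa is at some value d on the same scale. The symbol d and the function names are assumptions introduced for illustration and are not taken from the slide.

```python
# Hypothetical helper names; d is an assumed symbol for the heterozygote position.

def genotype_values(x: float, d: float) -> dict:
    """Trait values relative to 0: aa at -x, AA at +x, heterozygote Aa at d."""
    return {"aa": -x, "Aa": d, "AA": x}


def dominance_model(x: float, d: float) -> str:
    """Name the model implied by the heterozygote position d (x must be > 0)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if d == 0:
        return "not (co-) dominant"
    if d == x:
        return "completely dominant"
    if 0 < d < x:
        return "partially dominant"
    if d > x:
        return "over-dominant"
    return "dominance toward a (not shown on the slide)"


for d in (0.0, 0.5, 1.0, 1.5):
    print(d, dominance_model(1.0, d), genotype_values(1.0, d))
```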

Quantitative Traits with Published GWA Studies (16 - 34)

• QT interval
• Lipids and lipoproteins
• Memory
• Nicotine dependence
• ORMDL3 expression
• YKL-40 levels
• Obesity, BMI, waist
• Insulin resistance
• Height
• Bone mineral density
• F-cell distribution
• Fetal hemoglobin levels
• C-reactive protein
• 18 groups of Framingham traits
• Pigmentation
• Uric acid levels
• Recombination rate


Association of Alleles and Genotypes of rs1333049 (‘3049) with Myocardial Infarction

Alleles:
            C, N (%)        G, N (%)        χ^2 (1 df)   P-value
Cases       2,132 (55.4)    1,716 (44.6)    55.1         1.2 x 10^-13
Controls    2,783 (47.4)    3,089 (52.6)

Allelic Odds Ratio = 1.38

Genotypes:
            CC, N (%)     CG, N (%)       GG, N (%)     χ^2 (2 df)   P-value
Cases       586 (30.5)    960 (49.9)      378 (19.6)    59.7         1.1 x 10^-14
Controls    676 (23.0)    1,431 (48.7)    829 (28.2)

Heterozygote Odds Ratio = 1.47

Homozygote Odds Ratio = 1.90

Samani N et al, N Engl J Med 2007; 357:443-453.
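The three odds ratios on this slide can be recomputed from the reported counts. The sketch below takes G (and the GG genotype) as the reference category, which is an assumption inferred from the direction of the reported odds ratios. The function and variable names are illustrative.

```python
# Counts are taken from the slide; names and the choice of reference are assumptions.

def odds_ratio(case_exposed, case_ref, control_exposed, control_ref):
    """Cross-product odds ratio for a 2x2 table."""
    return (case_exposed * control_ref) / (case_ref * control_exposed)


def allele_counts(genotypes):
    """Convert genotype counts to allele counts (each person carries two alleles)."""
    return {
        "C": 2 * genotypes["CC"] + genotypes["CG"],
        "G": 2 * genotypes["GG"] + genotypes["CG"],
    }


cases = {"CC": 586, "CG": 960, "GG": 378}
controls = {"CC": 676, "CG": 1431, "GG": 829}

case_alleles = allele_counts(cases)
control_alleles = allele_counts(controls)

# Allelic odds ratio: C versus G alleles
allelic_or = odds_ratio(case_alleles["C"], case_alleles["G"],
                        control_alleles["C"], control_alleles["G"])
# Genotypic odds ratios: CG versus GG and CC versus GG
heterozygote_or = odds_ratio(cases["CG"], cases["GG"], controls["CG"], controls["GG"])
homozygote_or = odds_ratio(cases["CC"], cases["GG"], controls["CC"], controls["GG"])

print(f"allelic OR = {allelic_or:.2f}")
print(f"heterozygote OR = {heterozygote_or:.2f}")
print(f"homozygote OR = {homozygote_or:.2f}")
```

The allele counts derived from the genotype counts match the allele table (2,132 and 1,716 in cases; 2,783 and 3,089 in controls), and the rounded odds ratios agree with the values reported on the slide.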

-Log10 P Values for SNP Associations with Myocardial Infarction

Samani N et al, N Engl J Med 2007; 357:443-453.

Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort

http://www.broad.mit.edu/diabetes/scandinavs/type2.html

GWA Study of Serum Uric Acid Levels

• Linear regression of inverse normalized levels against number of alleles
• Additive model
• Sex, age, age^2 as covariates

Li S et al, PLoS Genet 2007; 3:e194.

Association of rs6855911 and Uric Acid Levels

Cohort       Additive Effect   Genotype Means (mg/dl)
                               AA            AG            GG
SardiNIA     -0.317            4.66 (1.51)   4.48 (1.59)   4.02 (1.63)
InCHIANTI    -0.397            5.27 (1.44)   4.94 (1.31)   4.33 (1.37)

Li S et al, PLoS Genet 2007; 3:e194.

Association Methods for Quantitative Traits

• Linear regression of multivariable adjusted residual against number of alleles (Kathiresan, Nat Genet 2008; 40:189-97)

• Linear regression of log transformed or centralized BMI against genotype (Frayling, Science 2007; 316:889-94)

• Variance components based Z-score analysis of quantile normalized height (Sanna, Nat Genet 2008; 40:198-203)

Ways of Dealing with Multiple Testing

• Control family-wise error rate (FWER): Bonferroni (α’ = α/n) or Šidák (α’ = 1 - [1 - α]^(1/n))
• False discovery rate: proportion of significant associations that are actually false positives
• False positive report probability: probability that the null hypothesis is true, given a statistically significant finding
• Bayes factors analysis: avoids need for assessing genome-wide error rates but must identify reasonable alternative model

Hoggart CJ et al, Genet Epidemiol 2008; 32:179-85.
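The two family-wise corrections named in the first bullet are simple to compute. The sketch below evaluates both per-test thresholds for a hypothetical scan of 500,000 SNPs. That number is an assumption chosen for illustration and does not come from the lecture.

```python
import math


def bonferroni(alpha: float, n_tests: int) -> float:
    """Bonferroni per-test threshold: alpha / n."""
    return alpha / n_tests


def sidak(alpha: float, n_tests: int) -> float:
    """Sidak per-test threshold: 1 - (1 - alpha)**(1/n).

    Computed with log1p/expm1 so the result stays accurate when n is large."""
    return -math.expm1(math.log1p(-alpha) / n_tests)


n_snps = 500_000  # hypothetical number of independent tests
print(f"Bonferroni: {bonferroni(0.05, n_snps):.3e}")
print(f"Sidak:      {sidak(0.05, n_snps):.3e}")
```

The Šidák threshold is always slightly less strict than the Bonferroni threshold, and at genome-wide scale the two are nearly identical.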


Quality Control of SNP Genotyping: Samples

• Identity with forensic markers (Identifiler)

• Blind duplicates

• Gender checks

• Cryptic relatedness or unsuspected twinning

• Degradation/fragmentation

• Call rate (> 80-90%)

• Heterozygosity: outliers

• Plate/batch calling effects

Chanock et al, Nature 2007; Manolio et al Nat Genet 2007


Quality Control of SNP Genotyping: SNPs

• Duplicate concordance (CEPH samples)

• Mendelian errors (typically < 1)

• Hardy-Weinberg errors (often P > 10^-5)

• Heterozygosity (outliers)

• Call rate (typically > 98%)

• Minor allele frequency (often > 1%)

• Validation of most critical results on independent genotyping platform

Chanock et al, Nature 2007; Manolio et al Nat Genet 2007
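Two of the SNP-level filters listed above, call rate and minor allele frequency, can be sketched as follows. The thresholds default to the values quoted on the slide (call rate > 98%, MAF > 1%). The data layout (a list of allele counts with None for a failed call) and all names are assumptions made for illustration.

```python
# Hypothetical data layout: one entry per sample, 0/1/2 copies of an allele,
# or None when the genotype call failed.

def snp_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Compute call rate and minor allele frequency for one SNP and apply filters."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    # Allele frequency among called genotypes (two alleles per sample)
    freq = sum(called) / (2 * len(called)) if called else 0.0
    maf = min(freq, 1 - freq)
    return {
        "call_rate": call_rate,
        "maf": maf,
        "passes": call_rate > min_call_rate and maf > min_maf,
    }


example = [0] * 90 + [1] * 9 + [None]  # 100 samples, one missing call
result = snp_qc(example)
print(result)
```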

Hardy-Weinberg Equilibrium

• The occurrences of the two alleles of a SNP in the same individual are independent events
• Ideal conditions:
  - random mating
  - no selection (equal survival)
  - no migration
  - no mutation
  - no inbreeding
  - large population sizes
  - gene frequencies equal in males and females
  - …
• If alleles A and a of SNP rs1234 have frequencies p and 1-p, expected frequencies of the three genotypes are:

Freq AA = p^2    Freq Aa = 2p(1-p)    Freq aa = (1-p)^2

After G. Thomas, NCI
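The expected genotype frequencies translate directly into a goodness-of-fit check. The sketch below uses a Pearson chi-squared test with 1 degree of freedom, which is one common way to test Hardy-Weinberg equilibrium and is an assumption here, since the lecture does not name a specific test. The example counts are the control genotype counts reported in the rs1333049 table earlier in the lecture (676 CC, 1,431 CG, 829 GG). The function name is illustrative, and the code assumes both alleles are observed.

```python
import math


def hwe_test(n_AA: int, n_Aa: int, n_aa: int):
    """Chi-squared (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    expected = (p * p * n, 2 * p * (1 - p) * n, (1 - p) ** 2 * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, expected, chi2, p_value


p, expected, chi2, p_value = hwe_test(676, 1431, 829)
print(f"p = {p:.3f}, chi2 = {chi2:.2f}, P = {p_value:.2f}")
```

A P value far above a QC threshold such as 10^-5 gives no indication of departure from Hardy-Weinberg proportions for this SNP in controls.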

Coverage, Call Rates, and Concordance of Perlegen and Affymetrix Platforms on HapMap Phase II

Metric                     Perlegen                       Affymetrix/Broad
Number of SNPs             480,744                        439,249
Coverage                   Single Marker  Multi-Marker    Single Marker  Multi-Marker
  CEU                      0.90           0.96            0.78           0.87
  CHB + JPT                0.87           0.93            0.78           0.86
  YRI                      0.64           0.78            0.63           0.75
Average call rate          98.9%                          99.3%
Concordance
  Homozygous genotypes     99.8%                          99.9%
  Heterozygous genotypes   99.8%                          99.8%

GAIN Collaborative Group, Nat Genet 2007; 39:1045-51.


Sample and SNP QC Metrics for Affymetrix 5.0 and 6.0 Platforms in GAIN

Metric                5.0        % fail   6.0        % fail
Total Samples         1,829      --       2,289      --
Passing QC            1,817      0.44     2,192      4.24
> 98% call rate       1,815      0.55     2,257      1.40

Total SNPs            457,645    --       906,660    --
Passing QC            429,309    6.19     845,814    6.70
MAF > 1%              457,466    0.04     888,234    2.03
> 98% call rate       419,810    8.27     821,942    9.34
> 95% call rate       439,272    4.01     873,856    3.61
HWE < 10^-6           455,899    0.38     904,275    0.26
< 1 Mendel error      417,722    8.72     899,721    0.01
< 1 Duplicate error   454,820    0.01     892,103    0.02

Courtesy, J Paschall, NCBI

Sample Heterozygosity in GAIN

[Histogram: Frequency (0 to 2,500) by sample heterozygosity (0.20 to 0.40)]

Courtesy, J Paschall, NCBI

Sample Heterozygosity in GAIN

[Histogram, expanded vertical scale: Frequency (0 to 100) by sample heterozygosity (0.20 to 0.40)]

Courtesy, J Paschall, NCBI


Signal Intensity Plots for rs10801532 in AREDS

http://www.ncbi.nlm.nih.gov/sites/entrez


Signal Intensity Plots for rs4639796 in AREDS

http://www.ncbi.nlm.nih.gov/sites/entrez


Signal Intensity Plots for rs534399 in AREDS

http://www.ncbi.nlm.nih.gov/sites/entrez


Signal Intensity Plots for rs572515 in AREDS

http://www.ncbi.nlm.nih.gov/sites/entrez


Signal Intensity Plots for CD44 SNP rs9666607

Clayton DG et al, Nat Genet 2005; 37:1243-1246.

Principal Component Analysis of Structured Population: First to Third Components

Courtesy, G. Thomas, NCI

Principal Component Analysis of Structured Population: Fourth and Fifth Components

Courtesy, G. Thomas, NCI

Influence of Relatedness on Principal Component Analysis

Courtesy, G. Thomas, NCI


Summary Points: Genotyping Quality Control

• Sample checks for identity, gender error, cryptic relatedness

• Sample handling differences can introduce artifacts but probably can be adjusted for

• Association analysis is often quickest way to find genotyping errors

• Low MAF SNPs are most difficult to call

• Inspection of genotyping cluster plots is crucial!

Quantile-Quantile Plot for Test Statistics, 390 Breast Cancer Cases, 364 Controls

205,586 SNPs; λ = 1.03

Easton D et al, Nature 2007; 447:1087-1093.
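The inflation factor λ shown with a Q-Q plot is commonly computed as the median of the observed 1-df chi-squared statistics divided by the median expected under the null (about 0.455). The sketch below assumes that definition, which the slide does not state explicitly. It is checked against exact null quantiles rather than real data, and all names are illustrative.

```python
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 df (about 0.455)
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Inflation factor: observed median statistic over the expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def expected_chi2_quantiles(n):
    """Expected 1-df chi-squared quantiles for n tests (the x-axis of a Q-Q plot)."""
    nd = NormalDist()
    return [nd.inv_cdf(0.5 + (i - 0.5) / (2 * n)) ** 2 for i in range(1, n + 1)]


expected = expected_chi2_quantiles(1001)
inflated = [1.03 * s for s in expected]  # hypothetical uniformly inflated statistics

print(f"lambda (null) = {genomic_inflation(expected):.3f}")
print(f"lambda (inflated) = {genomic_inflation(inflated):.3f}")
```

A value close to 1, such as the 1.03 reported here, suggests little systematic inflation of the test statistics from population structure or genotyping artifacts.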

Observed and Expected Associations after Stage 2 of Breast Cancer GWA

Significance      Observed   Observed Adjusted   Expected   Ratio
0.01 - 0.05       1,239      1,162               934        1.24
10^-3 - 10^-2     574        517                 348        1.49
10^-4 - 10^-3     112        88                  53         1.65
10^-5 - 10^-4     16         12                  7          1.71
< 10^-5           15         13                  1          13.5
All p < 0.05      1,956      1,792               1,343      1.33

Easton D et al, Nature 2007; 447:1087-93.


Q-Q Plot for Multiple Sclerosis; Effect of MHC

Hafler D et al, N Engl J Med 2007; 357:851-862.


Q-Q Plot for Prostate Cancer, all SNPs

Gudmundsson J et al, Nat Genet 2007; 39:977-983.


Q-Q Plot for Prostate Cancer, excluding Chromosome 8

Gudmundsson J et al, Nat Genet 2007; 39:977-983.

Q-Q Plot for Myocardial Infarction

[Plot: observed chi-squared statistic (0 to 60) against expected chi-squared statistic (0 to 25)]

Samani N et al, N Engl J Med 2007; 357:443-453.

-Log10 P Values for SNP Associations with Myocardial Infarction

Samani N et al, N Engl J Med 2007; 357:443-453.


SNP Associations with 1,928 MI Cases and 2,938 Controls from UK

Samani N et al, N Engl J Med 2007; 357:443-453.


Association Signal for Coronary Artery Disease on Chromosome 9

’3049

Samani N et al, N Engl J Med 2007; 357:443-453.

Winner’s Curse: Odds Ratios for CHD Associated with LTA Genotypes in Multiple Studies

Clarke et al, PLoS Genet 2006; 2:e107.

Genome-Wide Scan for Alzheimer’s Disease in 861 Cases and 550 Controls

Reiman E et al, Neuron 2007; 54:713-20.

Genome-Wide Scan for Alzheimer’s Disease in APOE*e4 Carriers

Reiman E et al, Neuron 2007; 54:713-20.

LOAD Odds Ratios Associated with rs2373115 GG by APOE*e4 Status

APOE*e4 Group   APOE*e4 OR [95% CI]   rs2373115 OR [95% CI]
APOE*e4 -                             1.12 [0.82,1.53]
APOE*e4 +                             2.88 [1.90,4.36]
All             6.07 [4.63-7.95]      1.34 [1.06,1.70]

Reiman et al, Neuron 2007; 54:713-720.

P Values of GWA Scan for Age-Related Macular Degeneration

Klein et al, Science 2005; 308:385-389.

Odds Ratios and Population Attributable Risks for AMD

Attribute (SNP)                    rs380390 (C/G)   rs1329428 (C/T)
Risk allele                        C                C
Allelic association χ^2 P value    4.1 x 10^-8      1.4 x 10^-6
Odds ratio (dominant)              4.6 [2.0-11]     4.7 [1.0-22]
Frequency in HapMap CEU            0.70             0.82
Population Attributable Risk       70% [42-84%]     80% [0-96%]
Odds ratio (recessive)             7.4 [2.9-19]     6.2 [2.9-13]
Frequency in HapMap CEU            0.23             0.41
Population Attributable Risk       46% [31-57%]     61% [43-73%]

Klein et al, Science 2005; 308:385-389.

Risk of Developing AMD by CFH Y402H and Modifiable Risk Factors

Risk Factor        CFH Y402H Genotype
                   YY                 YH                 HH
BMI < 30 kg/m^2    1.00               1.95 [1.42-2.67]   3.96 [2.69-5.82]
BMI > 30 kg/m^2    1.98 [0.91-4.31]   2.19 [1.11-4.30]   12.28 [4.88-30.90]
Non-smoker         1.00               1.95 [1.41-2.71]   4.23 [2.86-6.27]
Current smoker     2.34 [1.20-4.55]   3.20 [1.85-5.55]   8.69 [3.86-19.57]

Schaumberg DA et al, Arch Ophthalmol 2007; 125:55-62.
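One way to read this table for gene-environment interaction is to compare the observed joint effect (HH genotype plus the risk factor) with what the two separate effects would predict. The sketch below uses the point estimates from the table with two standard reference models: multiplicative (product of the separate effects) and additive (sum of the separate effects minus 1). These comparisons are an illustration and are not taken from the lecture. They ignore the confidence intervals, and all names are assumptions.

```python
# Point estimates from the table; the comparison models are illustrative.

def expected_joint(rr_gene: float, rr_env: float) -> dict:
    """Expected joint relative risk when gene and environment do not interact."""
    return {
        "multiplicative": rr_gene * rr_env,
        "additive": rr_gene + rr_env - 1,
    }


strata = {
    "BMI > 30 kg/m^2": {"rr_env": 1.98, "rr_gene": 3.96, "observed": 12.28},
    "Current smoker": {"rr_env": 2.34, "rr_gene": 4.23, "observed": 8.69},
}

for name, s in strata.items():
    exp = expected_joint(s["rr_gene"], s["rr_env"])
    print(f"{name}: observed {s['observed']:.2f}, "
          f"multiplicative {exp['multiplicative']:.2f}, "
          f"additive {exp['additive']:.2f}")
```

With these point estimates, the observed joint estimate for obesity with HH exceeds both expectations, while the estimate for current smoking with HH lies between the additive and multiplicative expectations. The wide confidence intervals in the table mean such comparisons should be read cautiously.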

Interaction: Is LIPC Genotype Related to HDL-C?

[Figure panels labeled by genotype: TT, CC, CT]

Ordovas et al, Circulation 2002; 106:2315-2321.

Inverse Relation between Endotoxin Exposure and Allergic Sensitization by CD14 Genotype

Simpson A et al, Am J Respir Crit Care Med 2006; 174:386-392.

Challenges in Studying Gene-Environment Interactions

Challenge                      Genes         Environment
Ease of measure                Pretty easy   Often hard
Variability over time          Low/none      High
Recall bias                    None          Possible
Temporal relation to disease   Easy          Hard
