From Genome-Wide Association Studies to Medicine
Florian Schmitzberger - CS 374 – 4/28/2009
Stanford University Biomedical Informatics
Topics
1. Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008);
2. Zacho et al. Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease. N Engl J Med 359, 18 (2008);
3. Jakobsdottir et al. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genetics 5, 2 (2009)
Genome-wide association studies
Source: Hardy et al. Genomewide Association Studies and Human Disease. N Engl J Med 360(17):1759-1768 (2009)
Human Genome Research Over Time
Source: Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008);
Linkage Analysis
Source: genome.wellcome.ac.uk
Initial Lessons
1. “Candidate gene” approach inadequate
2. Mutations that cause disease often change protein structure
   Example: hemoglobin subunit beta mutation in sickle-cell disease.
3. Loci often have many rare disease-causing alleles
4. 90% of sites of genetic variation are common variants in the population
Common disease – common variant (CDCV)
Common polymorphisms (minor allele frequency > 1%) contribute to susceptibility to disease.
We can use GWAS to see how common variants contribute to disease.
This gives us ideas on which positions to investigate.
Tag SNPs
Source: The International HapMap Consortium. The International HapMap Project. Nature 426, 18/25 December 2003
GWAS – General Lessons Learned
1. GWAS work
   - 2006: ~two dozen reproducible associations
   - 2008: >150
2. Effect sizes are modest for common variants (mostly risk increases by a factor of 1.1-1.5)
3. Power to detect associations has been low
4. Association studies have identified regions rather than causal genes
5. A single locus may contain more than one risk variant
6. A single locus may contain both common and rare variants
7. There is great variation between ethnic groups
Sample size required
For P < 10^-8. Source: Altshuler et al.
GWAS – Common Diseases: Lessons Learned
1. The risk attributable to loci already identified by GWAS is currently underestimated, because further mutations at these loci are still unknown.
2. Many more disease loci remain to be found (studies so far have had low statistical power).
3. Some loci will contain only rare variants (these won’t be found using common polymorphisms).
Disease Risk vs. Disease Mechanism
The primary value of genetic mapping is not risk prediction but gaining knowledge about the mechanisms of disease.
GWAS: The Path Ahead
1. Increased sample sizes (for a variant with 20% frequency and a 1.3-fold increase in risk):
   - 1000 cases, 1000 controls: 1% power
   - 5000 cases, 5000 controls: 98% power
2. Different ancestry groups
3. Find rare mutations in suspect loci
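The effect of sample size on power can be illustrated with a rough calculation. The sketch below is not necessarily the method used by Altshuler et al.; it assumes a two-sided two-proportion z-test on allele counts, a multiplicative per-allele risk model, a control allele frequency equal to the population frequency, and the genome-wide threshold P < 10^-8. The function name `allelic_power` is made up for this illustration. Under these assumptions the results come out close to the figures quoted for 1000 and 5000 cases.

```python
from statistics import NormalDist


def allelic_power(n_cases, n_controls, allele_freq, relative_risk, alpha=1e-8):
    """Approximate power of a two-sided z-test comparing allele frequencies.

    Assumptions (illustrative only): multiplicative per-allele risk,
    control allele frequency equal to the population frequency,
    normal approximation to the test statistic.
    """
    std_normal = NormalDist()
    p_controls = allele_freq
    # Under a multiplicative model the risk allele is enriched among cases.
    p_cases = (allele_freq * relative_risk
               / (allele_freq * relative_risk + (1 - allele_freq)))
    # Each person contributes two alleles, hence 2 * n in the denominators.
    std_err = (p_cases * (1 - p_cases) / (2 * n_cases)
               + p_controls * (1 - p_controls) / (2 * n_controls)) ** 0.5
    expected_z = (p_cases - p_controls) / std_err
    # Critical value for a two-sided test at significance level alpha.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Probability that the statistic exceeds the critical value in either tail.
    return (std_normal.cdf(expected_z - z_crit)
            + std_normal.cdf(-expected_z - z_crit))


power_small = allelic_power(1000, 1000, allele_freq=0.20, relative_risk=1.3)
power_large = allelic_power(5000, 5000, allele_freq=0.20, relative_risk=1.3)
print(f"1000 cases / 1000 controls: power ~ {power_small:.2f}")
print(f"5000 cases / 5000 controls: power ~ {power_large:.2f}")
```

The steep rise in power comes from the standard error shrinking with the square root of the sample size while the significance threshold stays fixed.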
1000 Genomes Project
Topic 2: Zacho et al. Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease. N Engl J Med 359, 18 (2008)
C-Reactive Protein (CRP)
• Elevated levels of CRP are associated with increased risk of ischemic heart disease and cerebrovascular disease
• Studies of >40,000 people, ~4,000 with disease
• Followed for 12-15 years
• Measured levels of CRP
• Genotyping for four CRP polymorphisms
Results
[Figures from Zacho et al. Recoverable panel labels: "Increased CRP levels"; "Increased CRP levels lead to increased disease risk"; "?"]
Possible issues with this study
• CRP polymorphisms could lead to higher plasma levels of less active CRP (unlikely, polymorphisms not near coding region)
• Limitations of the four individual studies
• Variability with race (only white participants studied)
• Potential lack of statistical power
Conclusion
• Genetic variants that lead to increased CRP levels do not lead to an increased risk of ischemic heart disease or cerebrovascular disease
• Increased CRP levels are likely to be a marker of disease rather than a cause
Topic 3: Jakobsdottir et al. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genetics 5, 2 (2009)
Statistical methods to evaluate markers in genetic testing
1. ROC (receiver operating characteristic) curves
2. Logistic regression
Genetic testing for the public
Sources: 23andme.com, decodeme.com, navigenics.com
Classification based statistics
Evaluates how well one can distinguish between cases and controls.
Diagnostic test result vs. disease status

                 Disease: YES           Disease: NO
Test positive    true positive (TP)     false positive (FP)
Test negative    false negative (FN)    true negative (TN)

Sensitivity = TP / (TP + FN)
With this test, how many of the people who are actually ill will I catch?

Specificity = TN / (TN + FP)
With this test, will I tell too many people they might be ill?
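As a small worked example of the two definitions above, the sketch below computes sensitivity and specificity from the four cells of a 2x2 table. The counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp, fn):
    """Fraction of people with the disease whom the test flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of people without the disease whom the test clears as negative."""
    return tn / (tn + fp)


# Hypothetical counts: 100 people with the disease, 1000 without.
tp, fn = 80, 20     # diseased: test positive / test negative
tn, fp = 900, 100   # healthy:  test negative / test positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up counts the test catches 80 of the 100 ill people and wrongly alarms 100 of the 1000 healthy ones.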
ROC curves
Source: medcalc.be
Important measure: area under the curve (AUC)
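To make the AUC concrete, here is a minimal sketch that builds an ROC curve from test scores and integrates it with the trapezoidal rule. The scores are invented for illustration, and the helper names `roc_points` and `auc` are not taken from any library.

```python
def roc_points(scores_cases, scores_controls):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A person is called positive when their score is >= the threshold.
    """
    thresholds = sorted(set(scores_cases) | set(scores_controls), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nobody is positive
    for t in thresholds:
        tpr = sum(s >= t for s in scores_cases) / len(scores_cases)
        fpr = sum(s >= t for s in scores_controls) / len(scores_controls)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores (higher = more suggestive of disease).
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.5, 0.3, 0.2]

curve = roc_points(cases, controls)
area = auc(curve)
print(f"AUC = {area:.3f}")
```

An AUC of 0.5 corresponds to guessing at random and 1.0 to perfect separation of cases from controls. The value can also be read as the probability that a randomly chosen case scores higher than a randomly chosen control.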
Odds Ratios (risk analysis)
OR = (odds of an event occurring in one group) / (odds of an event occurring in the control group)
OR < 1: event less likely in the first group
OR = 1: equal likelihood
OR > 1: event more likely in the first group
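The definition can be checked with a short calculation. The group sizes and event counts below are hypothetical.

```python
def odds(events, non_events):
    """Odds of the event: how many times it occurs per time it does not."""
    return events / non_events


def odds_ratio(events_group, non_events_group, events_control, non_events_control):
    """Odds of the event in the group of interest divided by the odds in controls."""
    return odds(events_group, non_events_group) / odds(events_control, non_events_control)


# Hypothetical data: the event occurs in 30 of 100 people in the first group
# and in 20 of 100 people in the control group.
or_example = odds_ratio(30, 70, 20, 80)
or_equal = odds_ratio(20, 80, 20, 80)
print(f"OR = {or_example:.2f}")        # event more likely in the first group
print(f"OR (same odds) = {or_equal:.2f}")
```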
Take-home message
“Strong association (low p-value) does not guarantee effective discrimination between cases and controls (classification). Excellent classification (high AUC) does not guarantee good prediction of actual risk”
- Jakobsdottir et al.
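A hypothetical single marker illustrates the first half of this message. Suppose a risk genotype is carried by 30% of cases and 10% of controls (invented numbers, not taken from Jakobsdottir et al.). Treating "carrier" as a positive test, the sketch below derives the odds ratio, sensitivity, specificity, and AUC. For a single yes/no marker the ROC curve has one interior point, so the AUC reduces to (sensitivity + specificity) / 2.

```python
def marker_summary(carrier_freq_cases, carrier_freq_controls):
    """Summarize one binary marker used as a diagnostic test.

    Returns (odds ratio, sensitivity, specificity, AUC).
    """
    sens = carrier_freq_cases           # cases correctly flagged as carriers
    spec = 1 - carrier_freq_controls    # controls correctly not flagged
    odds_cases = carrier_freq_cases / (1 - carrier_freq_cases)
    odds_controls = carrier_freq_controls / (1 - carrier_freq_controls)
    marker_or = odds_cases / odds_controls
    # ROC for a binary test: (0, 0) -> (1 - spec, sens) -> (1, 1).
    marker_auc = (sens + spec) / 2
    return marker_or, sens, spec, marker_auc


marker_or, marker_sens, marker_spec, marker_auc = marker_summary(0.30, 0.10)
print(f"odds ratio  = {marker_or:.2f}")
print(f"sensitivity = {marker_sens:.2f}")
print(f"specificity = {marker_spec:.2f}")
print(f"AUC         = {marker_auc:.2f}")
```

In this made-up case the odds ratio is close to 4, which would count as a strong association, yet the marker misses 70% of cases and its AUC of 0.60 is only modestly better than the 0.5 expected from chance.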
Source: newscientist.com