From Genome-Wide Association Studies to Medicine

Florian Schmitzberger - CS 374 – 4/28/2009

Stanford University Biomedical Informatics

schmitzberger@stanford.edu

Topics

1. Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008);

2. Zacho et al. Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease. N Engl J Med 359, 18 (2008);

3. Jakobsdottir et al. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genetics 5, 2 (2009)

Topic 1: Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008)

Genome-wide association studies (figure)

Source: Hardy et al. Genomewide Association Studies and Human Disease. N Engl J Med, 360:1759-1768; 17 (2009)

Human Genome Research Over Time

Source: Altshuler et al. Genetic Mapping in Human Disease. Science 322, 881 (2008);

Linkage Analysis

Source: genome.wellcome.ac.uk

Initial Lessons

1. “Candidate gene” approach inadequate

2. Mutations that cause disease often change protein structure (example: the hemoglobin subunit beta mutation in sickle-cell disease)

3. Loci often have many rare disease-causing alleles

4. 90% of sites of genetic variation are common variants in the population

Common disease – common variant (CDCV)

Common polymorphisms (minor allele frequency > 1%) contribute to susceptibility to disease.

We can use GWAS to see how common variants contribute to disease.

This gives us ideas on which positions to investigate.

Tag SNPs

Source: The International HapMap Consortium. The International HapMap Project. Nature, Vol 426, 18/25 December 2003

GWAS – General Lessons Learned

1. GWAS work: ~two dozen reproducible associations by 2006, >150 by 2008

2. Effect sizes are modest for common variants (risk mostly increased by a factor of 1.1-1.5)

3. Power to detect associations has been low

4. Association studies have identified regions rather than causal genes

5. A single locus may contain more than one risk variant

6. A single locus may contain both common and rare variants

7. There is great variation between ethnic groups

Sample size required

For P < 10^-8. Source: Altshuler et al.

GWAS – Common Diseases: Lessons Learned

1. The risk attributable to loci already identified by GWAS is currently underestimated, because some mutations at those loci are still unknown.

2. Many more disease loci remain to be found (studies so far have had low statistical power).

3. Some loci will only contain rare variants (these won’t be found using common polymorphisms).

Disease Risk vs. Disease Mechanism

The primary value of genetic mapping is not risk prediction but gaining knowledge about mechanisms of disease.

GWAS: The Path Ahead

1. Increased sample sizes: for a variant with 20% frequency and a 1.3-fold increase in risk, 1,000 cases and 1,000 controls give 1% power; 5,000 cases and 5,000 controls give 98% power

2. Different ancestry groups

3. Find rare mutations in suspect loci

1000 genomes project
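
The power figures quoted under "Increased sample sizes" can be roughly reproduced with a normal approximation. The sketch below is not the calculation used by Altshuler et al.; it assumes a two-sided two-proportion z-test on allele counts, treats the "1.3 increase in risk" as a per-allele odds ratio, and uses the P < 10^-8 threshold from the sample size slide. The function name `allelic_test_power` and its parameters are made up for this illustration.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_allele_freq, odds_ratio, alpha=1e-8):
    """Approximate power of a two-sided z-test comparing allele frequencies.

    Assumption: the effect is a per-allele odds ratio and each person
    contributes two independent alleles.
    """
    # Allele frequency in cases implied by the odds ratio.
    control_odds = control_allele_freq / (1 - control_allele_freq)
    case_odds = control_odds * odds_ratio
    case_allele_freq = case_odds / (1 + case_odds)

    # Standard error of the difference between the two allele frequencies.
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    std_err = (
        case_allele_freq * (1 - case_allele_freq) / case_alleles
        + control_allele_freq * (1 - control_allele_freq) / control_alleles
    ) ** 0.5

    # Expected z-score of the true effect and the significance cut-off.
    z_effect = (case_allele_freq - control_allele_freq) / std_err
    normal = NormalDist()
    z_crit = -normal.inv_cdf(alpha / 2)

    # Probability that the observed z-score exceeds the cut-off
    # (the negligible chance of crossing it in the wrong direction is ignored).
    return normal.cdf(z_effect - z_crit)


power_small = allelic_test_power(1000, 1000, control_allele_freq=0.20, odds_ratio=1.3)
power_large = allelic_test_power(5000, 5000, control_allele_freq=0.20, odds_ratio=1.3)
print(f"1,000 cases / 1,000 controls: power = {power_small:.3f}")
print(f"5,000 cases / 5,000 controls: power = {power_large:.3f}")
```

Under these assumptions the small study has power of about one percent and the large one close to 98 percent, in line with the figures on the slide: a fivefold increase in sample size turns an almost undetectable effect into an almost certain finding.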

Topic 2: Zacho et al. Genetically Elevated C-Reactive Protein and Ischemic Vascular Disease. N Engl J Med 359, 18 (2008)

C-Reactive Protein (CRP)

• Elevated levels of CRP are associated with increased risk of ischemic heart disease and cerebrovascular disease

• Studies of >40,000 people, ~4,000 of them with disease, followed for 12-15 years

• Measured levels of CRP

• Genotyping for four CRP polymorphisms

Results

Figures (Zacho et al.), by slide title: “Increased CRP levels”; “Increased CRP levels lead to increased disease risk”; “Increased CRP levels” followed by “?”.

Possible issues with this study

• CRP polymorphisms could lead to higher plasma levels of less active CRP (unlikely, polymorphisms not near coding region)

• Limitations of the four individual studies

• Variability with race (only white participants studied)

• Potential lack of statistical power

Conclusion

• Genetic variants that lead to increased CRP levels do not lead to an increased risk of ischemic heart disease or cerebrovascular disease

• Increased CRP levels are likely to be a marker of disease rather than a cause.
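
The reasoning behind this conclusion can be sketched numerically. If CRP were causal, people whose genotype raises CRP should carry the extra risk that the observed CRP-to-risk association predicts for that rise. The code below is only an illustration of that comparison, not the analysis performed by Zacho et al.: the log-linear dose-response assumption, the function names, and every number are hypothetical.

```python
import math


def predicted_risk_ratio(risk_ratio_per_doubling, genetic_crp_increase):
    """Risk ratio expected in carriers if CRP were causal.

    Assumption: risk scales log-linearly with CRP, so a rise by a fraction
    `genetic_crp_increase` counts as log2(1 + increase) doublings.
    """
    doublings = math.log2(1 + genetic_crp_increase)
    return risk_ratio_per_doubling ** doublings


def consistent_with_causality(predicted, ci_low, ci_high):
    """True if the predicted risk ratio lies inside the observed confidence interval."""
    return ci_low <= predicted <= ci_high


# Hypothetical inputs, chosen only to illustrate the logic.
observational_rr_per_doubling = 1.3  # risk ratio per doubling of plasma CRP
genetic_increase = 0.5               # genotype raises CRP by 50%
observed_ci = (0.9, 1.1)             # CI of the risk ratio actually seen in carriers

predicted = predicted_risk_ratio(observational_rr_per_doubling, genetic_increase)
causal_ok = consistent_with_causality(predicted, *observed_ci)
print(f"predicted risk ratio if causal: {predicted:.2f}")
print(f"consistent with observed genotype risk: {causal_ok}")
```

With these made-up inputs the predicted risk ratio falls outside the interval observed for carriers, which is the pattern that points to CRP being a marker rather than a cause.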

Topic 3: Jakobsdottir et al. Interpretation of Genetic Association Studies: Markers with Replicated Highly Significant Odds Ratios May Be Poor Classifiers. PLoS Genetics 5, 2 (2009)

Statistical methods to evaluate markers in genetic testing

1. ROC (receiver operating characteristic) curves

2. Logistic regression

Genetic testing for the public

Sources: 23andme.com, decodeme.com, navigenics.com

Classification-based statistics

These evaluate how well one can distinguish between cases and controls.

                              Disease: YES           Disease: NO

Diagnostic test: Positive     True positive (TP)     False positive (FP)

Diagnostic test: Negative     False negative (FN)    True negative (TN)

Sensitivity = TP / (TP + FN)

With this test, how many of the people who are actually ill will I catch?

Specificity = TN / (TN + FP)

With this test, will I tell too many people they might be ill?
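
As a small worked example of the two formulas, the sketch below computes sensitivity and specificity from the four cells of the table. The counts are invented for illustration and do not come from any study discussed here.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of people who are actually ill that the test catches."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of healthy people that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 ill people and 1,000 healthy people were tested.
tp, fn = 80, 20    # ill people with a positive / negative test
tn, fp = 900, 100  # healthy people with a negative / positive test

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Note that in this made-up example more healthy people (100) than ill people (80) receive a positive result, even though both measures look high; this happens whenever the disease is rare relative to the false positive rate.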

ROC curves

Source: medcalc.be

Important measure: area under the curve (AUC)
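
The AUC has a direct interpretation: it equals the probability that a randomly chosen case receives a higher test score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way instead of integrating the curve. The function name and the risk scores are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores, e.g. the number of risk alleles each person carries.
case_scores = [3, 2, 2, 1]
control_scores = [2, 1, 1, 0]

auc = auc_from_scores(case_scores, control_scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 means the score separates cases from controls no better than a coin flip; an AUC of 1.0 means every case scores higher than every control.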

Odds Ratios (risk analysis)

Odds ratio = (odds of an event occurring in one group) / (odds of the event occurring in the control group)

OR < 1: event less likely in the first group

OR = 1: equal likelihood

OR > 1: event more likely in the first group
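
For a case-control study, the definition above can be applied by taking "carries the risk variant" as the event and comparing cases with controls. The following sketch assumes that reading; the counts are invented for illustration.

```python
def odds_ratio(event_in_group, no_event_in_group, event_in_control, no_event_in_control):
    """Odds of the event in one group divided by the odds in the control group."""
    odds_group = event_in_group / no_event_in_group
    odds_control = event_in_control / no_event_in_control
    return odds_group / odds_control


# Hypothetical counts: 30 of 100 cases and 20 of 100 controls carry the variant.
or_carrier = odds_ratio(30, 70, 20, 80)
print(f"odds ratio = {or_carrier:.2f}")
```

A value above 1 here means the variant is more common among cases than among controls.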

Take-home message

OR

“Strong association (low p-value) does not guarantee effective discrimination between cases and controls (classification). Excellent classification (high AUC) does not guarantee good prediction of actual risk”

- Jakobsdottir et al.
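
The first half of the quote can be made concrete with a single binary marker. If "carrier" is treated as a positive test, sensitivity is the carrier frequency among cases and specificity is the non-carrier frequency among controls; the ROC curve then has a single corner and its area is (sensitivity + specificity) / 2. The sketch below is an illustration under those assumptions, not a calculation from Jakobsdottir et al., and the frequencies and odds ratios are hypothetical.

```python
def binary_marker_auc(control_carrier_freq, odds_ratio):
    """AUC of a test that calls every carrier of a single marker 'positive'."""
    # Carrier frequency among cases implied by the odds ratio.
    control_odds = control_carrier_freq / (1 - control_carrier_freq)
    case_odds = control_odds * odds_ratio
    case_carrier_freq = case_odds / (1 + case_odds)

    sens = case_carrier_freq          # cases correctly called positive
    spec = 1 - control_carrier_freq   # controls correctly called negative
    # Area under the one-corner ROC curve through (1 - spec, sens).
    return (sens + spec) / 2


for marker_or in (1.0, 1.3, 3.0):
    auc_value = binary_marker_auc(control_carrier_freq=0.20, odds_ratio=marker_or)
    print(f"odds ratio {marker_or:.1f} -> AUC {auc_value:.3f}")
```

With a carrier frequency of 20% in controls, an odds ratio of 1.3 gives an AUC only slightly above 0.5, however small the p-value of the association becomes with a large enough sample.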

Source: newscientist.com
