64
Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Embed Size (px)

Citation preview

Page 1: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Genome-Wide Association Studies

Ryan Irvin, Ph.D.Assistant Professor

Department of EpidemiologySchool of Public Health

The University of Alabama at Birmingham

Page 2: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Outline of Presentation

• Genome Wide Association (GWA) Studies– Definition and description– Rapid evolution of methods / discovery

• Steps for conducting a GWAS• Examples

– Future directions

Page 3: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

What is a Genome-Wide Association Study?

• A method to rapidly scan variable markers across the entire human genome to test for association with disease– Can ultimately help identify genetic loci that can lead

to new treatments and prevent disease– Particularly useful in complex, common diseases

• Made possible through rapid advancements in genomic techniques– Completion of Human Genome Project (2003)– Completion of HapMap Project (2005)

Page 4: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

GWAS focuses on genetic variation in the form of SNPs

Of about 3.3 billion base pairs it is estimated there are about 10 million common SNPS

Page 5: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Association studies

DiseaseCases

No-DiseaseControl

Allele 1 Allele 2

Marker A is associated with Phenotype

Marker A:

Allele 1 =

Allele 2 =

Page 6: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

GWAS Facts

• GWAS rely on “common disease, common variant” hypothesis

• Basic idea: Common variants (SNPs) cause common disease

• Common means ≥5% of the population has the variant of interest

Page 7: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

GWAS Facts

• Takes advantage of linkage disequilibrium–Depending on the array, 300,000 to 5

million SNPs represent most of the 10 million common SNPs

– Identified risk SNPs often are "tagging along" with the actual causal variants

Page 8: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham
Page 9: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Tag SNPs Define Common Haplotypes

Page 10: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed.

Phase III Hapmap, 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs).

The goal of the International HapMap Project is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared

Page 11: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Hunter DJ and Kraft P, N Engl J Med 2007; 357:436-439.

“There have been few, if any, similar bursts of discovery in the history of medical research…”

Page 12: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

NHGRI Catalog of GWA Studies: http://www.genome.gov/gwastudies/

Page 13: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham
Page 14: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

dbGaP

• Database of genotypes and phenotypes: makes GWAS publically available data

• The National Center for Biotechnology Information has created the dbGaP public repository for individual-level phenotype, exposure, genotype and sequence data.

• dbGaP assigns unique identifiers to studies and subsets of information from those studies, including documents, individual phenotypic variables, tables of trait data, sets of genotype data, computed phenotype-genotype associations.

http://www.ncbi.nlm.nih.gov/gap

Page 15: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Unique Aspects of GWA Studies

• Permit examination of inherited genetic variability at unprecedented level of resolution

• Permit "agnostic" genome-wide evaluation • Once genome measured, can be related to

any trait• Most robust associations in GWA studies

have not been with genes previously suspected of association with the disease

• Some associations in regions not even known to harbor genes

“The chief strength of the new approach also contains its chief problem: with more than 500,000 comparisons per study, the potential for false positive results is unprecedented.”

Page 16: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

How is a GWA Study Conducted?

Page 17: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Step 1: Select your Genotyping Platform:

• Rapid advances in GWAS platforms• Significant content improvementsDecreasing costs

• Increasing coverage of multiple populations• Some drop in coverage when populations beyond HapMap are genotyped

DeBakker et al Nat. Genet. 38: 1298, 2006

Conrad et al Nat. Genet. 38: 1251, 2006

Page 18: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham
Page 19: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Step 2: Design your Sampling / Recruitment / Phenotyping Strategy

• Patient Group of Interest, how do you collect information about disease– Case Control (Ex. Cancer v. None)– Quantitative Trait (Ex. BMI)

• Sample families or unrelated individuals?• Big sample sizes are needed to find

significant SNP effects • Plan for replication

Page 20: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Step 3: Which SNPs to Analyze

•Genotyping Quality Control

•Exclude SNPs from analysis based on•HWE•Missing Rate•Allele frequency

Page 21: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Step 4: Analyze the Massive Dataset

• Analysis method depends:– Phenotypic definition

• (case or control) or quantitative

– Whether families or individuals were sampled• Deal with multiple testing

– Bonferroni correction– False discovery rate

Page 22: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

GWAS Facts• We routinely compare >1 million SNPs

to a trait and just by chance many SNPs are significant at the P < 0.05 or P< 0.001–Stringent significance threshold must

be used (P < 1 * 10-8 for a 1 million marker genotyping platform)

–Large samples sizes are needed to meet stringent significance thresholds

Page 23: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Association for GWASConfounding

Page 24: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Population StratificationPopulation stratification: differences in allele

frequency between cases and controls due to diversity in populations of origin which is unrelated to disease

•Requires: 1) population differences in disease prevalence2) population differences in allele frequencies

Eg. if a disease is more prevalent in one subpopulation any alleles in high frequency in that subpopulation may be spuriously associated with the trait of interest

Cardon LR and Palmer LJ, Lancet2003; 361:598-604.

Page 25: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

The figure shows the results obtained when European, Nigerian and East Asian samples from HapMap and 100 African-American samples are clustered using principal component analysis based on data from ~600,000 genetic markers.

PC1

PC

2Nature Genetics 38, 904 - 909 (2006)

Page 26: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Association for GWAS Case/Control DesignLogistic Regression

Page 27: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Association for GWAS Quantitative TraitsLinear Regression

Let Yi be the Quantitative Trait Value

Yi =

Test whether b1 differs from 0Can use PLINK--Linear

Page 28: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

GWAS Scan Results

Mannuccio et al. Journal of Tehran University Heart Center. 2010;5(3): 116-121

Page 29: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Bierut LJ et al, Hum Molec Genet 2007; 16:24-35.

Displaying ResultsEx. Nicotine Dependence among Smokers

Page 30: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

http://www.broad.mit.edu/diabetes/scandinavs/type2.html

Manhattan Plot3*10 -7

Page 31: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Solutions to MultipleTesting Issue

• Bonferroni correction (type 1 error rate = a/ m)– m=number of independent snps (aka # of tests)– Assume all tests performed are independent– Threshold often considered appropriate: 5x10-8

• Often used, but is an extremely strict criterion that suffers inflated Type II error (reduces the number of true discoveries)

• For discovery efforts (such as in GWAS) one may be more concerned about the rate of false positives among all rejected null hypotheses rather than the probability of making one or more Type I errors

Benjamini, Y. and Hochberg, Y. J. R. Stat. Soc. B 1995. 57: 289/300Storey, JD. The Annals of Statistics 2003, Vol. 31, No. 6, 2013–2035

Page 32: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Solutions to Multiple Testing Issue

• False Discovery Rate - control the proportion of significant results that are in fact type I errors (‘false discoveries’) rather than controlling the chance of making even a single type I error

• q-values are adjusted p-values found using the FDR approach

• For instance, a p-value of 0.05 implies that 5% of all tests will result in false positives

• Likewise, an FDR adjusted p-value (or q-value) of 0.05 implies that 5% of significant tests will result in false positives

• The latter is a much smaller quantityBenjamini, Y. and Hochberg, Y. J. R. Stat. Soc. B 1995. 57: 289/300Storey, JD. The Annals of Statistics 2003, Vol. 31, No. 6, 2013–2035

Page 33: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Replication, Replication, Replication

Replication study: • Similar population, similarly measured

phenotype• Same genetic model, same SNP or

proxy, same direction• Adequately powered to detect

postulated effect

Chanock S, Manolio T, et al, Nature 2007; 447:655-660.

Page 34: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Multi-Stage Designs for GWAS

Hirschhorn & Daly Nat. Genet. Rev. 6: 95, 2005

$$$$$$$$$$$$$$$

$$$

$

Page 35: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Genome-wide association of early-onset myocardial infarction with single nucleotide polymorphisms and copy number variantsMyocardial Infarction Genetics Consortium, Kathiresan S, Voight BF, Purcell S, Musunuru K, Ardissino D, Mannucci PM, Anand S, Engert JC, Samani NJ, Schunkert H, Erdmann J, Reilly MP, Rader DJ, Morgan T, Spertus JA, Stoll M, Girelli D, McKeown PP, Patterson CC, Siscovick DS, O'Donnell CJ, Elosua R, Peltonen L, Salomaa V, Schwartz SM, Melander O, Altshuler D, Ardissino D, Merlini PA, Berzuini C, Bernardinelli L, Peyvandi F, Tubaro M, Celli P, Ferrario M, Fetiveau R, Marziliano N, Casari G, Galli M, Ribichini F, Rossi M, Bernardi F, Zonzin P, Piazza A, Mannucci PM, Schwartz SM, Siscovick DS, Yee J, Friedlander Y, Elosua R, Marrugat J, Lucas G, Subirana I, Sala J, Ramos R, Kathiresan S, Meigs JB, Williams G, Nathan DM, MacRae CA, O'Donnell CJ, Salomaa V, Havulinna AS, Peltonen L, Melander O, Berglund G, Voight BF, Kathiresan S, Hirschhorn JN, Asselta R, Duga S, Spreafico M, Musunuru K, Daly MJ, Purcell S, Voight BF, Purcell S, Nemesh J, Korn JM, McCarroll SA, Schwartz SM, Yee J, Kathiresan S, Lucas G, Subirana I, Elosua R, Surti A, Guiducci C, Gianniny L, Mirel D, Parkin M, Burtt N, Gabriel SB, Samani NJ, Thompson JR, Braund PS, Wright BJ, Balmforth AJ, Ball SG, Hall AS;

Wellcome Trust Case Control Consortium, Schunkert H, Erdmann J, Linsel-Nitschke P, Lieb W, Ziegler A, König I, Hengstenberg C, Fischer M, Stark K, Grosshennig A, Preuss M, Wichmann HE, Schreiber S, Schunkert H, Samani NJ, Erdmann J, Ouwehand W, Hengstenberg C, Deloukas P, Scholz M, Cambien F, Reilly MP, Li M, Chen Z, Wilensky R, Matthai W, Qasim A, Hakonarson HH, Devaney J, Burnett MS, Pichard AD, Kent KM, Satler L, Lindsay JM, Waksman R, Epstein SE, Rader DJ, Scheffold T, Berger K, Stoll M, Huge A, Girelli D, Martinelli N, Olivieri O, Corrocher R, Morgan T, Spertus JA, McKeown P, Patterson CC, Schunkert H, Erdmann E, Linsel-Nitschke P, Lieb W, Ziegler A, König IR, Hengstenberg C, Fischer M, Stark K, Grosshennig A, Preuss M, Wichmann HE, Schreiber S, Hólm H, Thorleifsson G, Thorsteinsdottir U, Stefansson K, Engert JC, Do R, Xie C, Anand S, Kathiresan S, Ardissino D, Mannucci PM, Siscovick D, O'Donnell CJ, Samani NJ, Melander O, Elosua R, Peltonen L, Salomaa V, Schwartz SM, Altshuler D.

Nat Genet. 2009 Mar;41(3):334-41.

Page 36: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

MI-GEN GWAS Study Design

Myocardial Infarction Genetics Consortium. Nat Genet. 2009 Mar;41(3):334-41.

Page 37: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham
Page 38: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Myocardial Infarction Genetics Consortium. Nat Genet. 2009 Mar;41(3):334-41.

The MI-associated SNP at 9p21.3 (rs4977574) was strongly associated with mRNA level of cyclin-dependent kinase inhibitor 2B (CDKN2B),

Page 39: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Coronary Artery Disease (CAD)

• CAD is the most common cause of death in industrialized countries, and its prevalence is rapidly increasing in developing countries.

• CAD has a complex and heterogeneous etiology involving numerous environmental and genetic factors of disease risk

Page 40: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Review of CAD GWAS• In 2007, three landmark GWAS successfully and simultaneously reported

the identification of variants on chromosome 9p21.3 associated with risk for MI and CAD.

• The SNPs in the region were in strong LD and defined a haplotype associated with a 15%–20% increased risk in heterozygous individuals and a 30%–40% increased risk in homozygous individuals .

• Since then, several studies have confirmed the role of the chromosome 9p21.3 region in the risk of CAD and have postulated, on the basis of fairly persuasive data, associations of this region with other disease phenotypes, including type 2 diabetes, ischemic stroke, aortic aneurysm , glioma , malignant melanoma , and aggressive periodontitis.

Zeller et al, Clinical Chemistry, 58:1 2012

Page 41: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Zeller et al, Clinical Chemistry, 58:1 2012

Page 42: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

9p21 Function Summary

• Multiple cell culture and animal model studies have investigated the function of genes located in this region in CAD (summarized by Zeller et al).

• Overall, despite major progress, elucidation of the 9p21 susceptibility mechanisms remain unclear

• Further in vitro and in vivo studies are required to fully establish the involvement of the CDKN2A, CDKN2B, CDKN2B-AS (ANRIL), and MTAP genes in the pathologic process and to translate the GWAS signal into a biological meaning

Zeller et al, Clinical Chemistry, 58:1 2012

Page 43: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Hypertension

• Hypertension is the most widespread cardiovascular disease, with a global burden estimated at ~30% of the population.

• All genetic loci discovered thus far explain only about 2% of the overall variation in blood pressure

Zeller et al, Clinical Chemistry, 58:1 2011

Page 44: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Hypertension GWAS• The identification of genetic loci for hypertension through GWASs has

proved difficult• The first GWAS yielded no significant loci.

– Single blood pressure measurements in population-based cohorts are often unreliable

– many individuals receive medications• Several larges cale GWASs and meta-analyses have recently

identified broadly replicated risk loci with genome-wide significance • The odds ratios associated with risk loci are very modest and do not

exceed 1.05. • With some authors arguing that the genetic contribution to developing

hypertension might be as large as 50%, it is evident that there is still a huge amount of missing heritability for this important disease!

Zeller et al, Clinical Chemistry, 58:1 2011

Page 45: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Hyperlipidemia

Several aspects of GWAS for blood lipids are particularly noteworthy.

1. A substantial number of the genes well known to cause rare Mendelian lipid disorders, such as LDLR3 (low density lipoprotein receptor), APOB (apolipoprotein B), and PCSK9 (proprotein convertase subtilisin/kexin type 9), have been rediscovered through GWAS.

2. It is intriguing that a large number of GWAS loci for hyperlipidemia are also significantly associated with CAD risk, proving the unequivocal link between blood lipid concentration and CAD risk.

3. Many GWAS loci for blood lipids have been replicated!!! Are, thus, being functionally elucidated, providing good evidence for how GWAS finding can inspire functional experiments.

Zeller et al, Clinical Chemistry, 58:1 2012

Page 46: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

The 1p13 Sortilin Story

• The 1p13.3 region has been replicated in GWAS of CAD and LDL cholesterol.

• Most studies have identified rs599839 as the lead SNP with the strongest correlation with CAD and LDL-C.

• In European populations, the less frequent alleles have been associated with decreased LDL-C concentrations and a decreased risk of CAD.

Zeller et al, Clinical Chemistry, 58:1 2012

Page 47: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

The 1p13 Sortilin Story• Region encompasses 3 tightly linked genes, namely CELSR2 [cadherin,

EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila)], PSRC1 (proline/serine-rich coiled-coil 1), and SORT1 (sortilin 1).

• In human blood cells, Linsel-Nitschke et al. observed a significant association between rs599839 and levels of SORT1 expression but detected no association with CELSR2 or PSRC1.

• However, using the largest population-based human monocyte biobank available, Zeller et al. could not identify the cis association between SORT1 mRNA levels and the 1p13.3 locus, whereas they did prove a strong association between PSRC1 mRNA levels and the less frequent rs629301 allele (a tag of rs599839) .

• Despite these discrepancies, recent investigations have focused on the SORT1 gene to provide functional characterization of the association signals.

Zeller et al, Clinical Chemistry, 58:1 2012

Page 48: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

The 1p13 Sortilin Story

• Sortilin 1 is a member of a VPS10P-domain receptor family, and binds the LDL receptor–associated protein (RAP), the lipolytic enzyme lipoprotein lipase, and apolipoprotein A-V.

Zeller et al, Clinical Chemistry, 58:1 2012

Page 49: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

The 1p13 Sortilin Story

Zeller et al, Clinical Chemistry, 58:1 2012

Page 50: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Applications of Genetic Studies in CVD

O’Donnell, N Engl J Med. 2011 Dec 1;365(22):2098-109

Page 51: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Hypothesis Driven Research/ Functional Studies

• Does an identified genome variant have effect on protein expression or action?

• Disrupt the gene or variant of interest in a cell or animal model– Can help understand underlying pathology

• Exciting new opportunities with induced pluripotent stem sells (iPSC)– iPSC differentiated to cardiomyoctyes are being used to

investigated the effect of mutations on electrophysiological properties with application to the long-QT syndrome

• Understanding of functional impact is needed for applications in disease prediction and personalized medicine

O’Donnell, N Engl J Med. 2011 Dec 1;365(22):2098-109

Page 52: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Disease Prediction: Not there yet

• Recent efforts have evaluated the risk predicting ability of validated lipid-modulating SNPs and SNPs associated with type 2 diabetes, hypertension, CAD, and CVD.

• In most analyses, assessment of genetic risk scores did not offer any substantial improvement in risk discrimination with c statistics or net reclassification.

• Thus, the clinical usefulness of genetic risk scores based on variants identified by GWASs currently remains unclear, and the use of genetic testing to improve risk stratification is clearly too premature and remains to be defined.

Zeller et al, Clinical Chemistry, 58:1 2011

Page 53: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Highs and Lows from CVD GWAS

• Many loci show association with CVD across groups of differing ancestry

• Most cardiovascular traits are influenced by a large number of loci

• Translation of these findings into clinical practice largely unrealized – Small effect size of isolated variants– Rarely identify the causal variant

• Type 1 error burden“The chief strength of the new approach also contains its chief problem: with more than 500,000 comparisons per study, the potential for false positive results is unprecedented.” (Hunter and Kraft, 2007)

Page 54: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Where’s the Heritability

• CDCV hypothesis: posits common variants present in ≥5% of the population contribute to common diseases (hypothesis tested by GWAS)

• GWAS generally do not capture rare and low frequency variants present in <5% of the population

• CDRV hypothesis: diseases due to multiple rare variants with intermediate penetrances

Page 55: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Potential of Rare Variants in CVD

• Targeted sequencing of the human genome such as genes or exonsEx. FHS- 1 in 64 persons carries a functional

mutation in renal salt handling genes (NCCT, NKCC2, or ROMK) associated with clinically significant alterations in BP

Ji W. et al. Nat Genet 2008;40:592-9

Page 56: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham
Page 57: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Next Generation Sequencing (NGS)

• Can simultaneously sequence multiple areas of the genome rapidly and at low cost

• Can directly observe the causal variant!• Whole-exome sequencing: sequencing of the

protein coding regions, or exons, of an entire genome

• Whole-genome sequencing: sequencing of the entire human genome

• Still prohibitively expensive for clinical and observational studies

• Capacity for bioinformatic annotation, data storage and analysis lag behind sequencing capabilities

O’Donnell, N Engl J Med. 2011 Dec 1;365(22):2098-109

Page 58: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

1000 genomes projecthttp://www.1000genomes.org/about

So far, has sequenced the genomes of 1,092 people from 14 populations in Europe, East Asia, sub-Saharan Africa and the Americas. Ultimately, they will study more than 2,500 individuals from 26 populations

Page 59: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Exome Chip

Page 60: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

NHLBI Grand Opportunity Exome Sequencing Project (ESP)

• The goal of the NHLBI GO Exome Sequencing Project (ESP) is to discover novel genes and mechanisms contributing to heart, lung and blood disorders by pioneering the application of next-generation sequencing of the protein coding regions of the human genome across diverse, richly-phenotyped populations and to share these datasets and findings with the scientific community to extend and enrich the diagnosis, management and treatment of heart, lung and blood disorders. The groups participating and collaborating in the NHLBI GO ESP include: Seattle GO - University of Washington, Seattle, WA– BroadGO - Broad Institute of MIT and Harvard, Cambridge, MA– WHISP - Ohio State University Medical Center, Columbus, OH– Lung GO - University of Washington, Seattle, WA– WashU GO - Washington University, St. Louis, MO– Heart GO - University of Virginia Health System, Charlottesville, VA– ChargeS GO - University of Texas Health Sciences Center at Houston

https://esp.gs.washington.edu/drupal/

Page 61: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Highs and Lows from GWAS

• Many loci show association with complex traits across groups of differing ancestry

• Most complex traits are influenced by a large number of loci

• Translation of these findings into clinical practice largely unrealized – Small effect size of isolated variants– Rarely identify the causal variant

• Type 1 error burden“The chief strength of the new approach also contains its chief problem: with more than 500,000 comparisons per study, the potential for false positive results is unprecedented.” (Hunter and Kraft, 2007)

Page 62: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Summary: Utility and Challenges of GWA Studies for Gene Discovery

• Utility– Hypothesis-generating

• Agnostic with respect to biological knowledge• Most discoveries to date have been genes with no known or

suspected association with disease– Unprecedented ability to capture inherited genetic

variability at excellent resolution• Challenges

– Potential for false-positive results is high• Replication requires collaboration across studies and similar

population, similar phenotype, and same genetic effect– Identification of causal SNP

Page 63: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Translation of GWA Studies to Discovery of Genes in Complex Disease – the Future

– Overall limited due to small effect sizes of identified snps, lack of replication

– Genotyping “different” kinds of variants• Copy number variants• Medical resequencing• Epigenetic markers

– Coping with the data tsunami• Utilize bioinformatics / high performance computing

capacity to integrate findings from association studies and gene-expression studies

Page 64: Genome-Wide Association Studies Ryan Irvin, Ph.D. Assistant Professor Department of Epidemiology School of Public Health The University of Alabama at Birmingham

Questions?