Molecular and Genetic Epidemiology Kathryn Penney, ScD January 5, 2012


Page 1: Molecular and Genetic Epidemiology

Molecular and Genetic Epidemiology

Kathryn Penney, ScD
January 5, 2012

Page 2: Molecular and Genetic Epidemiology

Definitions

Genetic Epidemiology
- ‘a science which deals with the etiology, distribution, and control of disease in groups of relatives and with inherited causes of disease in populations’ - Morton, 1982

Molecular Epidemiology (www.aacr.org)
- seeks to identify human (cancer) risk and (carcinogenic) mechanisms to improve (cancer) prevention strategies
- is multi-disciplinary and translational, going from the bench to the field and back
- uses biomarkers and state-of-the-art technologies to gain mechanistic information from epidemiological studies

Page 3: Molecular and Genetic Epidemiology

Genetic and Molecular Epidemiology

[Diagram: Genetic variation, Exposure, Biological Factors/Mechanism, and Disease, connected by links each labeled "Association?"]

Page 4: Molecular and Genetic Epidemiology

Genetic Studies

Page 5: Molecular and Genetic Epidemiology

Twin studies
- Determine if a disease has a genetic component
- Estimate the genetic contribution to disease (heritability)
  - Genetics (heritable component)
  - Shared environment
  - Unique environment
- Twins
  - Monozygotic (MZ) share 100% of their genes
  - Dizygotic (DZ) share ~50% of their genes on average
- Use correlation of trait/disease
  - R_MZ = genetics + shared environment
  - R_DZ = ½ genetics + shared environment
  - Genetics = 2 × (R_MZ − R_DZ)
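
The correlation arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `twin_variance_components` and the example correlations are hypothetical, and the unique-environment term assumes the three components sum to 1.

```python
def twin_variance_components(r_mz: float, r_dz: float) -> dict:
    """Split trait variance into three components from twin correlations.

    Uses the two relations from the slide:
        R_MZ = genetics + shared environment
        R_DZ = 1/2 genetics + shared environment
    Assumption (not on the slide): the components sum to 1, so the
    unique-environment share is 1 - R_MZ.
    """
    genetics = 2 * (r_mz - r_dz)
    shared_env = r_mz - genetics      # algebraically equal to 2 * r_dz - r_mz
    unique_env = 1 - r_mz
    return {
        "genetics": genetics,
        "shared_env": shared_env,
        "unique_env": unique_env,
    }


# Hypothetical correlations, chosen only to make the arithmetic easy to follow.
components = twin_variance_components(r_mz=0.8, r_dz=0.5)
```

With these made-up inputs, the genetic share is 2 × (0.8 − 0.5) = 0.6, leaving 0.2 each for shared and unique environment.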

Page 6: Molecular and Genetic Epidemiology

Heritability

Lichtenstein et al, 2000

Page 7: Molecular and Genetic Epidemiology

Association studies
- Family based
  - Parent-child trios, siblings
- Population based
  - Case-control
- Types of studies
  - Candidate gene/SNPs
  - Genome-wide association study (GWAS)
- Single nucleotide polymorphisms (SNPs) vs. mutations/rare variants
  - Germline variation
  - SNPs > 1% population frequency

[Figure: example genotypes (A/A, A/C) among cases and controls]

Page 8: Molecular and Genetic Epidemiology

Samples
- Blood
  - DNA, RNA, biomarkers (dietary, hormones)
- Tissue
  - Tumor and normal
  - DNA, RNA, proteins

Page 9: Molecular and Genetic Epidemiology

Candidate genes
- Select a gene of interest
- Select SNPs to genotype
  - Literature
  - tagSNPs

Haplotype tagSNPs

Haplotype   Sequence
1           C G A A C G
2           C G A A C G
3           C G A C C G
4           C T A C C A
5           C T A C C A

Variable sites: G/T (position 2), A/C (position 4), G/A (position 6)
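
As a rough illustration of how tagSNPs fall out of haplotype data, the sketch below computes pairwise r² between the three variable sites in the five haplotypes shown on the slide and then picks tags greedily. The greedy rule, the 0.8 threshold, and all function names are assumptions made for illustration; this is not the algorithm used by Tagger or Haploview.

```python
from itertools import combinations

# The five haplotypes from the slide, six sites each.
haplotypes = ["CGAACG", "CGAACG", "CGACCG", "CTACCA", "CTACCA"]


def variable_sites(haps):
    """Return 0-based indices of sites that carry more than one allele."""
    return [i for i in range(len(haps[0])) if len({h[i] for h in haps}) > 1]


def r_squared(haps, i, j):
    """Pairwise r^2 between two biallelic sites in phased haplotypes."""
    a, b = haps[0][i], haps[0][j]     # reference alleles (arbitrary choice)
    n = len(haps)
    p_a = sum(h[i] == a for h in haps) / n
    p_b = sum(h[j] == b for h in haps) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haps) / n
    d = p_ab - p_a * p_b              # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def pick_tags(haps, sites, threshold=0.8):
    """Greedy pass: keep a site as a tag unless an earlier tag captures it."""
    tags = []
    for site in sites:
        if not any(r_squared(haps, tag, site) >= threshold for tag in tags):
            tags.append(site)
    return tags


sites = variable_sites(haplotypes)
# Report positions 1-based to match the slide.
ld = {(i + 1, j + 1): r_squared(haplotypes, i, j)
      for i, j in combinations(sites, 2)}
tags = [site + 1 for site in pick_tags(haplotypes, sites)]
```

With these five haplotypes, positions 2 and 6 are perfectly correlated, so genotyping one of them captures the other, while position 4 needs its own assay.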

Page 10: Molecular and Genetic Epidemiology

Candidate genes
- The International HapMap Project
  - Catalog of common genetic variants
  - Describes what these variants are, where they occur, and how they are distributed among people within populations and among populations

Page 11: Molecular and Genetic Epidemiology

Candidate genes
- www.hapmap.org
- Haploview: visualize correlations between SNPs in HapMap or study data
- Tagger: method to select tagSNPs in HapMap or study data

Page 12: Molecular and Genetic Epidemiology

Candidate genes
- Are the SNPs associated with outcome?
- Are the SNPs associated with intermediate phenotypes/biomarkers/tumor markers?

Page 13: Molecular and Genetic Epidemiology

Genotyping technology
- TaqMan
  - PCR-based fluorescent assay
  - Single SNP assay
- Sequenom
  - PCR-based single-base extension
  - MALDI-TOF (Matrix-Assisted Laser Desorption/Ionization Time Of Flight)
  - Multiplex (≤36-40 SNPs) assay

Page 14: Molecular and Genetic Epidemiology

Genome-wide Association Study (GWAS)
- Estimated 10 million SNPs in the genome
- Genotype 350k to 1 million SNPs across the entire genome
- Test association of each SNP with outcome
- Adjust for the number of tests performed
  - p < 5 × 10^-8 considered “genome-wide” significant
- Replicate findings in a different population
  - Same SNP, same direction, approximately the same magnitude of effect
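
One common rationale for the genome-wide threshold is a Bonferroni correction: dividing a conventional α of 0.05 by roughly one million independent tests gives 5 × 10^-8. The sketch below shows that arithmetic and applies it to a handful of p-values. The function names, SNP identifiers, and p-values are all made up for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def genome_wide_hits(p_values: dict, alpha: float = 0.05,
                     n_tests: int = 1_000_000) -> list:
    """Return SNP names whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(snp for snp, p in p_values.items() if p < threshold)


threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical SNP names and p-values, for illustration only.
example_p_values = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}
hits = genome_wide_hits(example_p_values)
```

Only `snp_a` clears the corrected threshold here. A p-value of 2 × 10^-6 would look striking in a single-test setting but does not survive correction for a million tests.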

Page 15: Molecular and Genetic Epidemiology

GWAS results

Amundadottir et al, 2009

Page 16: Molecular and Genetic Epidemiology

Published Genome-Wide Associations through 6/2010: 904 published GWA at p < 5 × 10^-8 for 165 traits

NHGRI GWA Catalog, www.genome.gov/GWAStudies

Page 17: Molecular and Genetic Epidemiology

Genotyping technology
- Illumina
  - 1 million SNP chip
  - tagSNPs selected from HapMap data
- Affymetrix
  - 1 million SNP chip
  - Selected based on distance

http://www.illumina.com/Documents/products/technotes/technote_intelligent_snp_selection.pdf

Page 18: Molecular and Genetic Epidemiology

Whole Genome Sequencing
- Human Genome Project
  - First genome sequenced in 2000; project completed 2003
- 1000 Genomes Project
  - Goal: to create a complete and detailed catalogue of human genetic variation
- Knome (founded by George Church and Harvard University)
  - knomeDiscovery: sequencing (30x) and interpretation for ~$5,000
- The Personal Genome
  - Interpretation (counseling?)
  - Screening? High-risk groups? Drug efficacy?
  - May help individuals alter behavior, but for now, we can’t do anything about our genes!

Page 19: Molecular and Genetic Epidemiology

Bias in Genetic Studies

Page 20: Molecular and Genetic Epidemiology

Bias in Genetic Studies

[Diagram: Genetic polymorphism and Disease, with an unspecified third factor (???) labeled CONFOUNDING]

Page 21: Molecular and Genetic Epidemiology

Bias in Genetic Studies

[Diagram: Genetic polymorphism and Disease, with Race/Ethnicity as the third factor, labeled CONFOUNDING]

Page 22: Molecular and Genetic Epidemiology

Population Stratification

Example:
- Prostate cancer is more common in African Americans than in Caucasians
- Frequency of many SNPs is different in African American and Caucasian populations
- If we ignored race/ethnicity, what might happen in our study?

Page 23: Molecular and Genetic Epidemiology

Population Stratification

Figure 1. The effects of population structure at a SNP locus. If the study population consists of subpopulations that differ genetically, and if disease prevalence also differs across these subpopulations, then the proportions of cases and controls sampled from each subpopulation will tend to differ, as will allele or genotype frequencies between cases and controls at any locus at which the subpopulations differ. The figure shows an example of this scenario with two populations in which the cases have an excess of individuals from population 2 and population 2 has a lower frequency of allele A than population 1. In this example, the structure mimics the signal of association in that there is a significant difference in allele and genotype frequencies between cases and controls.

Marchini, 2004

[Figure labels: Caucasian, African American]

Page 24: Molecular and Genetic Epidemiology

Adjusting for Ethnicity
- Defining & measuring ethnicity
  - Self-report
  - Ancestry (where are your grandparents from?)
  - Genotype many (hundreds) “ancestry informative markers”
- Control for ethnicity
  - In design
    - Restrict to one ethnicity
    - Match on ethnicity
  - In analysis
    - Stratify by ethnicity
    - Include ethnicity in regression model
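
To make the "stratify by ethnicity" option concrete, the sketch below compares a crude odds ratio with a Mantel-Haenszel stratified odds ratio. The counts are entirely hypothetical and were chosen so that the SNP is unrelated to disease within each population while carrier frequency and disease frequency both differ between populations. The function names are illustrative, and Mantel-Haenszel is one of several possible stratified estimators, not necessarily the one the lecture had in mind.

```python
def odds_ratio(a, b, c, d):
    """OR for one 2x2 table: a=carrier cases, b=carrier controls,
    c=non-carrier cases, d=non-carrier controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary OR across strata of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts: within each population the SNP is unrelated to disease.
population_1 = (50, 200, 50, 200)   # 50% carriers, 100 cases, 400 controls
population_2 = (80, 20, 320, 80)    # 20% carriers, 400 cases, 100 controls

# Ignoring population membership means adding the tables cell by cell.
pooled = tuple(sum(cells) for cells in zip(population_1, population_2))
crude_or = odds_ratio(*pooled)
adjusted_or = mantel_haenszel_or([population_1, population_2])
```

In this constructed example the crude odds ratio is about 0.45 even though each stratum has an odds ratio of 1, and the stratified estimate recovers 1. The apparent protective effect comes entirely from population structure.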

Page 25: Molecular and Genetic Epidemiology

Misclassification
- Non-differential
  - Of exposure: the degree of misclassification is the same according to disease status
  - Likelihood that exposure is wrong is similar among those who do and do not develop disease
- Differential
  - Of exposure: the degree of misclassification varies according to disease status

Page 26: Molecular and Genetic Epidemiology

Misclassification
- Laboratory tests do not always work perfectly: some % of samples may fail genotyping
- Missing or incorrect exposure information
  - Non-differential or differential misclassification?
  - What can we do to ensure that the misclassification is non-differential?
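
The difference between the two kinds of misclassification can be seen numerically. The sketch below takes a hypothetical true 2x2 table, applies an error-prone exposure (genotype) classification with a given sensitivity and specificity, and recomputes the odds ratio from the expected observed counts. All counts, error rates, and function names are assumptions chosen for illustration.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected (exposed, unexposed) counts after imperfect classification."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds ratio comparing cases with controls."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true counts: (exposed, unexposed).
true_cases = (60, 40)
true_controls = (40, 60)
true_or = odds_ratio(*true_cases, *true_controls)

# Non-differential: same error rates in cases and controls.
cases_nd = observed_counts(*true_cases, sensitivity=0.9, specificity=0.9)
controls_nd = observed_counts(*true_controls, sensitivity=0.9, specificity=0.9)
nondifferential_or = odds_ratio(*cases_nd, *controls_nd)

# Differential: controls are classified with lower sensitivity than cases.
controls_d = observed_counts(*true_controls, sensitivity=0.7, specificity=0.9)
differential_or = odds_ratio(*cases_nd, *controls_d)
```

With these inputs the true odds ratio is 2.25, the non-differential scenario pulls the estimate toward 1 (about 1.9), and the differential scenario pushes it away from the truth in the other direction (about 2.7). Other differential error patterns could just as easily bias toward the null.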

Page 27: Molecular and Genetic Epidemiology

Gene x Environment Interaction: An Example of Effect Modification

- Given equal exposure to the same risk factor, individuals may have different risk of disease depending on their genetic background
- The effect of an exposure on a disease outcome is modified by genotype

Page 28: Molecular and Genetic Epidemiology

Gene-environment interaction

All genotypes combined (OR = 1)

        D+    D-
  E+   100   100
  E-   100   100

Stratify on genotype:

AA genotype (OR = 1)

        D+    D-
  E+    40    20
  E-    80    40

AT/TT genotype (OR = 2.25)

        D+    D-
  E+    60    80
  E-    20    60
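
The odds ratios on this slide can be checked directly from the three 2x2 tables. The sketch below is a minimal illustration with hypothetical function and variable names. It assumes the first stratum table belongs to the AA genotype and the second to AT/TT, which is inferred from the order of the labels on the slide.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a=E+D+, b=E+D-, c=E-D+, d=E-D-."""
    return (a * d) / (b * c)


# Cell counts from the slide, in (E+D+, E+D-, E-D+, E-D-) order.
# Genotype-to-table pairing is assumed from the slide's label order.
strata = {
    "AA": (40, 20, 80, 40),
    "AT/TT": (60, 80, 20, 60),
}

# Summing the strata cell by cell reproduces the combined table.
combined = tuple(sum(cells) for cells in zip(*strata.values()))

crude_or = odds_ratio(*combined)
stratum_ors = {genotype: odds_ratio(*cells) for genotype, cells in strata.items()}
```

The combined table shows no association (OR = 1), while the stratum-specific odds ratios differ (1 versus 2.25), which is the pattern the slide uses to illustrate effect modification by genotype.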

Page 29: Molecular and Genetic Epidemiology

Effect Modification is Biological

[Diagram labels: Metabolism (CYP1A1, GSTM1), DNA damage, Lung Cancer]

Page 30: Molecular and Genetic Epidemiology

GWAS follow-up

Page 31: Molecular and Genetic Epidemiology

GWAS follow-up

- Dozens of GWAS for many diseases have now been performed
- Thousands of samples and hundreds of thousands of SNPs
- Replication is necessary to determine which significant results are real
- Once we know the results are real, then what???

Eeles RA et al. (2008)

Page 32: Molecular and Genetic Epidemiology

GWAS follow-up
- Risk prediction model development
- Understand biological function: candidate genes/regions!
  - Some associated SNPs are not in gene regions
- Many types of biological data and techniques can be employed to determine the function of the risk SNPs
  - Fine mapping
  - Expression (RNA and protein)
  - Enhancer activity

Page 33: Molecular and Genetic Epidemiology

GWAS follow-up – 8q24 story

Ghoussaini et al.

A) Haploview output of the 1.18-Mb 8q24 "desert" showing the five cancer-specific regions reported to date

Page 34: Molecular and Genetic Epidemiology

GWAS follow-up – 8q24 story

Pomerantz et al, 2009

8q24 variation not associated with MYC mRNA expression in prostate tumor or normal tissue

Page 35: Molecular and Genetic Epidemiology

GWAS follow-up – 8q24 story

(a) ChIP assay on Colo205, demonstrating a pattern consistent with enhancer activity. (b) Luciferase reporter assay demonstrating enhancer activity in two CRC lines. Error bars denote one standard deviation from the mean of replicate assays. (c) Representative luciferase assay showing increased enhancer activity of G over T alleles, performed on a total of 18 clones (nine G and nine T over 3 d) (P = 0.024). Error bars denote one standard deviation from the mean of assays performed in triplicate. (d) Mass spectrometry plots from Sequenom analysis showing preferential binding of TCF7L2 to risk allele (G) in immunoprecipitated DNA, as evidenced by differential peak heights (right panel) compared to control input DNA (left panel) (P = 1.1 × 10^-5).

Pomerantz et al, 2009

Page 36: Molecular and Genetic Epidemiology

GWAS follow-up (and beyond)

GWAS results

mRNA expression

Page 37: Molecular and Genetic Epidemiology

Thank you! Questions?