Jeremiah M. Scharf, MD, PhD
Departments of Neurology, Psychiatry and
Center for Human Genetic Research
Massachusetts General Hospital
Introduction to the Genetics of Complex Disease
Breakthroughs in Genome Science
• Human Genome Project: Sequence (2001)
• HapMap Project: Common Variation (2005)
• 1000 Genomes Project: Rare Variation (2010)
• ENCODE Project: Function (2012)
Patterns of Inheritance: Single Gene Disorders
Recessive
• Example: Sickle Cell Anemia
• Single gene causes disease
• Disease requires two copies of mutation
Dominant
• Example: Huntington Disease
• Single gene causes disease
• Disease requires one copy of mutation
Complex Disorders
• Inheritance pattern: multifactorial or “complex”
• Not due to single gene
• Several or many genes may contribute
• Each may have small effect by itself
• Effects may depend on interaction with environment and other genes (epistasis)
Complex Disease Genetics
• Most common medical illnesses are genetically complex
• Aggregate in families but don’t show Mendelian segregation
• Multiple genes contribute to disease in each individual
• Incomplete penetrance and variable expression
– penetrance = probability of disease given risk genotype
• Gene-gene and gene-environment interaction
Chain of Genetic Research
Adapted from Faraone and Tsuang. 1995.
Questions                        Study Methods
Is the disorder familial?        Family study
How much do genes contribute?    Twin and adoption studies
What genes are involved?         Linkage, association, sequencing
How do genes cause disease?      Functional and biological studies
Does it “Run in Families”?
• Compare prevalence (risk) in relatives
of affected proband to prevalence in
relatives of unaffected controls
• Recurrence risk ratio:
$$\lambda_1 = \frac{\text{Risk to first-degree relative of affected}}{\text{Prevalence in general population}}$$
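As a minimal sketch of the recurrence risk ratio, the Python below divides the risk in first-degree relatives of affected probands by the population prevalence. The function name and the example values (10% risk in relatives, 1% prevalence) are illustrative assumptions, not figures from the slides.

```python
def recurrence_risk_ratio(risk_in_relatives: float, population_prevalence: float) -> float:
    """Return lambda = risk to relatives of affected / population prevalence."""
    if not 0.0 <= risk_in_relatives <= 1.0:
        raise ValueError("risk_in_relatives must be a probability")
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population_prevalence must be in (0, 1]")
    return risk_in_relatives / population_prevalence


# Hypothetical disorder: 10% risk in first-degree relatives, 1% in the population.
lam = recurrence_risk_ratio(0.10, 0.01)
print(f"lambda_1 = {lam:.1f}")
```

A ratio near 1 would suggest no familial aggregation; larger values indicate that relatives of affected individuals carry more risk than the general population.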
Familial relative risk (RR) for various
neuropsychiatric disorders
Textbook of Neuropsychiatry and Behavioral Neurosciences, 5th Edition. Eds, Yudofsky SC, Hales RE.
© 2008 American Psychiatric Publishing, Inc. All rights reserved. www.appi.org
[Figure: spectrum of inheritance, from Mendelian (monogenic, “deterministic”) to complex (genetic + non-genetic/“environmental”, “probabilistic”). Gene labels shown: SNCA, Parkin, etc.; APP, PS1, PS2.]
Twin Studies: Is it “Genetic”?
• Compare concordance in MZ vs DZ twins
• MZ > DZ implies genetic contribution
• MZ < 100% implies environmental
contribution
Heritability (h²): Proportion of phenotypic variance (in a population) attributable to genetic factors.
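One classical way to turn twin data into a rough heritability estimate is Falconer's formula, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the twin-pair correlations for the trait (or for liability to a disorder). The slides do not name this method, so it is offered here only as background. The correlation values below are hypothetical.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate: h^2 = 2 * (r_MZ - r_DZ).

    r_mz, r_dz: trait correlations in monozygotic and dizygotic twin pairs.
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, not taken from the slides.
h2 = falconer_h2(r_mz=0.8, r_dz=0.5)
print(f"h^2 is approximately {h2:.2f}")
```

The estimate rests on simplifying assumptions (purely additive genetic effects, equal shared environments for MZ and DZ pairs), so it should be read as a population-level approximation.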
Heritability Caveats
• A heritability of 60% means
– that at least one gene operates on the trait
– that 60% of the individual differences in that population can be
attributed to differences in the additive effects of certain genes
• A heritability of 60% does not mean
– that the trait of any one individual is 60% determined by his or her
genes, 40% determined by his or her environment
– that environmental interventions can not have striking effects
• Ignores heterogeneity in mode of inheritance
• Depends on degree of genetic and environmental variability in
the population
Courtesy: Shaun Purcell
Estimated Heritability
Disorder/trait                 Approx. h²
Autism                         80%
Schizophrenia                  80%
Bipolar Disorder               60-80%
Attention Deficit Disorder     ~75%
Tourette Syndrome              60-80%
Inflammatory Bowel Disease     65-75%
Multiple Sclerosis             55%
Alcohol/drug addiction         55%
Major Depression               40%
Anxiety Disorders              30-45%
Breast Cancer                  25%
Where are the genes?
Molecular Genetic Methods
• Linkage analysis: examines the co-inheritance of the phenotype with markers of known chromosomal location
– Primary application: genome scans (“Where”)
• Association analysis: examines correlation between specific genetic variants and presence of the phenotype
– Primary application: candidate gene and genomewide studies (“Which”)
Linkage vs. Association
                          Linkage                    Association
Question                  Where are the genes?       Which alleles confer risk?
Best suited for           Mendelian disease          Complex disease
Genomic scope             Whole genome               [Candidate gene] or whole genome
Subjects                  Families                   Case/control or nuclear families
Markers                   Microsatellites or SNPs    SNPs
Typical marker spacing    < 10 Mb                    < 10 kb
McCarthy et al., 2008; Sullivan et al., 2012
Genetic Architecture
Landscape of mutations that collectively contribute to disease
• Major gene: large effect. Example: Huntington’s Disease
• Many genes (polygenic): small effects. Example: Height
[Figure label: Boston]
McCarthy et al., 2008; Sullivan et al., 2012
Genetic methods target different types of mutations
[Figure: genetic methods overlaid on the genetic architecture landscape, with regions numbered 1-3. Method labels: LINKAGE; ASSOCIATION (family-based or case-control); COPY NUMBER VARIANTS; NEXT-GEN SEQUENCING. Example labels: Early-onset AD (APP, PS1/2), Cystic Fibrosis; VCFS/DiGeorge, Williams Syndrome, Idiopathic Neurodevelopmental Disorders; Late-onset AD (APOE); Common Disorders: Inflammatory Bowel Disease, Multiple Sclerosis, Type 2 Diabetes, Schizophrenia.]
SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs):
Most common form of human genetic variation
[Figure: a stretch of DNA sequence with a single variant base (T) marked.]
Association Analysis: Co-inheritance of Alleles and Disease Across Families
• Trios: Are alleles transmitted to affected offspring more than 50% of the time?
• Cases and controls: Are alleles more common in cases than controls?
[Figure: A/G genotypes shown for cases, controls, and parent-offspring trios.]
Association Studies are Like Other Epidemiologic Studies
• General Question: Is Exposure Associated with Disease?
• Is smoking associated with MI?
[Figure: exposure status (+ / −) of individual cases (MI+) and controls (MI−).]

Exposure    MI+    MI−
+           120     50
−            54    100

OR = (120 × 100)/(50 × 54) = 4.44
χ² = 41.0, p < .0001
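The odds ratio and chi-square statistic for this 2×2 table can be checked with a short Python sketch. The function names are illustrative. The chi-square here is the standard Pearson statistic without continuity correction, which is an assumption about how the slide's value was obtained. The p-value uses the exact relation between a 1-df chi-square and the normal distribution.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


a, b, c, d = 120, 50, 54, 100  # counts from the slide's table
chi2 = chi_square_2x2(a, b, c, d)
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"chi-square = {chi2:.1f}, p = {p_value_1df(chi2):.2e}")
```

The same functions apply unchanged when the "exposure" is carrying a particular allele rather than smoking.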
Alleles as Exposures
• Are alleles more common in cases than controls?
• i.e., is the G allele associated with MI?
[Figure: A/G genotypes of individual cases (MI+) and controls (MI−).]

Allele    MI+    MI−
G         120     50
A          54    100

OR = (120 × 100)/(50 × 54) = 4.44
χ² = 41.0, p < .0001
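Family-based Association Analysis: Transmission/disequilibrium test
[Figure: parent-offspring trios with genotypes at a two-allele marker (alleles 1 and 2); which parental allele is transmitted to the affected offspring?]

                 Not transmitted
Transmitted      1        2
1                a        b
2                c        d

With b = 120 and c = 50 (the slide also shows the counts 95 and 195 for the remaining cells):

$$\chi^2_{\mathrm{TDT}} = \frac{(b-c)^2}{(b+c)} = \frac{(120-50)^2}{(120+50)}, \qquad p < .0001$$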
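The TDT statistic above can be reproduced with a few lines of Python. This is a sketch using the discordant transmission counts from the slide (b = 120, c = 50). The function names are illustrative, and the p-value assumes the statistic follows a chi-square distribution with 1 degree of freedom.

```python
import math


def tdt_chi_square(b: int, c: int) -> float:
    """TDT statistic from the two discordant cells: (b - c)^2 / (b + c)."""
    if b + c == 0:
        raise ValueError("TDT is undefined without informative transmissions")
    return (b - c) ** 2 / (b + c)


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


chi2 = tdt_chi_square(b=120, c=50)
print(f"TDT chi-square = {chi2:.2f}, p = {p_value_1df(chi2):.2e}")
```

Only transmissions from heterozygous parents enter the statistic; the concordant cells (a and d) carry no information about preferential transmission.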
Association Study Pitfalls
Problem: False positives
– Multiple testing (genes × SNPs × phenotypes)
– Low prior probability for any SNP (even for the “best” candidate gene!)
Solutions:
– Correct for multiple testing
– Independent replication!
Problem: False negatives
– Modest effect sizes of susceptibility alleles
– Vast majority of studies are underpowered
– Typical odds ratios for GWAS loci = 1.1-1.3
– Detection requires samples of 10s of thousands
Solution:
– Increase sample size
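To illustrate why odds ratios of 1.1-1.3 demand very large samples, the sketch below applies the standard normal-approximation sample-size formula for comparing two proportions to allele frequencies in cases and controls. The control allele frequency (0.30), 80% power, equal group sizes, and the genome-wide threshold of 5 × 10⁻⁸ are all assumptions chosen for illustration. Real GWAS power calculations account for more than this.

```python
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and allelic OR."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def alleles_needed_per_group(control_freq: float, odds_ratio: float,
                             alpha: float = 5e-8, power: float = 0.80) -> float:
    """Approximate number of alleles per group for a two-sided two-proportion test."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2


for odds in (1.1, 1.2, 1.3):
    people = alleles_needed_per_group(0.30, odds) / 2.0  # two alleles per person
    print(f"OR {odds}: roughly {people:,.0f} cases and as many controls")
```

Under these assumptions the required sample grows steeply as the odds ratio approaches 1, which matches the slide's point that most studies are underpowered.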
LD and Haplotypes
• Linkage disequilibrium (LD): correlation in the
population between alleles at two loci, i.e., non-random
association of alleles at linked loci
• Haplotype: A series of alleles at linked loci
along a single chromosome
• Haplotype (LD) blocks: genomic regions of
LD. The human genome shows a block-like
structure with limited haplotype diversity (Gabriel et al. Science, 2002)
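LD between two biallelic loci is often summarized by D (the excess of an observed haplotype frequency over its expectation under independence) and by r² (the squared correlation between alleles). The slides do not give these formulas; the sketch below uses the standard definitions, with hypothetical frequencies.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    r2 = d ** 2 / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


# Hypothetical frequencies: the A-B haplotype is more common than independence predicts.
d, r2 = ld_stats(p_ab=0.40, p_a=0.50, p_b=0.50)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")
```

An r² near 1 means one SNP is a near-perfect proxy ("tag") for the other, which is what lets genotyping arrays cover a haplotype block with a small number of markers.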
The GWAS Era
• Before 2006: only a handful of genes had been found for any
common medical disorders like diabetes, heart disease,
inflammatory bowel disease, arthritis
• Since 2006: thousands of confirmed genetic findings
for major medical diseases
• What Happened?
Powerful DNA chip technology
Computational advances
Whole genome analysis
Much larger studies
Genomewide Association Studies (GWAS)
• Micro-array based genotyping technique
• Assays common DNA variants (“SNPs”) that “tag” blocks of DNA across the human genome
– mean DNA block size: ~10-20 kb (10,000-20,000 DNA bases)
– much finer resolution than linkage studies
– each chip assays > 1 million SNP markers in a single experiment
www.nature.com/.../v5/n5/full/nmeth0508-447.html; http://www.illumina.com; http://www.sanger.ac.uk
Genomewide Association Study (GWAS)
• DNA Microarray (DNA-Chip) with 500K - 5M SNPs covering the genome
• Allele frequencies usually >5%
• Examine for each SNP:
– allele frequency differences between cases and controls
– correlation between allele count and quantitative trait
• Threshold for significance: p < 5 × 10⁻⁸
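The genome-wide threshold is commonly explained as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. That rationale is background rather than something stated on the slide, and the sketch below simply shows the arithmetic.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed: about one million effectively independent tests across the genome.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"per-SNP threshold = {threshold:.0e}")
```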
[Figure: number of GWAS loci vs. number of cases. Approximate yield per 1,000 cases:]
• Crohn’s: ~10 genes
• Schizophrenia: ~4
• Adult Height: ~3
• Bipolar Disorder: ~1 gene
So, you found an association. Is it due to…?
• True association with causal variant?
• Spurious association due to confounding? (population stratification)
• Linkage disequilibrium with nearby causal variant?
• Chance?
– indexed by p value, but beware multiple testing!
Hardy-Weinberg Equilibrium
• large population
• no mutation
• no selection
• random mating
• no migration
[A] = p
[a] = q
p + q = 1
[AA] = p²
[Aa] = 2pq
[aa] = q²
frequencies remain stable
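A quick way to see these relations in action is to compare observed genotype counts with the counts expected under Hardy-Weinberg equilibrium. The sketch below estimates p from the data, derives the expected counts, and computes a Pearson chi-square. The genotype counts are hypothetical, and the function names are illustrative.

```python
def hwe_expected_counts(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Expected genotype counts (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele a
    return p * p * n, 2 * p * q * n, q * q * n


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Pearson chi-square comparing observed and HWE-expected genotype counts.

    Assumes both alleles are present, so that no expected count is zero.
    """
    observed = (n_AA, n_Aa, n_aa)
    expected = hwe_expected_counts(n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical samples of 1,000 people, both with p = 0.6.
print(f"in equilibrium:  chi-square = {hwe_chi_square(360, 480, 160):.2f}")
print(f"too few hets:    chi-square = {hwe_chi_square(400, 400, 200):.2f}")
```

In association studies, a strong departure from these expectations in controls is often treated as a warning sign of genotyping error or population structure.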
With genome-wide SNP data, population structure can be detectable to very fine scales...
Novembre et al (2008)
Population Stratification
• Differences in allele
frequencies between
cases and controls due
to systematic differences
in ancestry rather than
association of genes
with disease.
Population Allele Differences Can Confound Association Studies
• Does A/G SNP in CNR1 gene cause MI?
• Cases recruited from MGH patients:
– 55% European-American
– 20% African-American
• Controls recruited from volunteers:
– 85% European-American
– 5% African-American

Allele frequency     G     A
European American    .7    .3
African American     .4    .6

[Figure: A/G genotypes of cases (MI+) and controls (MI−); p < .0001]
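The confounding in this example can be made concrete by computing the G allele frequency each group would show if the SNP had no effect on MI at all. The slide's ancestry percentages do not sum to 100%, so this sketch assumes only the two listed groups matter and renormalizes their shares. That renormalization and the function name are assumptions.

```python
def mixed_allele_freq(ancestry_mix: dict[str, float], allele_freq: dict[str, float]) -> float:
    """Allele frequency in a sample that mixes ancestry groups.

    ancestry_mix: share of each group in the sample (renormalized to sum to 1)
    allele_freq:  frequency of the allele within each group
    """
    total = sum(ancestry_mix.values())
    return sum(share / total * allele_freq[group] for group, share in ancestry_mix.items())


g_freq = {"European American": 0.7, "African American": 0.4}    # from the slide
cases = {"European American": 0.55, "African American": 0.20}    # from the slide
controls = {"European American": 0.85, "African American": 0.05}  # from the slide

print(f"G frequency in cases:    {mixed_allele_freq(cases, g_freq):.3f}")
print(f"G frequency in controls: {mixed_allele_freq(controls, g_freq):.3f}")
```

The two groups differ in G frequency purely because of ancestry mix, so a naive case-control comparison would report an association even though the allele has no causal role.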
Copy Number Variation
• Structural variations of > 1kb
• Low copy repeats are common mechanism:
– Highly homologous sequence elements arising from segmental duplication
• E.g. cause of psychiatric illness
– VCFS/DiGeorge syndrome - microdeletion on 22q: 20-30% incidence of
psychotic illness
– Autism - de novo CNVs in >10% of sporadic cases?
Related Methods: The “-omics”
• Functional Genomics - a field of molecular biology that attempts to
make use of the vast wealth of data produced by genomic projects to
describe gene and protein functions and interactions. Focuses on
dynamic aspects such as gene transcription, translation, and protein-
protein interactions, as opposed to the static aspects of the genomic
information such as DNA sequence or structures.
– Transcriptomics (expression profiling)- examines the expression level of
mRNAs in a given cell population, often using high-throughput techniques
based on microarray technology.
– Proteomics- examines the full complement of proteins and their structure,
quantity, and function
– Metabolomics- examines the whole set of small-molecule metabolites (such
as metabolic intermediates, hormones and other signalling molecules, and
secondary metabolites) to be found within a biological sample or organism
– Interactomics- examines the whole set of molecular interactions in cells
RoadMap Epigenomics Consortium, Nature 2015
Nature 518, 317–330 (19 February 2015) doi:10.1038/nature14248
Genomics In Silico
NIH BISTIC definition
• Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
• Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
[Figure: flow diagram. Pipeline labels: Confirmed Genetic Variants; Biological Characterization; Develop Functional Assays; Small Molecule Screening; Preclinical and Safety Studies; Proof-of-Concept Trials; Larger Clinical Trials. Model-system labels: Skin cells; Induced Stem Cells; Neurons; Glia; Animal Models.]
Summary
• Most common diseases are complex
– Aggregate in families with non-Mendelian patterns of inheritance
– Multiple genes of varying effect
– +/- Gene-gene interaction (epistasis), gene-environment interaction
• Association analysis is most common method for identifying susceptibility alleles
– Interpret with care:
  • Beware false positives
  • Replication is essential
• Exome and whole genome sequencing now feasible and successful in identifying rare variants related to Mendelian and complex disorders
– Ultimately, whole genome sequencing may become the preferred approach