Applications of Human Genome Sequencing: Disease and
Personal Genomics
Conflicts: Personalis, Genapsys, AxioMx, Novartis
Michael Snyder
February 3, 2015
Outline
• Quick review on human variation
• Disease genome sequencing
– Cancer, Mystery Diseases
• Personal genomics
Genetic Variation Among People: Three Types
1) Single nucleotide variants (SNVs): 3.7 Million/person
GATTTAGATCGCGATAGAG GATTTAGATCTCGATAGAG
2) Short Indels (Insertions/Deletions 1-100 bp): 300-600K/person
GATTTAGATCGCGATAGAG GATTTAGA------TAGAG
3) Structural Variation (SV): Large Blocks (1000bp) of DNA that are Deleted, Inserted or Inverted
- People Have >3,000 differences (>2.0kb) Relative to the Reference Human Genome Sequence
- Likely responsible for many human differences and much disease
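The two example alignments on this slide can be told apart mechanically. The following is a minimal sketch, assuming the sequences are already aligned to equal length with `-` marking gaps; the function name `classify_variant` and the 100 bp cutoff handling are illustrative choices, not part of the lecture.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify an aligned reference/sample pair ('-' marks an alignment gap)."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    # Collect every aligned column where the two sequences differ.
    mismatches = [(r, a) for r, a in zip(ref, alt) if r != a]
    if not mismatches:
        return "identical"
    gap_count = sum(1 for r, a in mismatches if r == "-" or a == "-")
    if gap_count == 0 and len(mismatches) == 1:
        return "SNV"  # a single substituted base
    if gap_count == len(mismatches) and gap_count <= 100:
        return "short indel"  # only gapped columns, at most 100 bp in total
    return "other"


reference = "GATTTAGATCGCGATAGAG"
print(classify_variant(reference, "GATTTAGATCTCGATAGAG"))
print(classify_variant(reference, "GATTTAGA------TAGAG"))
```

This toy version treats all gapped columns as one event; a real caller would work from read alignments and would separate multiple indels.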
Heterogeneity in Olfactory Receptor Genes (Examined 851 OR Loci)
CNVs affect: 93 genes, 151 pseudogenes
The Cost of DNA Sequencing is Dropping
Human Genome Cost <$2K (http://www.genome.gov/)
Examples of People Who Have had Their Genomes Sequenced
From: www.genciencia.com
Jim Watson, Craig Venter, Ozzy Osbourne
sciencewithmoxie.blogspot.com.au/2010_11_01_archive.html
• Understand and Treat Disease
– Cancer
– Mystery diseases
– Prenatal diagnostics
• Managing Health Care in Healthy Individuals?
Impact of Genomics on Medicine
Cancer Genome Sequencing
1) Cancer is a genetic disease: both inherited and somatic
2) TCGA-like projects have sequenced >10,000 cancers
– Many types: ovarian, colon, breast, bladder, pancreatic ….
3) Each cancer contains many mutations
4) 10-20 “driver” mutations
Results of cancer genomics
10-100,000 SNVs for many cancer types + Structural variants (large insertions, deletions, fusions)
~254 cancer genes have been identified
Most common genes are already known e.g. p53, RB1, KRAS, BRCA1/2, etc
B Vogelstein et al. Science 2013
Cancer Genome Sequencing
Vogelstein et al., March Science, 2013
5) Each cancer is unique
6) Often fall into common pathways
TCGA-identified Pathways in Ovarian Cancer
RB and PI3K/AKT Signaling Homologous Recombination
Notch Signaling
Cancer genome sequencing
Sequence genomes (cancer tissue and normal), find genetic changes, and suggest possible therapies
Normal (~30X) VS Cancer (>60X initially; now >200X)
GATTTAGATCGCGATAGAG GATTTAGATCACGATAGAG
Image: Bruce Blaus; Image: National Cancer Institute
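The tumor-versus-normal comparison can be sketched as a per-site check on allele counts. This is an illustrative sketch only: the class `AlleleCounts`, the function `is_candidate_somatic`, and every threshold below are assumptions chosen for the example, not values from the lecture or from any particular variant caller.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    ref_reads: int  # reads supporting the reference base
    alt_reads: int  # reads supporting the alternate base

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def vaf(self) -> float:
        """Variant allele fraction; 0.0 when the site has no coverage."""
        return self.alt_reads / self.depth if self.depth else 0.0


def is_candidate_somatic(normal: AlleleCounts, tumor: AlleleCounts,
                         min_tumor_vaf: float = 0.05,
                         max_normal_vaf: float = 0.01,
                         min_depth: int = 20) -> bool:
    """Flag a site whose alternate allele is seen in tumor but not in normal.

    All thresholds are hypothetical defaults for illustration.
    """
    if normal.depth < min_depth or tumor.depth < min_depth:
        return False  # too little coverage in either sample to decide
    return tumor.vaf >= min_tumor_vaf and normal.vaf <= max_normal_vaf


# Alternate allele present only in the tumor: candidate somatic change.
print(is_candidate_somatic(AlleleCounts(30, 0), AlleleCounts(150, 50)))
# Alternate allele at ~50% in both samples: looks inherited, not somatic.
print(is_candidate_somatic(AlleleCounts(15, 15), AlleleCounts(100, 100)))
```

Deeper tumor coverage helps here because a mutation present in only a fraction of tumor cells yields a low allele fraction that shallow sequencing can miss.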
Copy number alterations from metastatic Colon tumor
Chromosome 7: Two amplification regions
[Figure: Log2 ratio of genomic copy number (y-axis) against Chromosome 7 coordinates in Mb, NCBI 37 (x-axis), spanning the Chr 7p arm, centromere (CEN), and Chr 7q arm]
• Two regional amplifications with complex genomic structure.
• Both loci > 10 copy number showing statistical significance.
With Hanlee Ji
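The log2 ratio on the plot's y-axis relates read depth to copy number. The sketch below assumes depths already normalized between samples, a diploid normal, and a pure tumor sample; the function names are hypothetical.

```python
import math


def log2_ratio(tumor_depth: float, normal_depth: float) -> float:
    """Log2 of tumor over normal read depth for one genomic window."""
    return math.log2(tumor_depth / normal_depth)


def estimated_copy_number(ratio: float, normal_ploidy: int = 2) -> float:
    """Convert a log2 ratio back to copies, assuming a pure tumor sample."""
    return normal_ploidy * 2 ** ratio


# A window with five times the normal depth corresponds to ~10 copies.
ratio = log2_ratio(tumor_depth=500, normal_depth=100)
print(round(ratio, 2), round(estimated_copy_number(ratio), 1))
```

Under these assumptions, a copy number above 10 corresponds to a log2 ratio above log2(10/2), roughly 2.32; contamination by normal cells would pull observed ratios toward zero.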
CML (Chronic Myeloid Leukemia) - Gleevec
1) Chromosome translocation involving Abl protein kinase and bcr
2) Gleevec binds the active site of Abl and inhibits it; causes remission
3) Gleevec binds other kinases (Kit) that are mutated in other cancers: Gleevec can be used to treat them.
Pharmacogenomics: Matching Drugs to Disease
Her2 Mutated in 25% of Breast Cancer
Iressa®:
– Used to treat non-small cell lung carcinoma
– 10% of patients respond
– Works only if patients' tumors have very specific mutations in EGFR
Matching drugs to cancer types - BRAF
BRAF: Oncogene mutated in ~65% of melanomas and ~15-20% of colon cancers. The vast majority of BRAF mutations are at V600 (most commonly V600E).
Vemurafenib®, PLX4032®: Effective BRAF inhibitors specific for V600 mutations (few side effects). Resistance frequently arises.
Bollag et al. Nature 2010
Challenges associated with cancer genome sequencing
Identification of the “driver” mutations is difficult; thousands of variants; only a few “drive” cancer cell growth
- Focuses on known genes
- Focuses on protein coding regions
Tumor heterogeneity
Drug resistance usually appears
Drug combinations
Other omes: RNA-Seq
• Identifies the expressed genes, i.e. potential drivers
Example: Esophageal cancer
Deep genome sequencing: >20,000 SNVs & Indels, >1000 Structural Variants
1/3 of synonymous variants expressed as RNA
Other omes: Circulating Tumor DNA
• CAPP-Seq: Targeted search for somatic mutations
• Biotinylated oligos: 139 recurrently mutated genes in NSCLC
• 96% specificity for mutant allele fractions down to ∼0.02%
(Newman et al. 2014, Nat. Med.)
What About Cancer Predisposition?
Identifying “heritable” causes of cancer
From Jim Ford
APC FANCE PMS2
ATM FANCF PRSS1
BLM FANCG PTCH1
BMPR1A FANCI PTEN
BRCA1 FANCL RAD51C
BRCA2 LIG4 RET
BRIP1 MEN1 SLX4
CDH1 MET SMAD4
CDK4 MLH1 SPINK1
CDKN2A MSH2 STK11
EPCAM MSH6 TP53
FANCA MUTYH VHL
FANCB NBN
FANCC PALB2
FANCD2 PALLD
42 Gene Panel Breast Cancer Study
From Jim Ford
Multiple Gene Panel Test Results
• 36 of 361 women without BRCA1/2 mutations (10%) carried a potentially pathogenic mutation
– ATM, BLM1, CDH1, CHEK2, MLH1, MSH2, MSH6, PMS2, MUTYH, NBN, PRSS1, SLX4, RAD51C, PALB2, BRIP1
• Participants notified of significant results
– 11/14 from initial 200 followed-up, confirmed, counseled
• Variants of Uncertain Significance (VUS) were common
– 40% of patients (0.7 per patient)
Kurian, Ford et al. Journal of Clinical Oncology 2014; Ford, Montreal BRCA Conference 2014
Mendelian Conditions: Undiagnosed Mystery Diseases
• 0.4% of live births
• 8% of adults have genetic disorder recognized by adulthood
• 25 M US Citizens
• $5M/individual/lifetime
Ng et al., 2010 Nat. Genetics
Family with Charcot-Marie-Tooth Disease
• Neuropathy
– Heterogeneous disease: many different genes mapped
• Sequence genome to 30X coverage
• 3.4 M SNPs (561,719 novel)
– 2,255,102 intergenic
– 1,165,204 in genes, introns etc.
• 174 nonsynonymous SNPs in region of interest
• 54 related to neuropathies
• Ultimately zoomed in on SH3TC2 gene:
Full-blown disease has two mutations: Y169H (missense), R954X (nonsense)*
Single heterozygotes have some phenotypes
*Implicated previously
Lupski et al 2010 NEJM
Solving Mystery Diseases: Dizygotic Twins:
Dopamine Responsive Dystonia
• Constantly sick, colicky, failed to meet milestones “floppy”; MRI showed some abnormalities
• Children diagnosed with dystonia
• Trial of L-DOPA showed dramatic improvement in 2 days
• Sequenced genomes; found mutation in SPR gene
• Administered dopamine + serotonin precursor
From Richard Gibbs, Baylor
Solving Mystery Diseases: Child With Variety of Conditions
Developmentally Delayed, Significant Health Issues
Mother SNVs: 3,125,880 Private: 581,754 Indels: 723,379
Father SNVs: 3,119,588 Private: 596,691 Indels: 750,522
Child SNVs: 3,118,638 Private: 33,158 Indels: 673,809
SNVs: single nucleotide variants; Indels: insertions/deletions (~<100bp)
Candidates: TCP10L2, SUPV3L1, PIEZO1, DNAH2, NGLY1, FANCA, WFS1
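The "Private" counts in the trio can be read as set differences between family members. A minimal sketch, assuming each genome is represented as a set of (chromosome, position, ref, alt) tuples; the helper name `private_variants` and the toy variants are hypothetical, not data from the case.

```python
def private_variants(person: set, *relatives: set) -> set:
    """Variants present in one person and absent from all listed relatives."""
    others = set().union(*relatives)
    return person - others


# Toy genomes keyed by (chromosome, position, reference base, alternate base).
mother = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
father = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")}
child = {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A"), ("chr7", 77, "T", "C")}

child_private = private_variants(child, mother, father)
print(child_private)
```

A child's private set is expected to be small because nearly every true variant is inherited from a parent; what remains mixes genuine de novo changes with sequencing and calling errors, so it typically needs further filtering.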
Lessons Learned
1) Overall success rate for identifying causative mutations is low: 25%
2) Information not always directly actionable but still valuable.
3) Best success with a) Specific phenotypes b) Large families
4) Need large database to share information: Recurrence is key. ClinVar
Fetal DNA Sequencing
1) Cell-free fetal DNA can be detected in maternal blood as early as 4-5 weeks gestation
2) 4-13% of circulating DNA is fetal → increases with pregnancy
3) Targeted detection of mutations
4) Whole genome sequencing routinely used to detect trisomies: Down’s (Chr. 21), Chromosome 18 and Chromosome 13. 99% sensitivity
5) Taking over from amniocentesis
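One common way to turn sequencing reads into a trisomy call is a z-score on the fraction of reads mapping to the chromosome of interest. The sketch below is an assumption about how such a test can work, not the method used in the cited study; the reference fractions and the function names are hypothetical.

```python
from statistics import mean, stdev


def chromosome_fraction(chrom_reads: int, total_reads: int) -> float:
    """Share of all mapped reads that fall on one chromosome."""
    return chrom_reads / total_reads


def trisomy_z_score(sample_fraction: float, reference_fractions: list) -> float:
    """Standard deviations between the sample and a euploid reference set."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)
    return (sample_fraction - mu) / sigma


# Hypothetical chr21 read fractions from pregnancies with euploid fetuses.
euploid_chr21 = [0.01300, 0.01302, 0.01298, 0.01301, 0.01299]

# With a fetal fraction f, a trisomic fetus raises the chromosome's share by f/2.
fetal_fraction = 0.10
trisomic_sample = mean(euploid_chr21) * (1 + fetal_fraction / 2)

print(trisomy_z_score(trisomic_sample, euploid_chr21) > 3)
```

Because only 4-13% of circulating DNA is fetal, the expected shift is a few percent of an already small fraction, which is why millions of reads and a tight reference distribution are needed.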
Fetal DNA Sequencing
Srinivasan A, Bianchi DW, Huang H, et al. Am J Hum Genet 2013; 1–10.
Personal Genome Sequencing:
Can genome sequencing of a healthy person be useful in health care?
• Pharmacogenomics
– Drug doses and side effects
• Predict disease risk
• Catch disease early
Impact of Genomics on Medicine
Genome Interpretation Pipeline
Sequence Genome (Illumina, Complete Genomics)
Call SNVs, Indels, SVs Phase Variants
Disease Gene (HGMD) Risk Variants: Specific Disease Variants
Complex/Common Disease Risk Genotypes
Pharmacogenomics
Personal “Omics” Profiling (POP)
Genome
Transcriptome (mRNA, miRNA, isoforms, edits)
Proteome
Metabolome
Autoantibody-ome
Cytokines
Epigenome
Microbiome (Gut, Urine, Nasal, Tongue, Skin)
Initially 40K Molecules/Measurements; Now Billions!
Personal Omics Profile 58 months; 92 Timepoints; 7 Viral Infections
Chen et al., Cell 2012
Accurate Genome Sequencing
3.3 M Hi conf. SNVs, 217K Indels and 3K SVs (2 or more Platforms; Plus low confidence)
Whole Genome Sequencing
• Complete Genomics: 35 b paired ends (150X)
• Illumina: 100 b paired ends (120X)
Exome Sequencing
• Nimblegen
• Illumina
• Agilent
[Venn diagram of CG and Illumina calls: 3.30M (89%), 100K (2%), 345K (9%)]
Approach I: Known Disease Risk Pipeline
Rick Dewey & Euan Ashley
All variants ~3.5M
Rare/novel variants (<5%)
Coding: Nonsynonymous (1320); Synonymous (mRNA stability, tRNA rate)
Non-Coding: miRNA (seed sequence), miRNA targets, Splice, UTR
SIFT, PP2: Damaging (234)
OMIM/Curated Mendelian disease (51)
(1)
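The pipeline narrows millions of variants through successive filters. The following is a rough sketch of that funnel under stated assumptions: the `Variant` fields, the order of the last two filters, the gene names, and the `predicted_damaging` flag (a stand-in for SIFT or PolyPhen-2 output) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_frequency: float   # population frequency; 0.0 for a novel variant
    consequence: str          # "nonsynonymous", "synonymous", or "noncoding"
    predicted_damaging: bool  # stand-in for a SIFT / PolyPhen-2 prediction


def known_disease_risk_funnel(variants, disease_genes, max_frequency=0.05):
    """Apply each filter in turn and keep the survivors of every stage."""
    rare = [v for v in variants if v.allele_frequency < max_frequency]
    nonsynonymous = [v for v in rare if v.consequence == "nonsynonymous"]
    damaging = [v for v in nonsynonymous if v.predicted_damaging]
    curated = [v for v in damaging if v.gene in disease_genes]
    return {"all": list(variants), "rare": rare, "nonsynonymous": nonsynonymous,
            "damaging": damaging, "curated": curated}


variants = [
    Variant("GENE_A", 0.001, "nonsynonymous", True),   # survives every filter
    Variant("GENE_B", 0.200, "nonsynonymous", True),   # too common
    Variant("GENE_C", 0.000, "synonymous", False),     # not protein-changing
    Variant("GENE_D", 0.010, "nonsynonymous", False),  # predicted benign
    Variant("GENE_E", 0.002, "nonsynonymous", True),   # gene not on curated list
]
funnel = known_disease_risk_funnel(variants, disease_genes={"GENE_A"})
print({stage: len(kept) for stage, kept in funnel.items()})
```

Keeping every stage rather than only the final list makes it easy to report counts like those on the slide and to revisit variants that a later filter discarded.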
Curated List of Rare Variants (SNVs, All heterozygous)
Missense
• ALAD, ABCC2, ACADVL, ADAMTS13, AGRN, BAAT, CDS1, CHD7, COL4A3, CTSD, DGCR2, DLD, DYSF, EPCAM, FGFR1OP, FKRP, GAA, GNAI2, HSPB1, IGKC, ITPR1, MED12, MKS1, NTRK1, PCM1, PKD1, PLEKHG5, PMS2, PRSS1, PTCH2, SERPINA1, SETX, SYNE1, TERT, TTN, VWF, ZFPM2, PNPLA2.
Nonsense
• PRAMEF2, PLCXD2, NUP54, RP1L1, PIK3C2G, NDE1, GGN, CYP2A7, IGKC
Not Rare But Important
• KCNJ11, KLF14, GCKR …
Bolded Genes expressed in PBMC (RNA).
Disease callouts on the slide: Diabetes, High Cholesterol, Aplastic Anemia
Rare Variants in Disease Genes (51 Total)
Figure 2. Medical Findings
(A) High interest disease- and drug-related variants in the subject’s genome.
(B) RiskGraph of the top 20 diseases with the highest posttest probabilities. For each disease, the arrow represents the pretest probability according to the subject’s age, gender, and ethnicity. The line represents the posttest probability after incorporating the subject’s genome sequence. Listed to the right are the numbers of independent disease-associated SNVs used to calculate the subject’s posttest probability.
(C) RiskOGram of type 2 diabetes. The RiskOGram illustrates how the subject’s posttest probability of T2D was calculated using 28 independent SNVs. The middle graph displays the posttest probability. The left side shows the associated genes, SNVs, and the subject’s genotypes. The right side shows the likelihood ratio (LR), number of studies, cohort sizes, and the posttest probability.
(D) Blood glucose trend. Measurements were taken from samples analyzed at either nonfasted or fasted states; the nonfasted states (all but days 186, 322, 329, and 369 and after day 400) were at a fixed time after a constant meal. Data was presented as moving average with a window of 15 days. Red
Cell 148, 1293–1307, March 16, 2012 ©2012 Elsevier Inc.
Rong Chen and Atul Butte
Approach II: Complex Disease Risk Profile Using VariMed
VariMed: Highly curated Database of genetic associations with complex disease
Sum over likelihood ratios for disease risk
Disease risk profile: Type 2 diabetes (T2D)
Prevalence (pretest probability): 27%
Gene       SNV         Genotype  LR    Studies  Samples  Probability
HNF1B      rs4430796   GG        1.13  1        11320    46%
MTNR1B     rs10830963  CC        0.94  1        16061    43%
-          rs17036101  GG        1.02  1        89920    44%
-          rs4607103   CC        1.04  1        89920    44%
THADA      rs7578597   TT        1.03  1        89920    43%
-          rs1153188   TA        0.96  1        89920    42%
-          rs12779790  AG        1.06  1        89920    43%
-          rs5945326   AA        1.09  1        94337    42%
TP53INP1   rs896854    TC        1.01  1        94337    40%
EPO        rs1617640   AA        1.48  2        4011     39%
WFS1       rs10010131  GG        1.07  2        30248    31%
-          rs9300039   CC        1.05  2        42170    29%
RBMS1      rs1020731   GA        0.95  2        84605    28%
JAZF1      rs864745    TC        1.00  2        89920    29%
ARAP1      rs1552224   AA        1.03  2        94337    29%
KCNQ1      rs231362    GG        1.07  2        94337    29%
-          rs4457053   GG        1.12  3        94337    27%
-          rs7578326   AA        1.07  3        94337    25%
PPARGC1A   rs2970847   TC        1.31  4        5558     24%
-          rs1111875   TT        0.88  6        93188    19%
KCNJ11     rs5219      TT        1.15  7        87066    21%
IGF2BP2    rs4402960   GT        1.06  8        104401   19%
SLC30A8    rs13266634  CT        0.94  9        145718   18%
CDKAL1     rs7754840   GG        0.91  10       51327    19%
FTO        rs8050136   CC        0.87  10       63470    20%
KCNQ1      rs2237892   CT        0.80  13       6570     23%
-          rs10811661  TC        0.85  18       154141   27%
TCF7L2     rs7903146   CT        1.18  49       140717   30%
Rong Chen and Atul Butte
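The running probability in this profile can be reproduced by converting the pretest probability to odds and multiplying by each likelihood ratio, which is equivalent to summing log likelihood ratios. The sketch below assumes the SNVs are independent and that the 27% figure is the pretest probability; the function name is hypothetical and the LR values are copied from the listing above. The probability column appears to accumulate from the bottom row upward.

```python
def posttest_probability(pretest_probability: float, likelihood_ratios) -> float:
    """Update a probability with independent likelihood ratios via odds."""
    odds = pretest_probability / (1 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr  # same as adding log(lr) to the log odds
    return odds / (1 + odds)


# LR column of the T2D profile, in the order listed (top row first).
lrs_top_to_bottom = [
    1.13, 0.94, 1.02, 1.04, 1.03, 0.96, 1.06, 1.09, 1.01, 1.48,
    1.07, 1.05, 0.95, 1.00, 1.03, 1.07, 1.12, 1.07, 1.31, 0.88,
    1.15, 1.06, 0.94, 0.91, 0.87, 0.80, 0.85, 1.18,
]

# First update only (bottom row, TCF7L2): 27% moves to about 30%.
first_step = posttest_probability(0.27, lrs_top_to_bottom[-1:])
# All 28 SNVs together: about 46%, matching the top row of the profile.
final = posttest_probability(0.27, lrs_top_to_bottom)
print(round(first_step, 2), round(final, 2))
```

Most individual likelihood ratios are close to 1, so no single common SNV moves the estimate much; the shift from 27% to about 46% comes from their combined effect.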
GLUCOSE LEVELS
HRV INFECTION (DAY 0-21)
RSV INFECTION (DAY 289-311)
LIFESTYLE CHANGE (DAY 380-CURRENT)
HbA1c (%):     6.4    6.7    4.9    5.4    5.3    4.7
(Day Number):  (329)  (369)  (476)  (532)  (546)  (602)
Genome Analysis of 12 Healthy People
Dewey, Grove, Pan, Ashley, Quertermous et al JAMA 2012
Ethnicity: 7 Asians, 5 Europeans
Sequence genomes with Illumina (all 12); Mean depth: 50X (38-62)
9 also sequenced with Complete Genomics
Inherited Disease Risk and Carrier Status
# Variants Per Subject, Median (Range)
Candidate variants manually curated: 108 (90-127)
- Previously reported or potential pathogenic variants in ACMG genes: 3 (1-7)
Reportable variants associated with disease risk (HGMD): 5 (2-6)
- Reported disease-associated variants: 0 (0-2)
- Rare expected pathogenic variants: 0 (0-1)
- Genetic variants of unknown significance: 3 (1-6)
Reportable variants associated with carrier status: 13 (8-18)
- Reported disease-associated variants: 2 (0-4)
- Rare expected pathogenic variants: 2 (1-4)
- Genetic variants of unknown significance: 9 (4-12)
Study of 12 Healthy People
Dewey, Grove, Pan, Ashley, Quertermous et al
- 3 followup diagnostic tests (range 0-10)
- Cost ~$400-$1400 per individual (median $663-$773)
- 54 minutes per variant
- One individual had a BRCA1 nonsense mutation; no known family history
Gene       SNP         Patient genotype  Drug(s) affected
CDKN2A/2B  rs10811661  C/T               Troglitazone (increased beta-cell function)
CYP2C19    rs12248560  C/T               Clopidogrel (increased activation)
LPIN1      rs10192566  G/G               Rosiglitazone (increased effect)
SLC22A1    rs622342    A/A               Metformin (increased effect)
VKORC1     rs9923231   C/T               Warfarin (lower dose required)
High interest drug response–related variants: PharmGKB
Drug Side Effects
• One teenager had her genome sequenced; discovered genetic susceptibility for blood clots (factor V mutation) (3-8% of Europeans have this mutation)
→ affects her birth control decisions, as hormonal therapies increase the risk of blood clots
Many Unaddressed Challenges
1) Accuracy and coverage
2) Interpreting non protein coding regions
3) Other information
Sequencing Accuracy Sequencing the Same Genome Twice
Personalis
146,100 SNPs (3.7%)
Coverage of Mendelian disease genes is high but incomplete
[Figure: for Illumina and Complete Genomics, histograms of the percentage of genes (y-axis) binned by the median, and by the minimum, percentage of each gene covered by ten or more reads (x-axis bins: <70, 70−74, 75−79, 80−84, 85−89, 90−94, 95−98, 99−100); series: All genes, Mendelian disease genes in ClinVar, ACMG reportable genes]
Coverage of ACMG reportable genes is high but incomplete
[Figure: % of exonic bases covered by 10 reads (y-axis) for each gene (x-axis), Illumina vs. Complete Genomics. Genes: ACTA2, ACTC1, APC, BRCA1, BRCA2, CACNA1S, COL3A1, DSP, GLA, LDLR, MLH1, MSH2, MSH6, MUTYH, MYBPC3, MYH11, MYH7, MYL2, MYL3, NF2, NTRK1, PCSK9, PMS2, RB1, SCN5A, SDHAF2, SDHB, SDHC, SDHD, TGFBR2, TMEM43, TNNT2, TP53, TSC1, TSC2, VHL, DSC2, APOB, FBN1, RYR2, LMNA, DSG2, SMAD3, MYLK, MEN1, PRKAG2, TGFBR1, RET, TPM1, RYR1, PTEN, PKP2, WT1, KCNQ1, KCNH2, STK11]
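The coverage metric used in these figures, the percentage of a gene's bases covered by ten or more reads, is simple to compute from per-base depths. A minimal sketch, assuming depths are available as a list of integers per gene and per sample; the function name and the example numbers are hypothetical.

```python
from statistics import median


def percent_covered(depths, min_reads: int = 10) -> float:
    """Percentage of positions whose read depth is at least min_reads."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for depth in depths if depth >= min_reads)
    return 100.0 * covered / len(depths)


# Per-base depths for one small gene in one sample: 6 of 8 bases reach 10 reads.
print(percent_covered([30, 25, 12, 9, 0, 40, 10, 10]))

# Summarize one gene across several samples by median and by minimum.
per_sample = [100.0, 98.5, 92.0]
print(median(per_sample), min(per_sample))
```

Reporting the minimum alongside the median matters clinically: a gene that is well covered in the typical sample can still have a gap in one individual, and that gap could hide a reportable variant.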
Mapping Regulatory Variation to Personal Genomes
Two approaches:
1) Mapping transcription factor binding in different people.
2) RegulomeDB: Assembling regulatory information from the ENCODE Project and other sources.
RegulomeDB
Alan Boyle
Data Type Types Features Genomic Coverage (bp)
Transcription Factor ChIP-Seq (ENCODE) 495 conditions / cell lines 7,721,822 230,795,743
Transcription Factor ChIP-Seq (non-ENCODE) 32 conditions / cell lines 397,534 140,534,725
Transcription Factor ChIP-exo 1 condition 35,161 2,604,066
Histone Modifications 284 conditions / cell lines/ marks 23,055,241 2,805,205,184
DNase I Hypersensitive Sites 114 conditions / cell lines 20,710,098 614,973,579
FAIRE Sites 25 conditions / cell lines 4,816,196 476,386,909
DNase I Footprints 50 cell lines 128,266,803 178,722,370
Predicted Binding (PWMs) 1,158 motifs 239,713,973 1,151,732,122
eQTLs 142,945 SNPs 142,945 142,945
dsQTLs 6,069 SNPs 6,069 6,069
Manual Annotations 6 Genomic Regions 282 11,607
VISTA Enhancers 1,448 Enhancers 1,325 1,658,146
Validated SNPs affecting binding 855 SNPs 855 855
Alan P Boyle RegulomeDB
Damaging Variation in an Individual
Protein Coding (Gene) and Non-coding (Regulatory region)
CAPN1 Compound Heterozygote
CAPN1 Regulatory region
Maternal Chromosome
Paternal Chromosome and
Calcium-sensitive cysteine protease in brain synapses; protective against Alzheimer’s Disease (Trinchese et al. 2008)
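A compound heterozygote of this kind can be found once variants are phased: the gene needs a damaging variant on each parental chromosome, whether in the coding sequence or in a regulatory region. The sketch below is illustrative; the `PhasedVariant` fields are assumptions, and which CAPN1 variant sits on which chromosome is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasedVariant:
    gene: str
    region: str     # "coding" or "regulatory"
    haplotype: str  # "maternal" or "paternal"
    damaging: bool


def compound_heterozygous_genes(variants) -> list:
    """Genes carrying a damaging variant on both parental haplotypes."""
    damaged_haplotypes = {}
    for v in variants:
        if v.damaging:
            damaged_haplotypes.setdefault(v.gene, set()).add(v.haplotype)
    return sorted(gene for gene, haps in damaged_haplotypes.items()
                  if {"maternal", "paternal"} <= haps)


variants = [
    # Hypothetical assignment of the two CAPN1 hits to parental chromosomes.
    PhasedVariant("CAPN1", "coding", "maternal", True),
    PhasedVariant("CAPN1", "regulatory", "paternal", True),
    # Two hits on the same chromosome leave the other copy intact.
    PhasedVariant("GENE_X", "coding", "maternal", True),
    PhasedVariant("GENE_X", "regulatory", "maternal", True),
]
print(compound_heterozygous_genes(variants))
```

Without phasing, the two genes above would look identical, which is one reason the interpretation pipeline includes a variant phasing step.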
What About Other Omes?
Health Is a Product of Genome & Exposome
Food Health Disease
Genome
Pathogens
Exercise
Stress
Epigenomics Definition: From Wikipedia
• Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. Epigenetic modifications are reversible modifications on a cell’s DNA or histones that affect gene expression without altering the DNA sequence (Russell 2010 p. 475).
• Two of the most characterized epigenetic modifications are DNA methylation and histone modification.
Key Points
Many Opportunities for Modification Throughout Life
DNA Methylation A Sixth Base: 5Hydroxymethylation
http://www.ks.uiuc.edu/Research/methylation/
DNA Methylation in Promoters is Traditionally Associated with Gene Silencing
• Transfection experiments (1980s)
– Methylated promoter → gene off
– Unmethylated promoter → gene on
• Correlates globally with gene expression (or lack thereof)
Whole genome profiling of DNA methylation
MethylC-seq: bisulfite sequencing of 5mC (5mCG, 5mCHG)
Bisulfite sequencing 2007...; Lister et al. 2008
1) Fragmented gDNA
2) Bisulfite conversion
3) Bisulfite converted library
4) Deep sequencing (Illumina - SE or PE)
5) Map reads against computationally BS converted + / - strands
6) Post-processing & stack
7) Call mC modifications
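The mapping and calling steps rest on one fact: bisulfite converts unmethylated C to a base read as T, while methylated C stays C. The sketch below illustrates that logic for the + strand only, with exact string matching in place of a real aligner; the function names and the toy sequences are hypothetical.

```python
def bisulfite_convert(seq: str) -> str:
    """In silico full conversion of the + strand: every C becomes T."""
    return seq.replace("C", "T")


def map_read(reference: str, read: str) -> int:
    """Locate a read by matching fully converted read and reference.

    Returns the 0-based offset, or -1 if no exact match exists.
    """
    return bisulfite_convert(reference).find(bisulfite_convert(read))


def call_methylation(reference: str, read: str, offset: int) -> dict:
    """For each reference C under the read: True if still C (methylated),
    False if read as T (unmethylated and converted)."""
    calls = {}
    for i, base in enumerate(read):
        position = offset + i
        if reference[position] == "C":
            if base == "C":
                calls[position] = True
            elif base == "T":
                calls[position] = False
    return calls


reference = "ACGTCCGA"
read = "ACGTTCGA"  # C at position 4 was converted; positions 1 and 5 were protected
offset = map_read(reference, read)
print(offset, call_methylation(reference, read, offset))
```

Reads from the opposite strand would need the complementary G to A conversion, and real data requires stacking many reads per site before calling, as in the post-processing step.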
Lister et al. 2009
In stem cells: 62M mC (6% of Cs)
Fibroblast: 45M (4.3%)
Stem cells: 75% mCpG & 25% nonCpGs
Fibroblasts: 99% are in CpGs
Correlate with Gene Expression
Gene bodies are often methylated
DNA Methylation Associated with Many Diseases and Traits
Aging, Nutrition, Cancer, Asthma and Allergies (Nadeau)