Posey: The Impact of Genomics Era on Mtb Research
2/26/16 - TB Genomics
The Impact of Genomics Era on Mycobacterium tuberculosis Research
Jamie Posey, PhD Applied Research Team Lead
National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention Division of Tuberculosis Elimination
NGS Platforms
MOLECULAR EPIDEMIOLOGY
Cluster 1
~ 100 Patients
DRUG RESISTANCE
Conventional Drug Susceptibility Testing
• Growth based
  • Time consuming (weeks to months)
  • Laborious
  • Infrastructure
• Liquid or solid media
  • Equivalent concentrations?
• Critical concentration
• Minimum inhibitory concentration
Drug Resistance Survey
• Based on phenotypic assays
  • Some countries perform routine DST
  • Resource-limited areas (once every 3-5 years)
  • Lack infrastructure
• Can we use molecular assays?
  • Do we have the knowledge?
  • What tools are needed?
rpoB Surveillance
• Rifamycin resistance
  • Role of mutations
• Frequency of mutations
  • Population level
  • Assay development
• Importance of data
  • Calculate sensitivity and specificity of assays
  • Silent mutations
  • How much phenotypic resistance is missed?
  • What is the effect on patient outcome?
Fusion Primers for Ion Torrent PGM
RRDR
Rifampin Resistance Determining Region (RRDR) of rpoB
rpoB Surveillance
• 14 months
  • >10,000 isolates
• Mutations
  • 411 isolates (35 unique mutations)
  • Ser531Leu (184 isolates)
  • Silent mutations (94 isolates)
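As a quick arithmetic check of the surveillance counts above, the share of each mutation class among the 411 isolates with RRDR mutations can be worked out directly. This is only a sketch: the variable names are illustrative, and because the slide gives the total only as ">10,000 isolates", the overall prevalence computed here is an upper bound, not a reported result.

```python
# Counts taken from the rpoB surveillance slide.
isolates_with_mutation = 411
ser531leu = 184
silent = 94
# The slide says ">10,000 isolates", so this value is a floor (assumption).
total_isolates_lower_bound = 10_000


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of each class among isolates that carried an RRDR mutation.
ser531leu_share = percent(ser531leu, isolates_with_mutation)
silent_share = percent(silent, isolates_with_mutation)

# With more than 10,000 isolates tested, prevalence is at most this value.
max_prevalence = percent(isolates_with_mutation, total_isolates_lower_bound)

print(ser531leu_share, silent_share, max_prevalence)
```

On these counts Ser531Leu accounts for a little under half of the isolates with mutations, and silent mutations for a little under a quarter.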
RESEARCH
Areas to Address
• Identify new mechanisms of drug resistance
  • Existing and new drugs
• Identify preexisting resistance
  • Repurposed and new drugs
• Microevolution in the patient
• Compensatory mutations
[Figure: diagram with node values 29, 44, 26, 19, 36, 63, 46, 44 and group labels A, B, C]
SNP         Gene / locus
C31998G     58 bp upstream Rv0029
C51403T     Rv0047c
C118832T    Rv0102
C247984T    Rv0207c
T362962C    PE_PGRS5
C477188G    Rv0398c
C480678A    mmpL1
A649974G    ubiE
C761147T    rpoB
G765719C    rpoC
C799139G    Rv0698
G905686A    Rv0811c
C926861A    PE_PGRS13
C1023436G   betP
T1093459G   PE_PGRS17
G1114491A   Rv0997
C1208858T   Rv1084
C1231660T   Rv1104
A1246730C   bpoB
C1266797T   Rv1139c
C1309314T   fdxC
A1320356C   papA3
G1353888A   tagA
G1421085A   Rv1272c
C1663856T   acn
G1674048A   fabG1
T1877958C   pks7
C1888075G   pks9
T2087076C   171 bp upstream of Rv1838c
C2372126T   Rv2112c
G2402463C   Rv2142c
T2614547C   46 bp upstream Rv2339
G2751471A   Rv2449c
G2958534A   Rv2631
G3126489A   Rv2819c
G3137406A   echA16
C3213150T   lepB
A3377940C   PPE46
A3380380C   PPE47
A3416480G   Rv3055
C3455434T   Rv3088
G3608047C   Rv3230c
C3764285T   PPE56
G3765280A   PPE56
C3777772A   spoU
A4026439G   5 bp upstream Rv3585
C4037284A   PE_PGRS59
T4072484A   Rv3633
G4084482C   topA
A4314271G   bfrB
Identify New Mechanisms of Resistance
fabG1 inhA
L203L
Ando H, Miyoshi-Akiyama T, Watanabe S, Kirikae T. A silent mutation in mabA confers isoniazid resistance on Mycobacterium tuberculosis.
inhA Transcript Levels
CLINICAL MANAGEMENT
Starting Material
Sputum Dx Culture Subculture
WGS versus Targeted Approach
• WGS
  • Need a culture
  • High molecular weight DNA (1-5 µg)
  • Analyze about 90-99% of genome
  • Low to medium throughput
• Targeted loci
  • Could possibly start with processed sputum
  • PCR based
  • Lower quality and quantity of DNA
  • Only analyze the areas amplified
  • High throughput
Sensitivity and Specificity of Loci
Drug Gene(s) Sensitivity (%) Specificity (%)
RIF rpoB 97.1 93.6
INH inhA, katG 90.6 100
EMB embB 78.6 93.1
FQ gyrA 81.6 97.7
KAN rrs, eis 86.5 96.1
AMK rrs 90.0 98.8
CAP rrs, tlyA 60.9 87.3
MDR rpoB, inhA, katG 90.8 94.7
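The sensitivity and specificity figures in the table above compare a molecular result against phenotypic DST as the reference. A minimal sketch of that calculation follows; the function name and the counts in the example are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Compare a molecular assay with phenotypic DST as the reference.

    tp: phenotypically resistant, mutation detected
    fn: phenotypically resistant, no mutation detected
    tn: phenotypically susceptible, no mutation detected
    fp: phenotypically susceptible, mutation detected
    Returns (sensitivity %, specificity %), each rounded to one decimal.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return round(sensitivity, 1), round(specificity, 1)


# Hypothetical counts for illustration only (not from the study).
example = sensitivity_specificity(tp=90, fn=10, tn=190, fp=10)
print(example)  # (90.0, 95.0)
```

Resistant isolates without a mutation in the targeted loci lower sensitivity; susceptible isolates that carry a mutation, including silent ones if they are counted as positive, lower specificity.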
Retrospective DR Study and MDDR Service Sensitivity and Specificity of Loci
Drug Gene(s) Sensitivity (%) Specificity (%)
RIF rpoB 97.1 97.4
INH inhA, katG 86.0 99.1
EMB embB 78.8 94.3
FQ gyrA 79.0 99.6
KAN rrs, eis 86.7 99.6
AMK rrs 90.9 98.4
CAP rrs, tlyA 55.2 91.0
Molecular Detection of Drug Resistance
• Original assay
  • 8 single PCR reactions
  • Sanger DNA sequencing
  • 16 sequencing reactions
• High throughput assay
  • NGS – Ion Torrent PGM
  • Multiplex PCR
  • Barcoded – 96 samples per assay
Targeted Loci
Drug               Gene   Region
Rifampin           rpoB   176 and RRDR
Isoniazid          katG   315
Isoniazid          inhA   Promoter
Ethambutol         embB   ERDR
Pyrazinamide       pncA   Promoter and ORF
Fluoroquinolones   gyrA   QRDR
Fluoroquinolones   gyrB   QRDR
Kanamycin          eis    Promoter
Kanamycin          rrs    1401
Amikacin           rrs    1401
Capreomycin        rrs    1401
Capreomycin        tlyA   Promoter and ORF
Workflow
(From the Ion AmpliSeq™ Library Preparation User Guide)
• Amplify targets from genomic DNA using the Ion AmpliSeq™ Primer Pool
• Partially digest primer sequences
• Ligate adapters to the amplicons and purify: A and P1 adapters (nonbarcoded library) or barcode adapters X and P1 (barcoded library)
• Option 1: Equalize the library
• Option 2: Quantify the unamplified library by qPCR
• Option 3: Quantify the amplified library with the Qubit® 2.0 Fluorometer or Agilent® 2100 Bioanalyzer® instrument
• Combine libraries (optional) and proceed to template preparation
Pilot Study
• 8 and 48 samples
• Use crude DNA preps
• Compare Ion Torrent data to Sanger sequence
Number of Reads Per Sample
Sample   Number of Reads
MLB2     100,039
MLB18    55,909
MLB36    40,246
MLB138   5,715
MLB149   51,144
MLB176   4,525
MLB207   98,238
MLB224   50,865
SNP Report
MID      Ref Pos   Type  Ref Base  Called Base  SNP %    Feature Name  Amino Acid Change  Depth
MLB2     4247730   SNP   G         C            97.70%   embB          G406A              1196
MLB224   4247578   SNP   G         A            99.20%   embB          L355L              1658
MLB224   4247646   SNP   A         C            95.90%   embB          E378A              560
MLB138   7585      SNP   G         C            90.90%   gyrA          S95T               220
MLB149   7585      SNP   G         C            90.30%   gyrA          S95T               944
MLB176   7585      SNP   G         C            72.00%   gyrA          S95T               100
MLB176   7582      SNP   A         G            84.00%   gyrA          D94G               100
MLB18    7585      SNP   G         C            90.00%   gyrA          S95T               964
MLB2     7585      SNP   G         C            89.80%   gyrA          S95T               2538
MLB224   7585      SNP   G         C            91.80%   gyrA          S95T               1279
MLB36    7585      SNP   G         C            89.20%   gyrA          S95T               869
MLB36    7570      SNP   C         T            99.50%   gyrA          A90V               830
MLB18    2155168   SNP   C         G            99.60%   katG          S315T              1308
MLB2     2155168   SNP   C         G            99.60%   katG          S315T              2475
MLB36    2155168   SNP   C         G            99.30%   katG          S315T              957
MLB18    2288827   SNP   C         T            78.30%   pncA          V139M              2702
MLB2     2288935   SNP   A         G            99.80%   pncA          Y103H              8046
MLB224   2288733   SNP   G         A            98.70%   pncA          A170V              1475
MLB36    2288826   SNP   A         G            78.90%   pncA          V139A              1882
MLB176   761161    SNP   T         C            60.00%   rpoB          L452P              25
MLB18    761155    SNP   C         T            99.10%   rpoB          S450L              900
MLB2     761155    SNP   C         T            99.50%   rpoB          S450L              1577
MLB36    761155    SNP   C         T            99.10%   rpoB          S450L              680
MLB2     1473246   SNP   A         G            99.80%   rrs           -                  5051
MLB207   1473183   SNP   A         G            35.00%   rrs           -                  20
MLB36    1473246   SNP   A         G            99.80%   rrs           -                  1658
MLB138   1917972   SNP   A         G            93.10%   tlyA          L11L               159
MLB149   1917972   SNP   A         G            96.40%   tlyA          L11L               863
MLB176   1917972   SNP   A         G            97.40%   tlyA          L11L               76
MLB176   1918523   SNP   G         A            98.90%   tlyA          G195D              89
MLB18    1917972   SNP   A         G            97.00%   tlyA          L11L               943
MLB2     1917972   SNP   A         G            96.60%   tlyA          L11L               1511
MLB207   1917972   SNP   A         G            97.40%   tlyA          L11L               1337
MLB224   1917972   SNP   A         G            96.30%   tlyA          L11L               934
MLB36    1917972   SNP   A         G            96.50%   tlyA          L11L               818
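Calls in the SNP report vary widely in depth and SNP %, which bears on the later question of minimal coverage. The sketch below shows one way such rows might be triaged. The thresholds, the `SnpCall` type, and the labels are all hypothetical; the slides do not state what cutoffs, if any, were applied.

```python
from typing import NamedTuple


class SnpCall(NamedTuple):
    sample: str
    position: int
    gene: str
    change: str  # amino acid change; empty for rrs (rRNA gene)
    snp_percent: float
    depth: int


# A few rows copied from the SNP report.
calls = [
    SnpCall("MLB2", 761155, "rpoB", "S450L", 99.5, 1577),
    SnpCall("MLB176", 761161, "rpoB", "L452P", 60.0, 25),
    SnpCall("MLB207", 1473183, "rrs", "", 35.0, 20),
    SnpCall("MLB18", 2288827, "pncA", "V139M", 78.3, 2702),
    SnpCall("MLB176", 7582, "gyrA", "D94G", 84.0, 100),
]

# Hypothetical thresholds chosen only for illustration.
MIN_DEPTH = 50
MIN_SNP_PERCENT = 90.0


def classify(call):
    """Label a call by depth first, then by variant frequency."""
    if call.depth < MIN_DEPTH:
        return "low depth"
    if call.snp_percent < MIN_SNP_PERCENT:
        return "below frequency threshold"
    return "confident"


labels = {(c.sample, c.gene, c.position): classify(c) for c in calls}
for key, label in labels.items():
    print(key, label)
```

Checking depth before frequency keeps a 60% call at depth 25 from being read as a mixed population when it may simply be under-sampled.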
Number of Reads Per Sample (48)
[Figure: two plots for the 48-sample run; axis labels: # Reads, Coverage Depth]
Summary and Next Steps
• Pilot project
  • 8 samples – completed and 100% agreement with Sanger data
  • 48 samples – analyzing data
• Quality and quantity of DNA
• Determine the minimal coverage and number of reads
• Scale up to 96 samples
• Test processed specimens
METAGENOMICS
Starting Material
Sputum Dx Culture Subculture
Synthetic Dilution Series Strategy
• Pure gDNA dilution to extinction
  • Mtb, STEC, C diff
• gDNA/gRNA dilution to extinction in complex background
  • Mtb/Sputum
  • STEC/Stool
  • Influenza/A549 RNA
  • OC43 Coronavirus/A549 RNA
• Targeted enrichment study
• Background depletion study
Mtb Sputum Synthetic Sets
• Create seven sets of dilutions (100 x 50 µL each) in a constant background of sputum gDNA (25 ng/µL)
• 5 aliquots of each dilution, sampled at random, assayed in triplicate by RT-PCR
Mtb IS6110 RT-PCR
Dilution   % Mtb             Ct
0          Unspiked sputum   >40
1          10%               13.0 ±0.16
2          1%                16.7 ±0.12
3          0.1%              20.1 ±0.05
4          0.01%             23.5 ±0.14
5          0.001%            27.1 ±0.15
6          0.0001%           30.4 ±0.12
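The Ct values for dilutions 1 to 6 rise by roughly 3.4 cycles per tenfold dilution, which is what a well-behaved dilution series should show. A minimal sketch of the usual standard-curve check follows: fit Ct against log10 of the Mtb percentage and convert the slope to an amplification efficiency with the standard relation E = 10^(-1/slope) - 1. The fitting helper is illustrative and is not part of the original analysis.

```python
import math

# (percent Mtb gDNA in sputum background, mean Ct) for dilutions 1 to 6.
dilutions = [
    (10, 13.0),
    (1, 16.7),
    (0.1, 20.1),
    (0.01, 23.5),
    (0.001, 27.1),
    (0.0001, 30.4),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


log_percent = [math.log10(pct) for pct, _ in dilutions]
ct_values = [ct for _, ct in dilutions]

slope, intercept = fit_line(log_percent, ct_values)
# Standard qPCR relation: a slope of about -3.32 corresponds to 100% efficiency.
efficiency = 10 ** (-1.0 / slope) - 1.0

print(f"slope = {slope:.2f} cycles per 10-fold, efficiency = {efficiency:.0%}")
```

A slope near -3.3 indicates near-doubling per cycle; the fitted slope here is slightly steeper, corresponding to an efficiency somewhat below 100%.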
DNA Targeted Sequence Enrichment Strategy
• Commercial and lab developed kits
  • Agilent SureSelect
  • Roche NimbleGen SeqCap
  • NuGEN Select
  • Custom assays
• Three initial designs for each approach
  • Mtb H37Rv
  • STEC O157:H7 Sakai
  • Influenza H3N2 A/Texas/50/2012
Agilent SureSelect
Roche NimbleGen SeqCap
Roche Targeted Sequence Enrichment Project Workflow
Mtb gDNA + sputum gDNA at different ratios
Mtb % 0.0001 0.001 0.01 0.1 1 10
KAPA-Based Library Prep
Sequence Capture
MiSeq Sequencing and Analysis
All mixtures at 25 ng/µL
Mtb Sputum Roche TE Analysis – Pre Treatment/ No Enrichment Read Count Normalized
[Figure: number of reads versus % Mtb DNA in original sample (0, 10, 1, 0.1, 0.01, 0.001, 0.0001); legend: Other, Mtb, Human, QC]
Mtb Sputum Roche TE analysis – Pre Treatment/ No Enrichment
Sample     Read counts (share of reads), in slide order
No Spike   5814354 (96%)   554 (0%)      268994 (4%)
10%        5013170 (93%)   137646 (3%)   243280 (4%)
1%         5261796 (95%)   12682 (0%)    253198 (5%)
0.1%       5376692 (95%)   1548 (0%)     284728 (5%)
0.01%      6501390 (95%)   214 (0%)      323650 (5%)
0.001%     6171228 (95%)   104 (0%)      315366 (5%)
0.0001%    6497154 (95%)   72 (0%)       374982 (5%)
Mtb Sputum Roche TE analysis - Pre Treatment/ No Enrichment
[Figure: Average Read Depth (fold genome coverage, axis 0 to 7) versus % Mtb in original sample (0, 10, 1, 0.1, 0.01, 0.001, 0.0001)]
[Figure: Percent Coverage (reference base pairs covered, axis 0 to 100) versus % Mtb in original sample (0, 10, 1, 0.1, 0.01, 0.001, 0.0001)]
Mtb Sputum Roche TE analysis - Pre Treatment/ No Enrichment
[Figure: plots against H37Rv genome position; panels: 10% Mtb, 1% Mtb, 0.1% Mtb; annotated genes: rrs (16S), rrl (23S), rrf (5S)]
At decreasing levels of Mtb genome sequence present in samples, mappers appear to erroneously attribute other 16S sequence in background sputum to the reference genome.
Mtb Sputum Roche Post Enrichment Analysis – Read Count Normalized
[Figure: number of reads versus % Mtb in original sample (0, 1, 0.1, 0.01, 0.001, 0.0001); legend: Other, Mtb, Human, QC]
Mtb Sputum Roche Post Treatment Analysis
Sample     Read counts (share of reads), in slide order
No Spike   6577598 (93%)   37984 (1%)      431280 (6%)
1%         417872 (9%)     4037224 (85%)   272644 (6%)
0.1%       533198 (13%)    3227798 (81%)   245416 (6%)
0.01%      1878156 (39%)   2566948 (54%)   344742 (7%)
0.001%     4318994 (84%)   389712 (8%)     407228 (8%)
0.0001%    5194026 (89%)   114054 (2%)     533266 (9%)
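One way to summarize the effect of capture is the ratio of the Mtb read fraction after enrichment to the fraction before it. The sketch below works this out for three spike levels using the read counts from the pre- and post-enrichment slides. It rests on an assumption that should be checked against the original charts: the extracted text does not label the three values per sample, and the second value is taken to be the Mtb count because it is the one that tracks the spike level. The function names are illustrative.

```python
# Read counts per sample, in the order they appear on the slides.
# ASSUMPTION: index 1 (the second value) is the Mtb read count.
pre_enrichment = {
    "1%": (5261796, 12682, 253198),
    "0.1%": (5376692, 1548, 284728),
    "0.01%": (6501390, 214, 323650),
}
post_enrichment = {
    "1%": (417872, 4037224, 272644),
    "0.1%": (533198, 3227798, 245416),
    "0.01%": (1878156, 2566948, 344742),
}


def mtb_fraction(counts):
    """Fraction of all reads in the sample assigned to Mtb (assumed index 1)."""
    return counts[1] / sum(counts)


def fold_enrichment(sample):
    """Ratio of the Mtb read fraction after capture to the fraction before."""
    return mtb_fraction(post_enrichment[sample]) / mtb_fraction(pre_enrichment[sample])


for sample in pre_enrichment:
    print(sample, round(fold_enrichment(sample)))
```

Under that assumption the fold enrichment grows as the starting Mtb fraction falls, since the post-capture fraction declines far more slowly than the input.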
Mtb Sputum Roche Post Enrichment Analysis
% Mtb in original sample   Average read depth (fold genome coverage)   Percent coverage (reference base pairs covered)
0                          1.37                                        5.36
1                          184.33                                      99.4
0.1                        148.8                                       99.31
0.01                       115.61                                      99.18
0.001                      17.6                                        96.77
0.0001                     7.58                                        57.72
Mtb Sputum Roche Post Enrichment Analysis
Gene: rrs (16S) 1471846 – 1473382
Gene: rrl (23S) 1473658 – 1476795
Gene: rrf (5S) 1476899 – 1477013
[Figure: per-sample plots along the genome; panels: No Spike, 1%, 0.1%, 0.01%, 0.001%, 0.0001%]
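Because reads from background sputum can be attributed to the rRNA genes, summary depth figures may be easier to interpret when those regions are reported separately. The sketch below uses the gene coordinates listed on the slide to flag positions inside rrs, rrl, or rrf and to average depth outside them. This is a suggestion for illustration, not the analysis used in the talk; the function names and the toy depth values are hypothetical. Note that rrs is itself a resistance target, so excluding it makes sense only for summary metrics, not for variant calling.

```python
# rRNA gene coordinates as listed on the slide (1-based, inclusive).
RRNA_GENES = {
    "rrs": (1471846, 1473382),  # 16S
    "rrl": (1473658, 1476795),  # 23S
    "rrf": (1476899, 1477013),  # 5S
}


def rrna_gene_at(position):
    """Return the rRNA gene containing this position, or None."""
    for gene, (start, end) in RRNA_GENES.items():
        if start <= position <= end:
            return gene
    return None


def mean_depth_excluding_rrna(depth_by_position):
    """Average depth over positions that fall outside the rRNA genes."""
    kept = [
        depth
        for position, depth in depth_by_position.items()
        if rrna_gene_at(position) is None
    ]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical depths: two ordinary positions and two inside rRNA genes.
toy_depths = {1000: 5, 2000: 7, 1472000: 9000, 1474000: 12000}
print(mean_depth_excluding_rrna(toy_depths))  # 6.0
print(rrna_gene_at(1473246))  # rrs
```

The second call uses position 1473246 from the SNP report, which falls inside the rrs interval given on the slide.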
Mtb Sputum Roche TE Coverage
[Figure: Percent Coverage (reference base pairs covered) versus % Mtb in original sample; panels: No Enrichment, Post Enrichment]
Post Enrichment values by % Mtb in original sample: 0: 5.36; 1: 99.4; 0.1: 99.31; 0.01: 99.18; 0.001: 96.77; 0.0001: 57.72
Mtb Sputum Roche Average Depth
[Figure: Average Read Depth (fold genome coverage) versus % Mtb in original sample; panels: No Enrichment (axis 0 to 7), Post Enrichment (axis 0 to 200)]
Post Enrichment values by % Mtb in original sample: 0: 1.37; 1: 184.33; 0.1: 148.8; 0.01: 115.61; 0.001: 17.6; 0.0001: 7.58
Drug Resistant Genes
[Figure: eight panels plotting % Reads (axis 0 to 100); legend entries and x-axis values per panel as listed]
Panel   Legend entries                        X-axis values
BLR 3   S95A                                  0, 1, 2, 3, 4, 5, 6
BLR 4   D94A, S95T                            0, 1
BLR 2   D94A, S95T                            0, 1
BLR 1   A90V, D94N, S95T                      0, 14, 15, 21
AQR 1   D94G, S95T                            0, 1, 3, 10, 16
AQR 2   D94Y, D94G, S95T                      0, 1, 3, 5, 6, 8, 9, 10, 11, 12
AQR 3   G88C, D89G, A90V, D94N, D94G, S95T    0, 3, 8, 10, 13, 15, 18, 21
AQR 4   A90V, D94N, D94A, S95T                1, 2, 3, 4, 6, 7, 8, 9, 13
Molecular Epidemiology
Drug Resistance
Metagenomics
Acknowledgements
• Laboratory Branch / Applied Research Team
  • Melisa Willby
  • Paige Chopra
  • Paul Grwzybowski
  • Lauren Cowan
• Metagenomics Group
  • Chris Hopkins
  • Eishita Tyagi
  • Scott Burns
• Core Facility
  • Mike Frace
  • Scott Sammons
  • Kristen Knipe