Sergei L. Kosakovsky Pond, PhD (UC San Diego AntiViral Research Center) presents "Promises and Challenges of Next Generation Sequencing for HIV and HCV"
The UC San Diego AntiViral Research Center sponsors weekly presentations by infectious disease clinicians, physicians and researchers. The goal of these presentations is to provide the most current research, clinical practices and trends in HIV, HBV, HCV, TB and other infectious diseases of global significance. The slides from the AIDS Clinical Rounds presentation that you are about to view are intended for the educational purposes of our audience. They may not be used for other purposes without the presenter’s express permission.
AIDS CLINICAL ROUNDS
January 11, 2013
Promises and Challenges of Next Generation Sequencing for HIV and HCV
Sergei L Kosakovsky Pond, PhD. Associate Professor, UCSD Department of Medicine.
Outline
✤ Next generation / Ultradeep sequencing (NGS/UDS) technology
✤ NGS applications for HIV and HCV
✤ What are the unique advantages of NGS?
✤ What are the limitations of NGS?
✤ Clinical relevance of NGS-based assays
✤ Regulatory approval
Genomic sequencing
✤ In recent years, sequencing (DNA, RNA) has rapidly become the cheapest and fastest assay in many applications
✤ Sub-$1000 human genome very shortly.
http://www.genome.gov/sequencingcosts/
NGS (Solexa) introduced commercially
Is NGS relevant for medicine?
✤ In 2012, 6 out of TIME magazine’s Top 10 Medical Breakthroughs relied on NGS
1 The ENCODE project (non-coding DNA)
2 The Human Microbiome Project
6 Cancer Genome Atlas
7 Neo-/pre-natal screening for rare diseases
8 Pediatric Cancer Diagnostics
10 P. acnes phage characterization
Next generation sequencing
✤ Traditional (Sanger) sequencing generates a small number of intermediate length reads (~1000 bp)
✤ All NGS technologies perform millions of parallel sequencing reactions to generate many, typically short, reads per run.
✤ Two canonical applications for NGS
✤ Assembling long sequences from short fragments (human genome, cancer)
✤ Characterizing diverse populations (HIV, HCV, immune repertoire, metagenomics)
Platform comparison
Instrument | First introduced | Output per run | Run-time | Use in HIV/HCV settings
Roche 454 FLX+/Junior | 2005 | 10^5-10^6; 400-700 bp reads | 10-20 hrs | Extensive (>300 papers)
Illumina HiSeq/MiSeq | 2007 | 10^7-10^9; 36-250 bp reads | 7 hrs - 11 days | Limited (~30 papers)
Life Sciences IonTorrent | 2010 | 10^5-10^7; 35-400 bp reads | 1-8 hrs | Limited (<10 papers)
Pacific Biosciences PacBio RS | 2011 | 10^4-10^5; 1000-10000 bp reads | 1-2 hrs | Limited (<10 papers)
Characterizing viral diversity within a host
✤ Being able to characterize HIV-1 populations rapidly and accurately is important for understanding pathogenesis, interplay between viruses and humoral responses, and the evolution of drug resistance
✤ Both HIV-1 and HCV exist as viral quasispecies in a host, i.e. many distinct viral strains are circulating at any given moment in time
✤ NGS has the potential to directly sequence many such strains
✤ Using multiplexing (multiple samples/run), high throughput can be achieved
Characterizing minority DRAMs
✤ Perhaps the clearest clinical application of NGS for HIV and HCV.
✤ Already know what mutations we are looking for (e.g. K103N).
✤ Which mutations are real?
✤ Sequencing error
✤ Assay error / reproducibility
✤ What frequency of mutations matters clinically?
Drug resistance associated mutations (DRAMs)
✤ Using bulk-sequencing (standard tests): all viral strains from a biological sample are PCR amplified and sequenced together
✤ Generates a “population” virus sequence that may hide mutations present in minority variants
✤ The basis of all current FDA approved sequencing tests
✤ Ambiguous peaks on the electropherogram reflect mixed populations
✤ Can detect minority variants at frequencies ≥20%
[Figure: bulk-sequence electropherograms, panels A and B, with mixed bases (Y, R) marked]
✤ Are we missing lower frequency variants?
✤ Do all four combinations of resolved mixtures (CA, CG, TA, TG) actually exist in the sample?
Bulk sequence
Cloning/Single genome sequencing
✤ Cloning or limiting dilution PCR followed by Sanger sequencing: single genome sequencing (SGS)
✤ Generates ~10-100 sequences; how representative is this of the entire population?
[Figure: phylogenetic tree of 20 single-genome sequences from samples AB570, AB595, AB819 and AB958 (sampled 1997-2002), with reference pNL4-3 (p6-rt)]
✤ Now have 3 variants / 20 clones
✤ Are we still missing lower frequency variants?
✤ Would we get the same counts if the experiment were repeated?
Cloning/SGS
[Figure: phylogenetic tree of 20 clones (Clone_0 to Clone_19); scale bar 0.01]
Cloning/SGS
[Figure: phylogenetic tree of 20 clones (Clone_0 to Clone_19); scale bar 0.01]
✤ Sampling variance could be quite high.
[Figure: phylogenetic trees of 20 clones (Clone_0 to Clone_19) from two replicate experiments, Replicate 1 and Replicate 2; scale bar 0.001]
NGS approach
✤ Prepare amplicons, e.g. Blood → HIV RNA → cDNA → PCR 3 regions
✤ Multiplex multiple samples/regions on the plate
✤ Obtain 1000s of reads / sample from a single run
[Figure: 454 Junior and PacBio RS instruments]
Workflow: Library Prep → emulsion PCR → Sequencing → Data analysis
Amplicons: Env: C2-V3-C3 (416 bp); Pol: RT (534 bp); Gag: p24 (253 bp)
>FYJLQU001AI1WJ rank=0036132 x=99.0 y=3537.0 length=250
GGACATCAAGCAGCCATGCAAATGTTAAAAGAGACCATCAATGAG...
>FYJLQU001AI1WJ rank=0036132 x=99.0 y=3537.0 length=250
28 28 28 35 37 37 37 37 37 35 33 33 35 35 35 ...
>FYJLQU001AWHGJ rank=0036147 x=252.0 y=3537.5
AAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAGACAGT...
>FYJLQU001AWHGJ rank=0036147 x=252.0 y=3537.5 length=354
21 18 18 32 33 33 35 35 35 35 25 27 31 28 31 ...
FASTQ output which needs to be converted to interpretable results: 10,000 - 1,000,000 records like this
Massive data sets: need tools to analyze
Quality informatics tools are essential.
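As an illustration of the first step such tools perform, here is a minimal Python sketch that pairs sequence records with their quality records by read ID and reports a mean quality score per read. It assumes FASTA-style records like those shown above; the function names and the toy input (truncated to 15 bases) are hypothetical and are not part of any pipeline described in the talk.

```python
from statistics import mean

def parse_records(text):
    """Split FASTA-style text into (header, body) pairs; body lines are joined with spaces."""
    records = []
    header, body = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, " ".join(body)))
            header, body = line[1:], []
        elif line:
            body.append(line)
    if header is not None:
        records.append((header, " ".join(body)))
    return records

def pair_reads(fasta_text, qual_text):
    """Join sequence and quality records by read ID (the first token of the header)."""
    seqs = {h.split()[0]: b.replace(" ", "") for h, b in parse_records(fasta_text)}
    reads = {}
    for h, b in parse_records(qual_text):
        read_id = h.split()[0]
        if read_id in seqs:
            reads[read_id] = (seqs[read_id], [int(q) for q in b.split()])
    return reads

# Toy input modeled on the first record above, truncated for illustration.
fasta = ">FYJLQU001AI1WJ rank=0036132\nGGACATCAAGCAGCC\n"
qual = ">FYJLQU001AI1WJ rank=0036132\n28 28 28 35 37 37 37 37 37 35 33 33 35 35 35\n"

reads = pair_reads(fasta, qual)
for read_id, (seq, quals) in reads.items():
    print(read_id, len(seq), mean(quals))
```

A real run would stream hundreds of thousands of such records from files, so a production tool would avoid holding everything in memory as this sketch does.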
NGS/454
✤ 9 variants identified.
✤ Would need >200 clones to detect lowest frequency ones reliably.
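The clone-count argument can be made concrete with a short calculation. The sketch below assumes each clone is an independent draw from the viral population (an idealization that ignores the experimental biases discussed later); the frequencies used are illustrative and are not the frequencies of the 9 variants on this slide.

```python
import math

def detection_probability(freq, n_clones):
    """Probability that a variant at frequency `freq` appears at least once among n_clones."""
    return 1.0 - (1.0 - freq) ** n_clones

def clones_needed(freq, confidence=0.95):
    """Smallest number of clones that detects the variant with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))

for freq in (0.15, 0.05, 0.01):
    print(f"freq={freq:.2f}  P(detect | 20 clones)={detection_probability(freq, 20):.3f}  "
          f"clones for 95%={clones_needed(freq)}")
```

Under these assumptions, 20 clones are adequate for variants near 15% but miss a 1% variant most of the time, which is the gap that thousands of NGS reads per sample are meant to close.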
Sources of error
Library Prep: viral template resampling, PCR recombination, PCR error
emulsion PCR: PCR error, multiple templates on a bead
Sequencing: base calling errors, detection errors
Data analysis: software limitations, improper statistical analyses
454 sequencing error rates
✤ Sequencing clonal populations of bacteriophages measured a sequencing error of 0.25% per base.
✤ Most common errors are homopolymer runs that are too long or too short, e.g. AAAA could be reported as AAA or AAAAA.
✤ Solution: We developed an algorithm to map reads to “reference sequences” (e.g. subtype-specific HIV/HCV sequences or germline IgG alleles) which corrects for most of such errors.
✤ Many such algorithms exist; we are currently conducting a rigorous comparison among them.
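To show why homopolymer errors are recognizable, the following Python sketch collapses each run of identical bases to a single base and checks whether a read and a reference differ only in run lengths. This is a deliberately simplified stand-in, not the reference-mapping algorithm mentioned above, and the function names are hypothetical.

```python
from itertools import groupby

def collapse_homopolymers(seq):
    """Replace every run of identical bases with a single base, e.g. GCAAAATG -> GCATG."""
    return "".join(base for base, _ in groupby(seq))

def run_lengths(seq):
    """Return the sequence as (base, run length) pairs."""
    return [(base, len(list(run))) for base, run in groupby(seq)]

def differs_only_by_homopolymer_length(read, reference):
    """True when the two sequences differ, but only in the lengths of their homopolymer runs."""
    return read != reference and collapse_homopolymers(read) == collapse_homopolymers(reference)

reference = "GCAAAATG"
print(run_lengths(reference))
print(differs_only_by_homopolymer_length("GCAAATG", reference))   # AAAA read as AAA
print(differs_only_by_homopolymer_length("GCATAATG", reference))  # a substitution, not a length error
```

A check of this kind flags candidate instrument errors; deciding what the true run length was still requires the reference and an alignment.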
Correcting sequencing error
✤ If one has 10000 reads covering a 400 bp amplicon and the reported sequencing error rate is a uniform 1%, then, on average,
  ✤ each read will have 4 errors
  ✤ each nucleotide position will have 100 (random) mutations
✤ Just because a sequencer reports the presence of a mutation, that does not mean that the mutation is real.
✤ We (and other groups) have developed statistical models and algorithms that can reliably detect minority variants at 0.25-0.5% frequencies, given sufficient coverage.
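The arithmetic in the first bullet, and the reason a reported mutation needs a statistical test, can be sketched in Python as follows. The binomial tail test here is a generic textbook check under the assumption that errors are independent and uniform; it is not the statistical model the authors developed, and the function names are hypothetical.

```python
import math

def expected_errors(n_reads, amplicon_len, error_rate):
    """Expected errors per read and expected erroneous reads per position."""
    return amplicon_len * error_rate, n_reads * error_rate

def log_binom_pmf(i, n, p):
    """Log of P(X = i) for X ~ Binomial(n, p); log space avoids overflow for large n."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def binomial_tail(k, n, p):
    """P(X >= k): chance of seeing k or more variant reads from error alone."""
    return sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))

per_read, per_position = expected_errors(10000, 400, 0.01)
print(per_read, per_position)

# Simplifying assumption: every error at a position produces the same variant base.
for variant_reads in (100, 150):
    print(variant_reads, binomial_tail(variant_reads, 10000, 0.01))
```

With these numbers, 100 variant reads out of 10000 is what error alone is expected to produce, whereas 150 would be very unlikely under the error model and is a candidate real variant.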
UCSD processing pipeline site report
[Figure: site report with callouts distinguishing real variants from instrument error]
http://www.datamonkey.org
Experimental error
✤ In order to detect low frequency variants, we need a lot of input templates (e.g. high viral load).
✤ For few input templates, NGS could create a sense of false depth, by resampling the same templates over and over again.
✤ PCR amplification biases can cause allelic skewing (inflate or decrease frequencies of specific variants)
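The "false depth" effect can be quantified with a small Python sketch. It assumes every read is an independent, uniform draw from the input templates, which ignores the PCR biases in the last bullet; the template and read counts are illustrative.

```python
def expected_distinct_templates(n_templates, n_reads):
    """Expected number of different input templates represented among n_reads."""
    return n_templates * (1.0 - (1.0 - 1.0 / n_templates) ** n_reads)

for n_templates in (50, 500, 5000):
    distinct = expected_distinct_templates(n_templates, 5000)
    print(f"{n_templates} templates, 5000 reads -> about {distinct:.0f} distinct templates sampled")
```

With 50 input templates, 5000 reads still describe at most 50 viral genomes, so a variant apparently present at 0.5% cannot reflect the true population frequency.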
Reproducibility
Gianella et al, 2011 J Virol
HXB2 Position in Reverse Transcriptase (%DRM)
PID 65 100 103 106 179 181 184 188 190 215 230
I4 1 0.19 0.23 0.33 99.77 0.05 0.00 0.00 0.20 0.15 0.05 0.00 0.00
I4 2 0.18 0.18 0.00 99.32 0.10 0.00 0.06 0.25 0.06 0.00 0.08 0.00
J6 1 0.27 0.33 0.00 0.00 0.10 0.00 0.00 0.30 0.40 0.20 0.00 0.00
J6 2 0.18 0.60 0.00 0.00 0.20 0.00 0.00 1.90 0.00 0.38 0.00 0.00
L3 1 0.17 0.12 0.00 IR 0.00 0.00 0.00 0.14 0.57 0.00 0.00 0.00
L3 2 0.22 0.57 0.13 4.55 0.04 0.00 0.10 0.24 0.14 0.00 0.10 0.00
R2 1 0.38 0.00 0.00 17.74 0.00 0.00 0.00 0.00 0.81 0.00 0.00 IR
R2 2 0.27 2.27 0.00 IR 0.00 0.23 0.00 0.89 0.22 0.00 0.00 IR
R6 1 0.23 0.15 0.00 0.00 0.10 0.00 0.00 0.14 0.07 0.07 0.00 0.00
R6 2 2.36 0.17 0.09 1.19 0.00 0.00 0.00 0.30 0.00 0.11 0.00 0.00
U1 1 0.27 0.20 0.00 IR 0.00 0.00 0.17 0.17 0.00 0.00 0.00 0.64
U1 2 0.34 0.00 0.00 IR 0.00 0.00 0.00 0.00 0.19 0.00 0.00 0.00
U6 1 0.25 0.00 0.00 0.00 0.20 0.00 0.00 0.25 0.08 0.00 0.00 0.00
U6 2 0.10 0.84 0.00 0.15 0.34 0.00 0.09 0.36 0.00 0.27 0.00 0.00
U7 1 0.35 0.00 0.00 100 0.00 0.00 0.25 0.25 0.00 0.00 0.00 0.63
U7 2 0.14 0.61 0.00 100 0.00 0.00 0.00 0.24 0.00 0.00 0.00 0.00
One possible solution: Primer ID
✤ Tag each template with a random sequence tag/Primer ID in the cDNA primer.
✤ Use the sequence tag/Primer ID to identify PCR resampling.
✤ Use the resampled sequences to create a consensus sequence.
✤ Use the number of sequence tags/Primer IDs to define the number of templates.
Jabara C et al PNAS 2011
✤ Creating a consensus sequence for each resampled template using Primer ID mitigates error from PCR and sequencing
[Figure: resampled templates with PCR and sequencing errors, grouped by Primer ID into a consensus sequence]
Jabara C et al PNAS 2011
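The Primer ID idea can be sketched in a few lines of Python: group reads by tag, take a per-position majority vote within each group, and count the surviving tags as templates. This is a simplified illustration assuming pre-aligned, equal-length reads; the minimum of 3 reads per tag and the toy tags and reads are assumptions made here, not values from Jabara et al.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Majority base at each position; assumes equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def primer_id_consensus(tagged_reads, min_reads=3):
    """Map each Primer ID with at least min_reads reads to its consensus sequence."""
    groups = defaultdict(list)
    for tag, seq in tagged_reads:
        groups[tag].append(seq)
    return {tag: consensus(seqs) for tag, seqs in groups.items() if len(seqs) >= min_reads}

# Hypothetical reads: (Primer ID, sequence). One read of template ACGT carries an error.
tagged_reads = [
    ("ACGT", "ATGACGTC"), ("ACGT", "ATGACGTC"), ("ACGT", "ATGTCGTC"),
    ("GGCA", "ATGACGTT"), ("GGCA", "ATGACGTT"), ("GGCA", "ATGACGTT"),
    ("TTAA", "ATGACGTC"),  # seen only once, so it is dropped
]

templates = primer_id_consensus(tagged_reads)
print(templates)
print("templates sampled:", len(templates))
```

The error in the third read is voted out, and the template count is 2 rather than the 7 reads that naive counting would suggest.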
Good reproducibility between runs
[Figure: scatter plot of Run 1 vs. Run 2 (axes 0-25); linear fit y = 0.9943x, R² = 0.80872]
Ron Swanstrom (pers. comm.)
Lowering the limit of detection
Fisher et al J Virol 2012
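TABLE 2 Resistance detected with bulk sequencing during first-line (bulk sequencing) and second-line (bulk sequencing and UDPS) failure

Patient 1
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: A62V, M184I, V108I, Y181C, H221Y; Protease: M46I, L89M, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: None; Protease: M36I, L89M, I93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: K65R (1.1), D67N (0.9), D67E (0.9), K219R (0.7)
    NNRTI: V90I (0.8), A98E (0.9), K101E (5.9), K103R (44.7), K103N (5.1), K103E (3.2), V179I (0.5), Y181C (0.6), F227L (0.8), F227S (0.6), K238R (0.5)
    PI: I54T (0.5), M36I (99.3), L63P (0.9), L89M (99.3), I93L (99.7)

Patient 2
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: M184V, V106M; Protease: M36I, L63P, L89M, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: None; Protease: M36I, L63P, L89M, I93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: K65R (3.8), D67N (8.4), F77L (0.7), M184V (8.0), L210S (0.7), T215 (0.8)
    NNRTI: V90I (0.8), K101R (0.7), K103R (2.2), Y181C (1.9), F227S (0.5)
    PI: L23P (0.5), M36I (98.7), L63P (99.6), L89M (99.5), I93L (99.1)

Patient 3* (c)
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: M184V, V90I, K103N, Y181C; Protease: K20R, M36I, L63P, L89M, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: None; Protease: K20R, M36I, L89M, I93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: V118A (0.5)*, K219E (0.5)*
    NNRTI: V179I (5.8)
    PI: I54M (0.7)*, I84V (0.9)*, K20R (98.3), M36I (98.6), I62V (1.2), L63P (2.0), A71T (1.3), L89M (99.6), I93L (99.8)

Patient 4
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: M184V, V108I, Y181C, H221Y; Protease: K20R, M36I, D60E, L89M, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: None; Protease: K20R, M36I, D60E, L89M, I93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: K65R (2.7), D67N (1.5), F116S (0.5), M184V (2.6)
    NNRTI: K101E (0.6), K101R (0.6), P225T (1.5), F227L (0.6), K238T (3.3), K238R (0.7)
    PI: M46V (0.7), F53L (0.6), F53S (0.5), K20R (73.5), M36I (67.5), M36L (31.6), D60E (59.3), L63P (38.0), L89M (79.0), I93L (99.7)

Patient 5
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: M184V; Protease: M36L, L63P, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: K103N; Protease: M36L, L63P, I93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: K65R (1.6), K65E (0.5), D67N (2.6), M184V (3.1)
    NNRTI: K101E (1.3), K101R (0.6), K103N (55.2), V179D (1.0), P225T (1.0), F227L (0.6), F227S (0.6), K238T (2.9)
    PI: F53L (0.7), N88S (0.7), N88D (0.6), K20R (0.5), M36I (3.0), M36L (96.5), I93L (99.7)

Patient 6
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: M184V, K103N; Protease: D60E, L63P, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: None; Protease: D60E, L63P, L93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: K65R (1.0), K65E (0.6), K219E (0.5)
    NNRTI: K103E (0.8), G190E (0.7)
    PI: V82A (0.6), K20R (34.5), M36I (28.1), D60E (97.9), I62V (0.7), L63P (81.2), L93L (99.6)

Patient 7
  First-line NNRTI failure, bulk sequencing (a): Reverse transcriptase: M41L, K65R, V75I, M184V, K103R, V179D; Protease: M36I, L63S, T74S, I93L
  Second-line PI failure (b), bulk sequencing: Reverse transcriptase: V179D; Protease: M36I, L63S, T74S, I93L
  Second-line PI failure (b), UDPS (frequency [%]):
    NRTI: K65R (2.2), T215A (0.7), K219R (0.7), K219E (0.5)
    NNRTI: V90I (1.0), V179D (6.1)
    PI: M36I (93.2), D60E (8.1), L63S (90.7), L63P (9.0), T74S (90.2), I93L (99.6)

(a) For the NNRTI failure episode, NRTI mutations are in roman, and NNRTI mutations are in italics.
(b) For UDPS mutations, major PI resistance mutations are shown in bold, accessory mutations are in italics, and other amino acid variants at PI resistance loci are in roman type.
(c) Asterisks indicate the detection of minor variants below the predicted threshold, based on the sample input (viral load of 520 copies/ml).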
Clinical relevance
✤ NGS-based assays will detect many more DRAMs than current tests.
✤ Multiple studies provide evidence that SOME low level NRTI and NNRTI DRAMs are associated with subsequent virologic failure (also for FI)
✤ Picture less clear with PI, likely due to the polyallelic nature of resistance
✤ II remain to be investigated directly, as do HCV antivirals
✤ “The extent to which the detection of low-abundance DRMs will affect patient management is still unknown but it is hoped that use of such an assay in clinical practice, will help resolve this important question”
Evaluation of a Bench-Top HIV Ultra-Deep Pyrosequencing Drug-Resistance Assay in the Clinical Laboratory. Avidor et al J Clin Microbiol 2013.
Tropism analysis using NGS
✤ Because NGS provides sequences, one can ask questions that require the knowledge of the entire sequence.
✤ CCR5 vs CXCR4 usage has implications for treatment (with fusion inhibitors), and clinical outcomes
Tropism analysis, clinical relevance
✤ Can either be measured experimentally (e.g. Enhanced Sensitivity Trofile Assay, ESTA), or by computational analyses of env V3 loop sequences (e.g. Geno2Pheno)
✤ Low level (e.g. 2%) X4 variants are predictive of FI failure, e.g. in the Maraviroc versus Efavirenz in Treatment-Naive Patients (MERIT) study
Swenson L C et al. Clin Infect Dis. 2011;53:732-742
[Figure: two patient groups, N=312 and N=35]
Does the choice of platform matter?
✤ Largely, no.
Archer et al PLoS ONE 2012
High throughput dual infection detection
✤ Blood → HIV RNA → cDNA → PCR 3 regions
✤ Sequenced 16 samples concurrently on single 454 GS FLX Titanium plate
✤ Processed reads (~5 mins/patient on a computer cluster) and generated phylogenies
✤ Interpreted nucleotide diversity > 2% (RT, gag) and > 5% (env), confirmed by phylogenetic bootstrap, as evidence of dual infection
Env: C2-V3-C3(416 bp)
Pol: RT(534 bp)
Gag: p24(253 bp)
Pacold et al ARHR 2010
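The diversity screen in the last bullet can be sketched as a threshold check in Python. The thresholds are the ones quoted on the slide; using the maximum pairwise p-distance as the summary statistic is an assumption made for this illustration, the function names are hypothetical, and the phylogenetic bootstrap confirmation step is not covered.

```python
from itertools import combinations

# Thresholds quoted on the slide: > 2% for RT and gag, > 5% for env.
DI_THRESHOLDS = {"RT": 0.02, "gag": 0.02, "env": 0.05}

def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def max_pairwise_distance(seqs):
    """Largest p-distance over all pairs of sequences (0.0 if fewer than two)."""
    return max((p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0)

def exceeds_di_threshold(seqs, region):
    """True when the within-sample divergence is above the region's screening threshold."""
    return max_pairwise_distance(seqs) > DI_THRESHOLDS[region]

# Toy reads: two identical sequences and one that differs at 3 of 100 sites.
reads = ["A" * 100, "A" * 100, "A" * 97 + "CCC"]
print(max_pairwise_distance(reads))
print(exceeds_di_threshold(reads, "RT"), exceeds_di_threshold(reads, "env"))
```

A sample passing this screen is only a candidate; as the slide notes, dual infection is called after the population structure is confirmed on a bootstrapped phylogeny.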
identified samples A, B, C, E, F, and G as singly infected (Supplementary Figs. 1–6 and 11–13; Supplementary Data are available online at www.liebertonline.com/aid) and samples D1, D2, H, and I as dually infected (Fig. 2 and Supplementary Figs. 7–10 and 14–16). DI results specific to the coding regions of each sample are shown in Table 2.
For nearly all the samples, the high read coverage of UDS identified greater maximum divergence than SGS (Table 2). Duplicate UDS runs performed on the same sample cDNA for the same coding regions agreed in DI status for all 20 cases. Combined phylogenies of UDS and SGS for each sample are shown in Figure 2 and the supplemental figures. The one sample (H) in which the divergence found by SGS in both C2–V3 and RT exceeded that of UDS was the sample with the lowest viral load tested, 1113 HIV RNA copies/ml, in which the calculated input copy number that was interrogated by UDS was only 52.3. UDS of the gag p24 region identified DI only for sample I, which had the highest SM-Index of the cohort and was also the only sample whose UDS and SGS of the C2–V3 and RT coding regions both identified DI (Fig. 2).
Cost and time analyses
We estimated cost and time per sample for SM-Index, SGS, and UDS based on a batch of 16 samples (corresponding to a single UDS run). The cost per sample for population-based pol sequence was $278.18, for SGS of two coding regions $2,646.39, and for UDS of three coding regions $1,075.10. Costs of each sequencing type are summarized in Table 3. It took 3 hours to produce one sample’s population-based pol sequence, 42 hours for one sample’s SGS, and 9.5 hours for one sample’s UDS. Cost and time estimates for parallel steps like RNA extraction are highly throughput-dependent. UDS can be customized to produce fewer reads per sample at a lower cost. As previously noted,11 many factors (such as price reductions related to quantity) influence cost estimates and may cause large price differences for experiments using the same technologies.
Discussion
Systematic identification of HIV DI in large cohorts has previously relied on a variety of screening methods, including population-based sequencing analysis from different time points,2 counting sequencing ambiguities,9 heteroduplex mobility assays,29 and molecular analysis of a single coding region.2 Single genome sequencing is the current standard to identify distinct strains in a viral population; however, SGS is too slow, expensive, and labor-intensive to be used as a screening method for the presence of DI in hundreds or thousands of biological samples. In this study, two alternative methods to detect DI were assessed. The SM-Index identified
FIG. 2. Sample I, UDS duplicate 1. First year of infection. DI in env, pol, and gag. UDS are represented as red circles and SGS as blue squares. Variant abundances per node and branches with >90% bootstrap support are labeled.
Pacold et al ARHR 2010
High throughput dual infection detection
SGS: 25 reads per sample-region
UDS: 4,650 reads per sample-region
[Figure: per-sample comparison of SGS and UDS for samples A, B, C, D1, D2, E, F, G and H, with a "Low viral load" annotation]
✤ For all dually infected samples, UDS identified a greater within-sample divergence than SGS.
✤ Samples E and F both had divergence exceeding the DI threshold, but only Sample F exhibited DI-like population structure.
✤ UDS required 40% of the cost and 20% of the time of SGS.
“Gold-standard”
Pacold et al ARHR 2010
Method comparison
SGS NGS
Robustness for confirming DI High High
Throughput potential Low High
Labor High Medium
Time High Low
Cost High Medium (and dropping)
San Diego Primary Infection Cohort
[Figure: sampling timelines (months after initial infection: 12, 24, 36) for 11 dually infected participants (L537, Q294, U189, N112, D224, K613, K908, P265, P853, S155, U796): 4 CI, 7 SI; markers show whether 1 strain or 2 strains were detected]
✤ Samples sequenced to date show a prevalence of DI of 11/61 = 18%.
✤ Of the 7 SI cases:
  ✤ 5 were SI in the first year of initial infection (incidence: 8.2%)
  ✤ 2 in the second year (incidence: 3.3%)
✤ Dual infections are much more frequent than expected.
Pacold et al AIDS 2012
Viral Dynamics of SI Cases
Subject | Coding Region | Initial | Superinfecting | Recombinant
1 (K6) | RT | Replaced | Persists | Not Detected
1 (K6) | C2-V3 | Replaced | Persists | Not Detected
2 (K9) | RT | Replaced | Persists | Not Detected
2 (K9) | C2-V3 | Replaced | Persists | Not Detected
3 (D2) | RT | Persists | Persists | Persists
3 (D2) | C2-V3 | Persists | Transient | Persists
4 (P2) | RT | Persists | Not Detected | Not Detected
4 (P2) | C2-V3 | Persists | Persists | Persists
5 (P8) | RT | Persists | Transient | Not Detected
5 (P8) | C2-V3 | Persists | Transient | Transient
6 (S1) | RT | Persists | Transient | Persists
6 (S1) | C2-V3 | Replaced | Transient | Persists
7 (U7) | RT | Persists | Persists | Transient
7 (U7) | C2-V3 | Persists | Transient | Transient
Viral load dynamics for seven super-infected patients
[Figure: Sqrt(viral load) vs. EDI (months), one panel per patient: K6 (p = 0.35), K9 (p = 0.10), D2 (p = 0.0026), P2 (p = 0.66), P8 (p = 0.093), S1 (p = 0.0061), U7 (p = 0.0044)]
Open circle: before. Shaded circle: after.
p-values are for the presence of a structural shift
Clinical consequences
Molecular epidemiology of HIV-1
✤ Because HIV is a measurably evolving pathogen that accumulates sequence diversity within hosts at rates as high as 1-2% per year within the polymerase (pol) gene, viral sequences are nearly unique to each infected person.
✤ This distinct feature of the virus allows one to interrogate sequences for evidence of recent relatedness, and thus infer potential transmission links.
Establishing links
✤ Putative transmission links are established if the genetic distance between two pol sequences is below a threshold D (e.g. 1.5%)
✤ Median intra-subtype pairwise genetic distance is ~5%, and the probability that two randomly selected HIV-1 subtype B sequences are ≤1.5% distant is very low (p = 0.0022 for the SD AEH cohort and p = 0.0002 for a random sample)
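The linking rule above amounts to thresholding a pairwise distance matrix, which the following Python sketch illustrates. Plain p-distance on pre-aligned sequences is used here as a stand-in for whatever distance measure the analysis actually used; the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def putative_links(sequences, threshold=0.015):
    """Return (id1, id2, distance) for every pair at or below the threshold D."""
    links = []
    for (id1, seq1), (id2, seq2) in combinations(sequences.items(), 2):
        d = p_distance(seq1, seq2)
        if d <= threshold:
            links.append((id1, id2, d))
    return links

# Toy pol sequences: s1 and s2 differ at 1% of sites, s3 is about 10% away from both.
sequences = {
    "s1": "A" * 200,
    "s2": "A" * 198 + "CC",
    "s3": "A" * 180 + "C" * 20,
}
links = putative_links(sequences)
print(links)
```

Comparing all pairs is quadratic in the number of sequences, which is manageable for a cohort of this size but would need indexing or filtering for much larger databases.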
[Figure: distributions (Density, AU) of pairwise genetic distance in two panels, "San Diego Acute and Early Cohort" and "Random database sample", each with an inset for the 0.0-1.5 range]
San Diego HIV molecular network (bulk sequences)
[Figure: network diagram. Legend: direction resolved based on EDI; viral load, log10 (copies/ml): N/A, 1.5-2.5, 2.5-3.5, 3.5-4.5, 4.5-5.5, 5.5-6.5, >6.5; TNS < 0.8 vs. TNS ≥ 0.8; node label N = number of timepoints (if > 1)]
Linking transmission partners using NGS
✤ Because a substantial proportion of individuals may be multiply infected, we need to be able to draw links between minority populations.
✤ NGS data have been used in HPTN 052 (to confirm transmission links between serodiscordant couples)
A denser network of connections
✤ 64 new edges and 16 new nodes (a yield of ~1 connection / 2 NGS samples) were added to the network.
✤ The inclusion of NGS data
  ✤ increased the size of the largest cluster from 62 to 156 nodes
  ✤ increased the number of “hubs” by 7 (from 51 to 58).
It pays to target highly connected nodes
[Figure: two network panels, each highlighting a node with Degree = 7 and a node with Degree = 1]
Targeting a low degree node has a local effect
Targeting a high degree node has a global effect
Concept | Contact Network | Transmission network
Node | Individual | HIV+ individual
Edge | A contact that could lead to HIV transmission, e.g. sexual, shared needle | Transmission event
Degree = edges connected to a node | Number of contacts associated with a node | Number of transmissions associated with a node
[Figure: example network. Legend: HIV+, HIV-, Contact w/o transmission, Transmission; labeled nodes with Degree = 7, Degree = 1 and Degree = 3]
Transmission network is a subset of the contact network
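The node, edge and degree definitions above translate directly into code. The Python sketch below builds a hypothetical contact network and a transmission network that is a subset of it, then computes node degrees in each; the node names and edges are invented for illustration.

```python
from collections import Counter

def node_degrees(edges):
    """Degree of each node: the number of edges connected to it."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees

# Hypothetical contact network: individual A has 7 contacts.
contacts = [("A", other) for other in "BCDEFGH"]
# Hypothetical transmission network: only 3 of those contacts led to transmission.
transmissions = [("A", "B"), ("A", "C"), ("A", "D")]

# Every transmission event must also be a contact.
assert set(transmissions) <= set(contacts)

contact_degree = node_degrees(contacts)
transmission_degree = node_degrees(transmissions)
print("contact degree of A:", contact_degree["A"])
print("transmission degree of A:", transmission_degree["A"])
print("transmission degree of B:", transmission_degree["B"])
```

Ranking nodes by degree is what identifies the highly connected individuals that the preceding slide suggests targeting.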
Regulatory approval: the bad news
✤ No NGS platforms have been cleared/approved by FDA
✤ No standards to use for comparison
✤ No clear agreement on bioinformatics handling
✤ Lack of proficiency panels and reference materials
✤ Rapid change
Regulatory approval: the good news
✤ The industry, academia, and agencies (FDA, CAP, NCBI, etc) are actively collaborating on the issue
✤ Informatics rapidly improving and stabilizing
✤ Clinical relevance studies are ongoing
✤ This is primarily driven by human genomic applications, so HIV/HCV applications will benefit from the larger effort
✤ The Forum on Collaborative HIV research has held a series of roundtables to discuss issues relevant to HIV/HCV research, including the “Next Generation Sequencing Roundtable” in December 2012.
Acknowledgements
UCSD: Davey Smith, Jason Young, Sara Gianella Weibel, Susan Little, Douglas Richman, Richard Haubrich, Gabe Wagner, Lance Hepler
UBC: Richard Harrigan, Art FY Poon
Life Inc: Mary Pacold