
Evaluation of multiplexing strategies for HLA genotyping

Fully phased, allele-level sequencing of highly polymorphic HLA genes is greatly facilitated by SMRT® sequencing technology. In the present work, we evaluated multiple DNA barcoding strategies for multiplexing several loci from multiple individuals, using three different tagging methods. Specifically, the MHC class I genes HLA-A, -B, and -C were indexed with DNA barcodes by either tailed primers or barcoded SMRTbell™ adapters. Eight different 16-bp barcode sequences were used in symmetric and asymmetric pairings. Eight barcoded adapters in symmetric pairing were independently ligated to a pool of HLA-A, -B, and -C amplicons for eight different individuals, one individual at a time, and the libraries were then pooled for sequencing on a single SMRT Cell. Amplicons generated from barcoded primers were pooled upfront for library generation. Eight symmetric barcoded primers were generated for the HLA class I genes. These primers facilitated multiplexing of 8 samples and also allowed generation of unique asymmetric pairings for simultaneous amplification from 28 reference genomic DNA samples. The data generated by all three methods were analyzed using the LAA protocol in SMRT Analysis v2.3. The resulting consensus sequences were typed using GenDx NGSengine HLA-typing software.
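The pairing arithmetic in the abstract can be checked with a short sketch. It assumes that an asymmetric pairing means two different barcodes with the order ignored; under that assumption eight barcodes give 28 unique pairs, which is consistent with the 28 reference samples mentioned above. The labels BC1 to BC8 are hypothetical stand-ins for the actual 16-bp sequences, which the poster does not list.

```python
from itertools import combinations

# Hypothetical labels standing in for the eight 16-bp barcode sequences.
barcodes = [f"BC{i}" for i in range(1, 9)]

# Symmetric pairing: the same barcode on both ends of the amplicon.
symmetric_pairs = [(bc, bc) for bc in barcodes]

# Asymmetric pairing (assumption): two different barcodes, order ignored.
asymmetric_pairs = list(combinations(barcodes, 2))

print(len(symmetric_pairs))   # 8
print(len(asymmetric_pairs))  # 28
```

If forward and reverse order were treated as distinct, the count would double, so the unordered interpretation is the one that matches the sample number reported here.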

Evaluation of multiplexing strategies for HLA genotyping using PacBio® Sequencing technology

HLA Sequencing on PacBio® RSII

Figure 1. Allele Sequence Coverage Comparison: (A) HLA-A amplified using NGSgo reagents and sequenced on the PacBio RS II; (B) traditional Sanger sequencing of exons 2 and 3.

Conclusions

• The long read lengths and high consensus accuracy of SMRT Sequencing make it well suited for analyzing HLA loci.

• In this pilot study we demonstrated that the PacBio RS II sequencing platform can type HLA class I alleles with high accuracy.

• Robust amplification performance was shown for symmetric and asymmetric barcode-labelled amplification primers.

• High-resolution typing results at the 3rd to 4th field level were obtained, concordant with the pre-typing results. Typing inconsistencies were rare and occurred only in cases of unbalanced heterozygous amplification.

• Phased consensus sequences for complex HLA class I multiplex sequences were obtained over the entire length of an amplicon using NGSengine (GenDx).

Figure 2. HLA-B Diversity due to Exon Combinations. Numbers above exons denote unique CDS exons, while numbers between exons denote the number of unique combinations with neighboring exons.

SMRT® Sequencing Chemistry

Figure 3. SMRT Sequencing Read-Length Distribution. Distribution of read lengths from a typical SMRT Sequencing run on a PacBio® RS II using the P5/C3 chemistry, with median read lengths >8 kb and a maximum of up to 30 kb. Full-length HLA genes are sequenced and correctly phased without cloning or manual curation.

Figure 4. SMRT Sequencing Consensus Concordance. Concordance of consensus sequences by average genome coverage from SMRT Sequencing using the P5/C3 chemistry. When sequencing errors are truly random, consensus accuracy depends only on having sufficient coverage.
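The claim that random errors are averaged out by coverage can be illustrated with a toy majority-vote model. This is a deliberately simplified sketch, not the Quiver algorithm: it assumes independent errors at a single position and, pessimistically, that all erroneous reads agree on the same wrong base. The per-read error rate of 0.15 is an illustrative assumption, not a figure from this study.

```python
from math import comb

def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads
    are wrong at one position (simple binomial tail)."""
    need = coverage // 2 + 1  # smallest number of wrong reads that wins the vote
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(need, coverage + 1)
    )

# Assumed per-read error rate, chosen only for illustration.
for cov in (1, 5, 15, 31):
    print(cov, majority_error(0.15, cov))
```

Under these assumptions the per-position consensus error falls steeply as coverage increases, which is the behaviour the figure describes. Systematic (non-random) errors would not be removed this way.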

Multiplexing Strategies

Long Amplicon Analysis

Figure 8. HLA typing score for multiple library preparation strategies using NGSengine software. Three different libraries were run on the PacBio RS II system and typed using GenDx NGSengine typing software. High-resolution (3rd and 4th field) typings were obtained that are concordant with the pre-typings. The HLA-A score was 100% for all methods. The HLA-B score was 100%, except for the asymmetric set (96%) due to n=1 missed B*08 allele. The HLA-C score was 100% (unlabeled), 88% (barcode-labeled, symm), and 93% (barcode-labeled, asymm) due to n=1 missed C*03:04 allele.
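A concordance score at a given field resolution can be sketched as follows. This is a hypothetical helper, not the NGSengine scoring method, and the allele lists are made-up examples rather than data from this study. It assumes allele names use the standard colon-separated field notation and that a missed allele is recorded as None.

```python
def truncate_fields(allele: str, fields: int) -> str:
    """Keep the first `fields` colon-separated fields of an HLA allele name."""
    locus, _, rest = allele.partition("*")
    return locus + "*" + ":".join(rest.split(":")[:fields])

def concordance(calls, reference, fields=3):
    """Percentage of calls that match the reference at `fields` resolution.
    A call of None counts as a missed allele."""
    matches = sum(
        c is not None
        and truncate_fields(c, fields) == truncate_fields(r, fields)
        for c, r in zip(calls, reference)
    )
    return 100.0 * matches / len(reference)

# Made-up example: one of four alleles was not called.
reference = ["A*01:01:01:01", "A*02:01:01:01", "B*08:01:01", "B*07:02:01"]
calls = ["A*01:01:01", "A*02:01:01:01", None, "B*07:02:01"]
print(concordance(calls, reference, fields=3))  # 75.0
```

Truncating both names before comparison lets a 3rd-field call be compared fairly against a 4th-field pre-typing.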

Results

References

[1] Robinson, James, et al. “The IMGT/HLA database.” Nucleic Acids Research 41.D1 (2013): D1222-D1227.
[2] Chin, Chen-Shan, et al. “Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.” Nature Methods 10.6 (2013): 563-569.
[3] https://github.com/bnbowman/HlaTools

Swati Ranade¹, Kevin Eng¹, John Harting¹, Erik Rozemuller², Nienke Westerink², Brett Bowman¹, Lance Hepler¹, Maarten T Penning²
¹ Pacific Biosciences of California, Inc., Menlo Park, United States of America
² GenDx, Utrecht, Netherlands

Figure 6. Diagram of Long Amplicon Analysis. Barcoded sub-reads are grouped by barcode pair and processed independently within each group. Sub-reads are filtered based on user-definable criteria for read quality and length. Sub-reads that pass all filters are aligned to each other and clustered based on the results. Each cluster is iteratively “phased” by identifying and separating sub-reads based on high-scoring mutations in a de novo pipeline. Each resulting sub-cluster is polished with Quiver to generate a high-quality consensus [2]. The consensus sequences are filtered to remove PCR artifacts.
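The first two stages of the workflow described above (grouping by barcode pair, then filtering on length and quality) can be sketched in a few lines. This is an illustrative toy, not the SMRT Analysis implementation: the Subread record, the threshold values, and the choice to treat a barcode pair as unordered are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Subread:
    name: str
    barcode_pair: tuple  # (barcode at one end, barcode at the other end)
    length: int          # bases
    quality: float       # predicted read accuracy, 0..1

def group_and_filter(subreads, min_length=3000, min_quality=0.80):
    """Drop sub-reads that fail the (hypothetical) length and quality
    thresholds, then group the survivors by barcode pair."""
    groups = defaultdict(list)
    for sr in subreads:
        if sr.length >= min_length and sr.quality >= min_quality:
            # Sort the pair so (BC1, BC2) and (BC2, BC1) share one group.
            groups[tuple(sorted(sr.barcode_pair))].append(sr)
    return dict(groups)

reads = [
    Subread("r1", ("BC1", "BC1"), 3400, 0.88),
    Subread("r2", ("BC2", "BC1"), 3300, 0.85),
    Subread("r3", ("BC1", "BC2"), 3500, 0.90),
    Subread("r4", ("BC1", "BC1"), 900, 0.91),   # too short
    Subread("r5", ("BC3", "BC3"), 3400, 0.60),  # quality too low
]
groups = group_and_filter(reads)
for pair, members in groups.items():
    print(pair, [sr.name for sr in members])
```

Each resulting group would then go through overlap, clustering, phasing, and consensus polishing independently, as the figure caption describes.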

Figure 7. Amplification performance of barcoded amplification primers for HLA-A, -B, and -C. Locus-specific amplification of HLA-A, -B, and -C was performed with unlabeled primers (Method I) and barcode-labelled amplification primers (Method II, symmetric vs. asymmetric). Amplification performance was scored for robustness (%) and balanced allele fraction as determined by Sanger SBT.

Figure 5. Multiplexing Strategies and Workflows. A Symmetric SMRTbell™ barcodes: these barcodes are attached to the adapters. This is the least preferred barcoding method because libraries must be prepared separately before pooling.

B Symmetric primer barcodes: primers are tagged with symmetric barcodes (the same barcode on the forward and reverse primers).

C Asymmetric primer barcodes: primers are tagged with asymmetric barcodes (different barcodes on the forward and reverse primers).

Recombination events also add to diversity in HLA genes (2). Exon-only sequencing is therefore insufficient for resolving variation in new alleles, which is sometimes caused by mutations occurring outside exons 2 and 3 as well as outside the CDS region. Fully phased, allele-level genotyping, with phasing across exons and introns for accurate SNP determination within a single read span, is highly advantageous.

Abstract

[Figure 5 panel labels: One patient, one barcode for all HLA genes. Method 1: Barcoded SMRTbell adapters. Method 2: Barcoded amplification primers.]

[Figure 8 bar chart: typing score (NGSengine), 0 to 100, for HLA-A, HLA-B and HLA-C across three groups: Unlabelled, Labelled (Symm), Labelled (Asymm).]

The primary cause of missed alleles with both symmetric and asymmetric barcode-labeled primers was amplification imbalance.

Figure 9. Example of HLA-A, -B, and -C typing results as generated with NGSengine. The locations of the 5’UTR and 3’UTR (blue), exons (yellow) and introns (connecting black lines) are shown. The vertical colored bars indicate the heterozygous positions. Full phasing across the entire gene was obtained (bold horizontal red line). High-resolution typing without exon mismatches is demonstrated.

For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. GenDx, NGSgo, NGSengine are trademarks of Genome Diagnostics b.v. All other trademarks are the property of their respective owners. 2014 Copyright GenDx, all rights reserved.

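[Figure 1 panels: A. Full Length + SMRT Sequencing; B. Traditional SBT. Horizontal axis ticks as labeled: 1,000 kb to 6,000 kb.]

[Figure 2 values. Unique CDS exons: exon 1: 42; exon 2: 1210; exon 3: 1591; exon 4: 168; exon 5: 36; exon 6: 3. Unique combinations of neighboring exons: exons 1-2: 1836; exons 2-3: 4237; exons 3-4: 1275; exons 4-5: 819; exons 5-6: 62.]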

[Figure 7 bar charts: amplification performance and balanced amplification, each on a 0 to 100 scale, for HLA-A, HLA-B and HLA-C across three groups: Unlabelled, Labelled (Symm), Labelled (Asymm).]

[Figure 6 diagram labels: HLA-A, HLA-B, HLA-C; Separate by Barcode (Asymmetric Primers, Symmetric Primers, Symmetric Adapters); Filter Subreads; Overlap; Cluster; Phase; Quiver; Filter Consensus Sequences; Report.]