
Detection of Low Level Sequence Variants by Sanger Sequencing | ESHG 2015 Poster PM16.56


Improved Detection of Low Level Sequence Variants by Sanger Sequencing using a New Noise Reduction Algorithm

Edgar Schreiber, Harrison Leong, Stephanie Schneider, Jeff Marks, Michael Wenz, Stephan Berosik, Shiaw-Min Chen, Jonathan Erikson, Hanh Le, Joel Colburn et al.

Thermo Fisher Scientific, Genetic Analysis Solutions, 180 Oyster Point Boulevard, South San Francisco, CA 94080

ABSTRACT

Sanger sequencing using fluorescent BigDye® terminator chemistry and semi-automated capillary electrophoresis (CE) has long been considered the gold standard for identifying sequence variations such as disease-causing mutations. The robustness, low error rate, ease of use, human-interpretable visual displays of the signals generated by the instruments, and low cost per sample and target have all contributed to this reputation. Homozygous and heterozygous germ line mutations are reliably detected and reported using established DNA sequencing analysis software such as the Applied Biosystems Variant Reporter™ software. However, somatic variants with an allelic proportion of 25% or less are often undetected (i.e. not "called") by the software and thus escape awareness if not detected by careful visual inspection of the electropherograms. With the rapid adoption of next generation sequencing technology (NGS) and its use for characterization of specific and discrete mutations in tumor samples, an urgent need has emerged to establish an orthogonal technology for reliable and sensitive detection of somatic mutations which may occur at proportions of 10% or lower compared to the normal allele.

To this end, we have developed an innovative algorithm, software, and a protocol that specialize in the detection and reporting of minor mutations by Sanger sequencing. Moreover, the algorithm preserves the ability to generate the familiar displays of the data to facilitate human review. Using panels of prepared mixtures of minor alleles in the range of 2.5%, 5%, 10% and 20%, we have achieved 94.6% sensitivity and 99.8% specificity for automated detection of mutations present at the 5% level with high quality data.

In conclusion, we have demonstrated that standard protocols for fluorescent dye terminator Sanger sequencing in conjunction with the new algorithm delivered in Variant Finder software may enable the identification of de novo somatic mutations to a level of 5%. This technology will also be useful for the confirmation of minor variants identified by NGS platforms.

For Research Use only – Not for use in diagnostic procedures.

Introduction

Detecting and Distinguishing Minor Variants from Background Noise

Minor variants are single nucleotide polymorphisms (SNPs) which present as a minor component, i.e. with a contribution of less than 25% at a given allele.

Minor variants may occur spontaneously or evolve during tumorigenesis (somatic mutation) or in viral, bacterial or mitochondrial mixed populations. 

Minor variants are difficult to detect by conventional fluorescent Sanger sequencing since the reduced peak trace of the minor variant may be hidden in the “background noise.”

“Background noise” at the baseline of a typical fluorescent Sanger sequencing trace. The arrows point to two genuine minor variants. Could you tell?

Detecting Minor Variants out of “Reproducible Noise”

Experienced users of fluorescent Sanger sequencing systems may have noticed the remarkable consistency and reproducibility of the primary peak pattern in the sequence trace profiles when the same locus of interest is sequenced in different specimens. This phenomenon is due to the sequence-context-dependent nucleotide incorporation efficiencies during the polymerization process (Carr et al. 2008). The assumption is that the same principle applies not only to the primary predominant base but also to the other three bases, which generate a characteristic pattern commonly referred to as the “background noise.” This feature of “reproducible noise” can be exploited to algorithmically reveal the signal indicative of a potential minor variant.

To this end, the dye traces of a bidirectional (forward and reverse strand) sequencing from a test sample where a minor variant is potentially present are compared to the dye traces of a normal control sample in which the variant is known to be bona fide absent.

How is the Signal of the Minor Variant Distinguished from Background Noise?

The main idea is that the background noise of the control sample can be used to remove background noise in the test sample if the noise between the two is similar enough. If the primary sequence backbone for the two samples is the same and the samples were processed in a similar fashion, there is a good chance that the background noise will be sufficiently similar. With this noise removed, variants in the test sample are revealed because the variants are not associated with the common primary sequence backbone. To complete the variant detection decision, pattern recognition is used to distinguish bona fide variant signals from any noise remaining in the traces.

The Variant Finder tool scans the traces of a quartet of:

• normal control forward strand
• normal control reverse strand
• test sample forward strand
• test sample reverse strand

for the presence of variants that occur in matching (i.e. sequence-complementary) positions on both strands. At least one normal control pair (fwd/rev) must be present. Many test sample pairs for the same amplicon can be analyzed in the same session.

User interface of the Variant Finder Detector tool: sample files (control and test samples) are transferred into the Analysis fields for forward and reverse traces. An output directory is selected and variant finding is started by “ANALYZE.” When analysis is completed, change to the Variant Finder Viewer for review.
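The two ideas described above, removing control-derived noise from the test trace and requiring a candidate at matching positions on both strands, can be pictured with a toy sketch. This is not the Variant Finder algorithm, whose internals the poster does not disclose. It assumes hypothetical, already aligned per-position peak heights for a single dye channel (on the reverse strand this would be the channel of the complementary base), and the function names, the median-ratio scaling, and the fixed threshold are all illustrative assumptions.

```python
from statistics import median

# Toy illustration only: NOT the Variant Finder algorithm.
# Each trace is a list of hypothetical peak heights for one dye channel,
# with control and test already aligned position by position.

def scale_factor(control, test):
    """Median test/control ratio; robust to a few variant peaks (assumption)."""
    ratios = [t / c for c, t in zip(control, test) if c > 0]
    return median(ratios) if ratios else 0.0

def residual(control, test):
    """Test trace after subtracting the scaled control trace (the shared noise)."""
    k = scale_factor(control, test)
    return [t - k * c for c, t in zip(control, test)]

def candidates(control, test, threshold):
    """Positions whose residual exceeds a hypothetical fixed threshold."""
    return {i for i, r in enumerate(residual(control, test)) if r > threshold}

def concordant(fwd_hits, rev_hits, length):
    """Keep forward-strand hits that also appear on the reverse strand.

    Reverse-strand index j is taken to match forward index length - 1 - j.
    """
    rev_as_fwd = {length - 1 - j for j in rev_hits}
    return sorted(fwd_hits & rev_as_fwd)

# Hypothetical data: a real variant at forward position 3 (reverse position 2)
# and a one-strand artifact at forward position 5.
control_fwd = [10, 12, 8, 11, 9, 10]
test_fwd = [10, 12, 8, 40, 9, 30]
control_rev = [9, 11, 10, 8, 12, 10]
test_rev = [9, 11, 36, 8, 12, 10]

fwd_hits = candidates(control_fwd, test_fwd, threshold=15)
rev_hits = candidates(control_rev, test_rev, threshold=15)
print(concordant(fwd_hits, rev_hits, length=len(test_fwd)))
```

In this toy data the artifact at forward position 5 survives noise removal on one strand but is discarded because the reverse strand shows nothing at the matching position. A real implementation would additionally need trace alignment, mobility correction, and the pattern-recognition step mentioned above.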

Candidate minor variants are presented in the VIEWER window for visual review and a manual “Accept” or “Reject” call.

Full view of trace quartet. Black bars indicate variant candidates.

Noise-purified trace view for variant candidate

Original trace view for variant candidate in test sample

Original trace view for normal control sample

Verifying Very Low Level Minor Variants: A 5% variant in the human TP53 Gene

Using a slider in the center of the viewer, peaks can be scrutinized at highest detail. Colors for base-specific traces can be switched on or off. This allows a thorough assessment of whether a peak represents a genuine variant or a nonspecific noise signal.

Correlating Minor Variant Findings by Ion Torrent™ PGM™ and Sanger CE Variant Finder

Sequence Scanner View: Sample a3, variant c35G>A at 5% (NGS), is barely visible in the traditional Sequence Scanner viewer.

Variant Finder Viewer: Sample a3, variant c35G>A at 5% (NGS), is clearly detected on both strands in the new Variant Finder viewer (note: only the T-trace is shown).

Conclusions

• The Variant Finder tool facilitates detection and calling of minor variants as low as 5% from Sanger sequencing traces (.ab1 files).

• The algorithm neutralizes background noise signal by comparison of test sample(s) and a normal control sample.

• The minor variant is detected on forward and reverse strands.

• The minor variant candidates are presented for review in a convenient viewer tool and reported in a csv output file.

• The tool aids in verification of minor variant findings by NGS.

Sensitivity & Specificity Statistics Summary

Summary of sensitivity (in %) for detection of rare (minor) variants at known % level (x-axis).

375 file sets generated on Applied Biosystems 3730 or 3500 Genetic Analyzers were analyzed:

• Overall specificity was 99.8%

• Sensitivity at the 5% variant level was 94.6%

ROC curve plotting sensitivity vs. specificity for the 5% variants (n=75) at different algorithm settings. The performance of the current algorithm is indicated by the red circle.
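For readers less familiar with these metrics, the following minimal sketch shows how sensitivity and specificity are conventionally computed and how sweeping a detection setting yields the points of an ROC curve. The scores, labels, and thresholds are hypothetical and are not the poster's data. The single score threshold is only an assumed stand-in for the "algorithm settings" mentioned above.

```python
def sensitivity(tp, fn):
    """Fraction of true variants that were called."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-variant positions that were correctly not called."""
    return tn / (tn + fp)

def roc_points(scores, labels, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    scores: hypothetical detection scores, one per candidate position
    labels: True where a variant is known to be present (e.g. from a
            prepared mixture), False otherwise
    """
    points = []
    for th in thresholds:
        called = [s >= th for s in scores]
        tp = sum(c and l for c, l in zip(called, labels))
        fn = sum((not c) and l for c, l in zip(called, labels))
        fp = sum(c and (not l) for c, l in zip(called, labels))
        tn = sum((not c) and (not l) for c, l in zip(called, labels))
        points.append((th, sensitivity(tp, fn), specificity(tn, fp)))
    return points

# Hypothetical example: three true variants, three non-variant positions.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [True, True, True, False, False, False]

for th, sens, spec in roc_points(scores, labels, [0.3, 0.5, 0.75]):
    print(f"threshold={th}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold raises sensitivity at the cost of specificity, which is the trade-off an ROC curve makes visible.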


Acknowledgements: Geoff Bien and David Chi (Life Technologies Services Lab, West Sacramento, CA) and Manjula Aliminati for providing the KRAS NGS and Sanger data; Nakul Natarad and Kara Norman for early access to Acrometrix® MegaMix™ technology.

© 2015 Thermo Fisher Scientific, Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified.