
Omixon Workshops

Considerations for Analyzing Targeted NGS Data - Introduction

Tim Hague, CEO

Targeted Data

Introduction

Many mapping, alignment and variant calling algorithms

Most of these have been developed for whole genome sequencing and to some extent population genetic studies

Premise

In contrast, NGS based diagnostics deals with particular genes or mutations of an individual

Different diagnostic targets present specific challenges

Present analysis issues related to differences in:

Sequencing technologies

Targeting technologies

Target specifics

Pseudogenes and segmental duplication


NGS Sequencers

Illumina

Ion Torrent

Roche 454

(SOLiD)

Mind The Gap

Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.

Sequencing Technology

Differences:

Homopolymer error rates

G/C content errors

Read length

Sequencing protocols (single vs paired reads)
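As a rough illustration of two of the sequence properties listed above, the following minimal sketch computes the G/C fraction of a sequence and locates its homopolymer runs. The function names, the example read and the minimum run length of 3 are assumptions made for this example, not values from the slides.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases among the called (A/C/G/T) bases of a sequence."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def homopolymer_runs(seq: str, min_len: int = 3):
    """Return (start, base, length) for every run of identical bases >= min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical read used only for illustration.
read = "ACGGGGGTTACCCAT"
print(gc_fraction(read))
print(homopolymer_runs(read))
```

Regions flagged by checks like these are the ones where platform-specific error profiles are most likely to matter when interpreting indel calls.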

Targeting Methods

PCR primers (e.g. amplicons)

Hybridization probes (e.g. exome kits)

Targeting Technology

Differences: Exact matching regions vs regions with SNPs

Results in: Need for mapping against whole chromosomes to avoid false positives
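A toy sketch of why the choice of mapping reference matters: a read that really comes from a homologous region (for example a pseudogene) is forced onto the target when only the target is available as a reference, and its differences then look like variants. The sequences, names and the simple ungapped Hamming-distance mapper below are illustrative assumptions, not a real aligner.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, references: dict):
    """Return (name, offset, mismatches) of the best ungapped placement of the read."""
    best = None
    for name, ref in references.items():
        for offset in range(len(ref) - len(read) + 1):
            mismatches = hamming(read, ref[offset:offset + len(read)])
            if best is None or mismatches < best[2]:
                best = (name, offset, mismatches)
    return best


# Hypothetical target and a homologous copy that differs at two positions.
target = "ACGTTGCAAGGCTTACGGAT"
pseudogene = "ACGTTGCTAGGCTAACGGAT"
read = pseudogene[4:16]  # the read truly originates from the homologous copy

# Target-only reference: the read lands on the target with two mismatches,
# which a variant caller could report as two false positive SNPs.
print(best_hit(read, {"target": target}))

# Wider reference: the read is placed on its true origin with no mismatches.
print(best_hit(read, {"target": target, "pseudogene": pseudogene}))
```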

Analysis Targets

Differences:

Rate of polymorphism

Repetitive structures

Mutation profiles

G/C content

Single genes vs multi gene complexes

BRCA1/2: 1/2000

HLA: 1/29

CFTR: 1/2000

Distributions of insertions and deletions

Distribution of repeat elements

Segmental Duplications

Sometimes called Low Copy Repeats (LCRs)

Highly homologous, >95% sequence identity

Rare in most mammals

Comprise a large portion of the human genome (and other primate genomes)

Important for understanding HLA

Many LCRs are concentrated in "hotspots"

Recombinations in these regions are responsible for a wide range of disorders, including:

– Charcot-Marie-Tooth syndrome type 1A

– Hereditary neuropathy with liability to pressure palsies

– Smith-Magenis syndrome

– Potocki-Lupski syndrome

Segmental Duplications

Data analysis shouldn’t be like this!

Data Analysis Tools

Differences:

Detection rates of complex variants (sensitivity)

False positive rates (accuracy)

Speed

Ease of use

“Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.”

Mark Yandell in BioIT-World.com, June 8, 2011

Examples

Missing variants

SNPs, a DNP and deletions

Identify More Valid Variants

Find Homopolymer Indels

Examples

Coverage differences

[Figure: two coverage tracks, with depth scales 0-432 and 0-96]

Four Times Exon Coverage

[Figure: two coverage tracks, with depth scales 0-24 and 0-10]

Higher Exome Coverage
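Coverage comparisons like the ones above can be put into numbers by computing per-base depth over a target interval. The sketch below assumes alignments are given as half-open (start, end) intervals on one chromosome. The exon coordinates and the two alignment sets are made up for illustration and are not taken from the slides.

```python
def per_base_depth(alignments, start: int, end: int):
    """Depth at each position of [start, end) given half-open alignment intervals."""
    depth = [0] * (end - start)
    for aln_start, aln_end in alignments:
        for pos in range(max(aln_start, start), min(aln_end, end)):
            depth[pos - start] += 1
    return depth


def mean_depth(alignments, start: int, end: int) -> float:
    """Mean depth over the interval [start, end)."""
    depth = per_base_depth(alignments, start, end)
    return sum(depth) / len(depth)


# Hypothetical exon and two hypothetical sets of alignments from different tools.
exon_start, exon_end = 100, 110
tool_a = [(95, 105), (100, 110), (102, 112), (98, 108)]
tool_b = [(100, 110)]

depth_a = mean_depth(tool_a, exon_start, exon_end)
depth_b = mean_depth(tool_b, exon_start, exon_end)
print(depth_a, depth_b, depth_a / depth_b)
```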

First Conclusion

Read accuracy is not the limiting factor in accurate variant analysis

Example - Dense Region of SNPs

Second Conclusion

As variant density increases the performance of most tools goes down
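One simple way to locate the dense regions this conclusion refers to is to scan a list of variant positions for the window that contains the most variants. This is a minimal sketch. The 50 bp window size and the positions are arbitrary assumptions chosen for the example.

```python
def densest_window(positions, window: int):
    """Return (window_start, count) for the window of the given size holding the most variants.

    A window starting at position p covers [p, p + window).
    """
    positions = sorted(positions)
    best_start, best_count = None, 0
    left = 0
    for right, pos in enumerate(positions):
        # Shrink from the left until all variants fit inside one window.
        while pos - positions[left] >= window:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_start, best_count = positions[left], count
    return best_start, best_count


# Hypothetical variant positions on one chromosome.
variant_positions = [100, 180, 205, 210, 214, 230, 400]
print(densest_window(variant_positions, window=50))
```

Windows with a high count are candidates for closer manual inspection or for comparing the output of more than one tool.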

Variant Calling

There are a few popular variant callers: GATK, SAMtools mpileup, VarScan

The most comprehensive (GATK) has a whole pipeline, including a quality recalibration step and an indel realignment step

Running these recalibration and realignment steps before any variant calling is highly recommended

Deduplication and removing non-primary alignments may also be required
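The filtering step mentioned above can be sketched with the standard SAM FLAG bits: 0x100 (secondary alignment), 0x800 (supplementary alignment) and 0x400 (PCR or optical duplicate). The sketch assumes duplicates have already been marked by an upstream tool and that input is plain-text SAM. The function names and example records are hypothetical.

```python
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_DUPLICATE = 0x400      # PCR or optical duplicate (set by a duplicate-marking tool)
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment

EXCLUDE_MASK = FLAG_SECONDARY | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY


def keep_record(sam_line: str) -> bool:
    """True if the alignment is primary and not flagged as a duplicate."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second mandatory SAM column
    return (flag & EXCLUDE_MASK) == 0


def filter_sam(lines):
    """Yield header lines unchanged and only those alignments that pass keep_record."""
    for line in lines:
        if line.startswith("@") or keep_record(line):
            yield line


# Hypothetical records: only the FLAG column differs.
template = "read{0}\t{1}\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
records = [template.format(i, flag) for i, flag in enumerate([0, 256, 1024, 2048, 99, 1123])]
for line in filter_sam(["@HD\tVN:1.6"] + records):
    print(line.split("\t")[:2])
```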

Indel Realigner Problem

Variants That Can be Hard to Find

DNPs

TNPs

Small indels next to SNPs

30+ bp indels

Homopolymer indels

Homopolymer indel and SNP together

Indels in palindromes

Dense regions of variants
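One reason homopolymer indels are awkward to call and compare is that the same deletion can be written at several positions and still produce an identical sequence. The sketch below enumerates those equivalent placements for a made-up reference. The function names and the example sequence are assumptions for illustration only.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Return the sequence obtained by deleting `length` bases starting at 0-based `pos`."""
    return ref[:pos] + ref[pos + length:]


def equivalent_deletion_positions(ref: str, pos: int, length: int):
    """All start positions where a deletion of the same length gives the same sequence."""
    observed = apply_deletion(ref, pos, length)
    return [
        p for p in range(len(ref) - length + 1)
        if apply_deletion(ref, p, length) == observed
    ]


# Hypothetical reference with a run of five T bases at positions 2-6.
reference = "GATTTTTCA"
placements = equivalent_deletion_positions(reference, pos=4, length=1)
print(placements)       # every placement inside the T run is equivalent
print(min(placements))  # the leftmost placement is a common convention for reporting
```

Two tools that report the same deletion at different positions within the run will appear to disagree unless the calls are brought to a common representation first.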

Contact

Tim Hague, CEO

Omixon Biocomputing Solutions

Tim.Hague@omixon.com

+36 70 318 4878

Download our Omixon Target™ Evaluation Version

OMIXON.COM
