Omixon Workshops
Considerations for Analyzing Targeted NGS Data - Introduction
Tim Hague, CEO
Targeted Data
Introduction
Many mapping, alignment and variant calling algorithms exist
Most of these were developed for whole genome sequencing and, to some extent, population genetic studies
Premise
In contrast, NGS-based diagnostics deals with particular genes or mutations of an individual
Different diagnostic targets present specific challenges
Goal
Present analysis issues related to differences in:
Sequencing technologies
Targeting technologies
Target specifics
Pseudogenes and segmental duplication
NGS Sequencers
Illumina
Ion Torrent
Roche 454
(SOLiD)
Mind The Gap
Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.
Sequencing Technology
Differences:
Homopolymer error rates
G/C content errors
Read length
Sequencing protocols (single vs paired reads)
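A minimal sketch of how a read or target region could be screened for the two sequence features named above, G/C content and homopolymer runs. The function names are illustrative and not part of any tool mentioned in these slides.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> tuple[str, int]:
    """Return (base, length) of the longest run of a single repeated base."""
    best = ("", 0)
    for base, run in groupby(seq.upper()):
        length = len(list(run))
        if length > best[1]:
            best = (base, length)
    return best
```

Regions with extreme G/C fractions or long homopolymer runs are the ones where platform-specific error profiles are most likely to matter.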
Targeting Methods
PCR primers (e.g. amplicons)
Hybridization probes (e.g. exome kits)
Targeting Technology
Differences:
Exact matching regions vs regions with SNPs
Results in:
Need for mapping against whole chromosomes to avoid false positives
Analysis Targets
Differences:
Rate of polymorphism
Repetitive structures
Mutation profiles
G/C content
Single genes vs multi gene complexes
BRCA1/2, HLA, CFTR
1/2000, 1/29, 1/2000
Distributions of insertions and deletions
Distribution of repeat elements
Segmental Duplications
Sometimes called Low Copy Repeats (LCRs)
Highly homologous, >95% sequence identity
Rare in most mammals
Comprise a large portion of the human genome (and other primate genomes)
Important for understanding HLA
Many LCRs are concentrated in "hotspots"
Recombinations in these regions are responsible for a wide range of disorders, including:
Charcot-Marie-Tooth syndrome type 1A
Hereditary neuropathy with liability to pressure palsies
Smith-Magenis syndrome
Potocki-Lupski syndrome
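To make the ">95% sequence identity" criterion concrete, here is a small sketch that scores two sequences which are assumed to be already aligned, of equal length and without gaps. Real segmental duplication detection works on genome-scale alignments with gaps; the helper names and the simplification are assumptions for illustration only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def looks_like_lcr_pair(a: str, b: str, threshold: float = 95.0) -> bool:
    """True when identity exceeds the threshold quoted on the slide (>95%)."""
    return percent_identity(a, b) > threshold
```

At this level of similarity a short read can match both copies almost equally well, which is why such regions are hard to map unambiguously.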
Segmental Duplications
Data analysis shouldn’t be like this!
Data Analysis Tools
Differences:
Detection rates of complex variants (sensitivity)
False positive rates (accuracy)
Speed
Ease of use
“Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.”
Mark Yandell in BioIT-World.com, June 8, 2011
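One way to quantify such differences between tools is to compare each tool's calls against a trusted set of variants. The sketch below assumes variants are represented as (chromosome, position, ref, alt) tuples and that a truth set is available; both are assumptions, not something these slides specify.

```python
def compare_calls(truth: set, called: set) -> dict:
    """Compare a call set with a truth set of (chrom, pos, ref, alt) tuples."""
    tp = len(truth & called)   # variants found by the tool and present in truth
    fn = len(truth - called)   # true variants the tool missed
    fp = len(called - truth)   # calls not supported by the truth set
    sensitivity = tp / (tp + fn) if truth else 0.0
    false_discovery_rate = fp / (tp + fp) if called else 0.0
    return {
        "true_positives": tp,
        "false_negatives": fn,
        "false_positives": fp,
        "sensitivity": sensitivity,
        "false_discovery_rate": false_discovery_rate,
    }
```

Running this once per tool on the same sample gives directly comparable sensitivity and false positive figures.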
Examples
Missing variants
SNPs, a DNP and deletions
Identify More Valid Variants
Find Homopolymer Indels
Examples
Coverage differences
Four Times Exon Coverage (coverage track ranges: [0-432] vs [0-96])
Higher Exome Coverage (coverage track ranges: [0-24] vs [0-10])
First Conclusion
Read accuracy is not the limiting factor in accurate variant analysis
Example - Dense Region of SNPs
Second Conclusion
As variant density increases, the performance of most tools goes down
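Dense regions can be located before interpreting calls. The sketch below flags window starts where several variants fall close together; the window size and threshold are arbitrary illustrative defaults, not values taken from these slides.

```python
from bisect import bisect_left


def dense_windows(positions, window=50, min_variants=3):
    """Return (start, count) for each variant that begins a dense window.

    A window covers [start, start + window) on a single chromosome.
    """
    pos = sorted(positions)
    hits = []
    for i, start in enumerate(pos):
        # Index of the first variant at or beyond the end of the window.
        j = bisect_left(pos, start + window)
        count = j - i
        if count >= min_variants:
            hits.append((start, count))
    return hits
```

Calls inside flagged windows are candidates for manual review or for re-analysis with a tool that copes better with clustered variants.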
Variant Calling
There are a few popular variant callers: GATK, SAMtools mpileup, VarScan
The most comprehensive (GATK) has a whole pipeline, including a quality recalibration step and an indel realignment step
Running these recalibration and realignment steps before any variant calling is highly recommended
Deduplication and removing non-primary alignments may also be required
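As a sketch of the last point, alignments can be filtered on the SAM FLAG field, where 0x100 marks a secondary alignment, 0x400 a duplicate and 0x800 a supplementary alignment. This assumes duplicates were already marked by an upstream tool; the code only drops marked records from SAM text lines and does not detect duplicates itself.

```python
SECONDARY = 0x100      # not the primary alignment of the read
DUPLICATE = 0x400      # marked as PCR or optical duplicate
SUPPLEMENTARY = 0x800  # part of a chimeric/split alignment


def keep_alignment(flag: int) -> bool:
    """True for primary, non-duplicate alignments."""
    return not (flag & (SECONDARY | DUPLICATE | SUPPLEMENTARY))


def filter_sam_lines(lines):
    """Yield header lines unchanged and only the alignment lines worth keeping."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second SAM column
        if keep_alignment(flag):
            yield line
```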
Indel Realigner Problem
Variants That Can be Hard to Find
DNPs
TNPs
Small indels next to SNPs
30+ bp indels
Homopolymer indels
Homopolymer indel and SNP together
Indels in palindromes
Dense regions of variants
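To check which of these classes a caller actually reports, its output can be sorted into simple categories. The sketch below assumes VCF-style ref/alt allele strings that are already normalized (for a DNP or TNP every base differs); the labels and the 30 bp cutoff simply mirror the list above.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a variant by comparing the lengths of its ref and alt alleles."""
    if len(ref) == len(alt):
        # Equal length: a substitution of 1, 2, 3 or more adjacent bases.
        return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "MNP")
    return "insertion" if len(alt) > len(ref) else "deletion"


def is_long_indel(ref: str, alt: str, min_len: int = 30) -> bool:
    """True for insertions or deletions of at least min_len bases."""
    return abs(len(ref) - len(alt)) >= min_len
```

Counting calls per class for each tool makes it easy to see which categories a given pipeline is missing.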
Contact
Tim Hague, CEO
Omixon Biocomputing Solutions
Tim.Hague@omixon.com
+36 70 318 4878
Download our Omixon Target™ Evaluation Version Today
OMIXON.COM