Effect of Repeats on the Characterization of Structural Variation
Nancy F. Hansen, Ph.D.
September 15, 2016
Outline of my talk
• Description of “PBRefine” callset
  • Refinement of regions by alignment of PacBio assemblies to the human reference (Build37) with nucmer (MUMmer3.23, Kurtz et al., Genome Biology (2004))
  • Characterization of SVs using mummerplot dot plots
• Role of repeats in curation of structural variation
  • Ambiguities in the positions of insertions and deletions due to repeats
  • “Correct” answer can be dependent on alignment algorithm
  • Evidence from different technological platforms can point to different breakpoints
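To make the positional ambiguity concrete, here is a minimal Python sketch (not part of the PBRefine pipeline) showing how a deletion inside a tandem repeat can be placed at several equally valid positions. The function names and the toy sequence are hypothetical, chosen only for illustration.

```python
def equivalent_deletion_starts(ref, start, length):
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` gives the same result as deleting at `start`."""
    left = start
    # Shift left while the base entering the window matches the base leaving it.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Shift right under the same condition, in the other direction.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def apply_deletion(ref, start, length):
    """Remove `length` bases from `ref` beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


# Toy reference with a (CA)x3 tandem repeat; delete one 2-base unit.
ref = "TTCACACAGG"
left, right = equivalent_deletion_starts(ref, 4, 2)
print(left, right)  # 2 6
# Every start from `left` to `right` produces the identical sequence.
print({apply_deletion(ref, s, 2) for s in range(left, right + 1)})
```

Which of these positions a caller reports depends on its alignment conventions, which is one way two callsets can disagree on breakpoints while describing the same allele.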
The PBRefine Pipeline
1. Extract reference sequence surrounding variant predictions
2. Align reference sequence to PB assembly* with MUMmer
3. Count end-to-end alignments
   • More than 2: discard region as repetitive
   • 2 or fewer: align assembly region back to reference with MUMmer
4. Characterize variants
* CA and hybrid Falcon assemblies for all three trio members
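The branching step of the pipeline can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual PBRefine code: the `Alignment` record, the `slop` tolerance, and the definition of "end-to-end" as an alignment covering the whole extracted reference region are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    contig: str      # assembly contig name
    ref_start: int   # 1-based start on the extracted reference region
    ref_end: int     # 1-based inclusive end on the extracted reference region


def is_end_to_end(aln, region_length, slop=0):
    """Assumed definition: the alignment covers the whole extracted region,
    allowing `slop` unaligned bases at each end."""
    return aln.ref_start <= 1 + slop and aln.ref_end >= region_length - slop


def triage_region(alignments, region_length, max_alignments=2, slop=0):
    """Apply the slide's rule: more than 2 end-to-end alignments means the
    region is discarded as repetitive; 2 or fewer go on to realignment."""
    spanning = [a for a in alignments if is_end_to_end(a, region_length, slop)]
    if len(spanning) > max_alignments:
        return "discard_repetitive", []
    return "realign_to_reference", spanning


alns = [
    Alignment("contig_A", 1, 5000),
    Alignment("contig_B", 1, 5000),
    Alignment("contig_C", 1200, 3400),  # partial alignment, not counted
]
decision, kept = triage_region(alns, region_length=5000)
print(decision, [a.contig for a in kept])
```

The threshold of two matches a diploid sample, where at most two haplotype contigs are expected to span a unique region.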
Why long read assemblies for structural variant prediction?
• Continuity
• Consensus accuracy
Accurate positions, accurate consensus for novel inserted sequences
Why not long read assemblies?
• Often assemblers will miss the second haplotype for diploid organisms
Inaccurate genotypes: heterozygotes labeled as homozygotes
How often are variants confirmed?
1. Consider only SVs for which there are one or two contigs found in the assembly
2. Require consistent position and variant type
Variant Type  Total Calls  Assembler         Variants confirmed in HG002  Variants confirmed in HG003  Variants confirmed in HG004
Overall       6,784        Mt. Sinai/Falcon  1,851 (27.3%)                1,729 (25.5%)                1,708 (25.2%)
                           NHGRI/CA          1,808 (26.7%)                1,565 (23.1%)                1,545 (22.8%)
Insertions    743          Mt. Sinai/Falcon  171 (23.0%)                  157 (21.1%)                  156 (21.0%)
                           NHGRI/CA          155 (20.9%)                  134 (18.0%)                  130 (17.5%)
Deletions     6,041        Mt. Sinai/Falcon  1,680 (27.8%)                1,572 (26.0%)                1,552 (25.7%)
                           NHGRI/CA          1,653 (27.4%)                1,431 (23.7%)                1,415 (23.4%)
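The percentages above are confirmed calls divided by total calls for each variant type. A short Python check, using only the Mt. Sinai/Falcon HG002 column as an example (the variable names are illustrative), reproduces them and confirms that insertions plus deletions sum to the overall row:

```python
# Counts copied from the table (Mt. Sinai/Falcon assembly, HG002 column).
total_calls = {"Insertions": 743, "Deletions": 6041}
confirmed = {"Insertions": 171, "Deletions": 1680}


def confirmation_rate(n_confirmed, n_total):
    """Percentage of calls confirmed, rounded to one decimal place."""
    return round(100.0 * n_confirmed / n_total, 1)


overall_total = sum(total_calls.values())    # 6784
overall_confirmed = sum(confirmed.values())  # 1851

for variant_type in total_calls:
    rate = confirmation_rate(confirmed[variant_type], total_calls[variant_type])
    print(variant_type, rate)
print("Overall", confirmation_rate(overall_confirmed, overall_total))
```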
(Mummerplot, Adam Phillippy)
Simple deletion
[Dot plot; axes: Reference, Assembly; Dr marked]
Size of deletion = Dr
Deletion flanked by repeated sequence
[Dot plot; axes: Reference, Assembly; Dr and Dc marked]
Size of deletion = Dr - Dc
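One way to read the two formulas is in terms of the two alignments that flank the deletion in the dot plot. The Python sketch below assumes that Dr is the signed gap between the alignments on the reference and Dc is the signed gap between them on the assembly contig (negative when they overlap in the repeat). This interpretation, the function name, and the coordinates are assumptions for illustration; the slides do not define Dr and Dc in text.

```python
def deletion_size(ref_end_1, ctg_end_1, ref_start_2, ctg_start_2):
    """Deletion size from two flanking alignments (1-based inclusive coords).

    ref_end_1, ctg_end_1     : end of the left alignment on reference / contig
    ref_start_2, ctg_start_2 : start of the right alignment on reference / contig
    """
    d_r = ref_start_2 - ref_end_1 - 1  # reference bases between the alignments
    d_c = ctg_start_2 - ctg_end_1 - 1  # contig bases between them (< 0 if overlapping)
    return d_r - d_c


# Simple deletion: alignments abut on the contig, so Dc = 0 and size = Dr.
print(deletion_size(ref_end_1=100, ctg_end_1=100,
                    ref_start_2=171, ctg_start_2=101))  # 70

# Deletion flanked by a 20-base repeat: both alignments extend through the
# repeat, so they overlap by 20 bases on the contig (Dc = -20) while only
# 50 reference bases are left unaligned (Dr = 50).
print(deletion_size(ref_end_1=120, ctg_end_1=120,
                    ref_start_2=171, ctg_start_2=101))  # 70
```

Both calls describe the same 70-base deletion; only the alignment endpoints differ, which is why the overlap term has to be subtracted.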
Simple insertion with duplication of flanking sequence
[Dot plot: Simple insertion; axes: Reference, Assembly]
Insertion of an additional copy of a tandem repeat
[Figure: Tandem insertion]
Inversions
[Figure: Inversion]
Deletion of one copy of a tandem inverted repeat
[Figure: Tandem inverted repeat deletion]
• Thank you!
• Jim Mullikin
• Adam Phillippy
• Sergey Koren
• Brian Walenz
• Ali Bashir