Jan 2016, Fritz Sedlazeck: Mapping and SV calling from PacBio

GIAB workshop

Fritz Sedlazeck (CSHL, JHU)

Previous meetings: Utilizing long reads

1. How to predict the breakpoints?

2. How to assess genotype?

3. Complex SVs?

1. Breakpoint prediction

• Over BWA-MEM alignments
  – First version had a bug…

• Redesigning Sniffles
  – Improved speed
  – Improved accuracy on noisy alignments
  – Improved read filtering -> reducing FDR
  – Optional realignment step

• Improved breakpoint accuracy
• Improving genotyping

Sniffles v01 error

Sniffles v02

Current limitations

• Linear: every gap base costs the same
• Affine: separate penalties for opening and extending a gap
• Using a single gap cost model is considered state of the art

• Problem with PacBio/ONT: two different gap models required
  – Sequencing error: high number of 1 bp indels
  – Real indels: extending a gap is more likely than opening a new one
  – Sequencing error + repeats cause a single gap cost to fail even for real indels

AAAGAATTCAA-A-A-T-CA

vs.

AAAGAATTCAAAA----TCA
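To make the difference between the two gap placements above concrete, the following minimal Python sketch scores both under a linear and an affine gap model. The penalty values (`per_base`, `gap_open`, `gap_extend`) and the function names are illustrative assumptions, not parameters of any particular aligner.

```python
import re

def gap_runs(aligned):
    """Return the length of every run of '-' in a gapped sequence row."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]

def linear_cost(runs, per_base=1):
    # Linear model: every gap base costs the same (per_base is an assumed value).
    return sum(per_base * n for n in runs)

def affine_cost(runs, gap_open=4, gap_extend=1):
    # Affine model: gap_open is charged once per run, gap_extend for each
    # additional base in that run (both values are assumptions).
    return sum(gap_open + gap_extend * (n - 1) for n in runs)

scattered = "AAAGAATTCAA-A-A-T-CA"  # four separate 1 bp gaps
single = "AAAGAATTCAAAA----TCA"     # one 4 bp gap

for name, row in (("scattered", scattered), ("single", single)):
    runs = gap_runs(row)
    print(name, runs, "linear:", linear_cost(runs), "affine:", affine_cost(runs))
```

Under the linear model both placements cost the same, so the aligner has no reason to prefer the single 4 bp gap. Under the affine model the single gap is cheaper, but the same open penalty also applies to every 1 bp sequencing error, which is the tension described on the slide.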

Convex gap costs

• Costs for a gap follow a convex function of gap length

• Close to linear gap costs for 1 - 2 bp gaps
• As a gap gets longer, the penalty for "splitting" it increases
• Problem: the optimal approach is O(nm^2 + n^2m)
• Heuristic implementation: O(nm)
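The behaviour listed above can be illustrated with a small Python sketch in which the per-base extension penalty shrinks as the gap grows. The decay scheme and the values `ext_max`, `ext_min` and `decay` are hypothetical choices for illustration only; they are not taken from NGM-LR.

```python
def convex_gap_cost(length, ext_max=5, ext_min=1, decay=1):
    """Total penalty for one gap of `length` bases.

    The i-th gap base costs ext_max - decay * i, but never less than ext_min,
    so short gaps are priced almost linearly and long gaps get cheaper per base.
    All three parameters are assumed values.
    """
    return sum(max(ext_min, ext_max - decay * i) for i in range(length))

def split_penalty(length):
    """Extra cost of `length` isolated 1 bp gaps compared with one long gap."""
    return length * convex_gap_cost(1) - convex_gap_cost(length)

for n in (1, 2, 4, 8):
    print(n, convex_gap_cost(n), split_penalty(n))
```

With these assumed values a 2 bp gap costs almost the same as two 1 bp gaps, while splitting an 8 bp gap into single bases is far more expensive than keeping it together.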

NGM-LR workflow

NGM-LR reconcile

[Figure panels: Read within inversion; Read within duplication]

Deletion

Deletion

Insertions

Inversions

Translocations

Nested SV (SKBR3)

Outlook

• Finish new version of Sniffles
  – Assessment of noisy alignments

• NGM-LR:
  – MQ calculation
  – Runtime

• Visual inspection and comparison of SV calls