GIAB SV Data Jamboree @ NIST PBHoney Spots Update Will Salerno 9.15.2016 PBSuite http://sourceforge.net/projects/pb-jelly/

Sept2016 sv pb_honey


Page 1: Sept2016 sv pb_honey

GIAB SV Data Jamboree @ NIST

PBHoney Spots Update
Will Salerno
9.15.2016

PBSuite: http://sourceforge.net/projects/pb-jelly/

Page 2: Sept2016 sv pb_honey

Honey Spots

● Honey Spots is the “indel” caller for Long-Read SV detection
  ○ Tails is “split read”
● Designed for smaller SVs
  ○ 50 bp to 2 Kbp
● Two components
  ○ SpotCaller: Discover putative SVs
  ○ ConsensusCaller: Evaluate SVs

Align Reads & Create Error Channels
→ Process Signal & Create “Spots”
→ Identify Alt Reads & Create Consensus Sequence
→ Remap Consensus & Call SV
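The slides do not say how an error channel is turned into Spots. As a toy illustration only, and not PBHoney's actual signal processing, the sketch below assumes an error channel is a per-base count of reads showing a deletion (or insertion) error, smooths it with a moving average, and reports runs above a threshold. The function name `call_spots` and the parameters `window` and `min_mean` are hypothetical.

```python
from typing import List, Tuple


def call_spots(channel: List[int], window: int = 5,
               min_mean: float = 2.0) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where the smoothed channel
    is at least min_mean. Hypothetical stand-in for spot discovery."""
    n = len(channel)
    half = window // 2
    spots: List[Tuple[int, int]] = []
    start = None
    for i in range(n):
        # Moving average over a window centred on i, clipped at the edges.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mean = sum(channel[lo:hi]) / (hi - lo)
        if mean >= min_mean:
            if start is None:
                start = i  # open a new spot
        elif start is not None:
            spots.append((start, i))  # close the current spot
            start = None
    if start is not None:
        spots.append((start, n))
    return spots


# Made-up channel: 10 quiet bases, 10 bases with 5 error reads, 10 quiet bases.
example_channel = [0] * 10 + [5] * 10 + [0] * 10
example_spots = call_spots(example_channel)
```

Smoothing widens the interval slightly beyond the raw signal, which is one reason a consensus and remapping step is still needed to place the SV.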

Page 3: Sept2016 sv pb_honey

Honey Spots Update

● Performance Optimizations
  ○ NA12878 (41x) 2 days to 6 hours
● Less restrictive filtering
  ○ More sensitive calling
● “Tails” can contribute to Spots signal

Align Reads & Create Error Channels
→ Process Signal & Create “Spots”
→ Identify Alt Reads & Create Consensus Sequence
→ Remap Consensus & Report All Spots

Tails Calls <10 Kbp

Page 4: Sept2016 sv pb_honey

Existing Data Sets

            AJ Proband   AJ Mother   AJ Father   NA12878   HS1011
Coverage    45x          19x         21x         41x       23x

● Eight short-read SV detection methods
● PBHoney (old version)
● 10x PacBio, 48x Short-Read, BioNano, aCGH

Page 5: Sept2016 sv pb_honey

Honey Spots Performance

Sample    SVType   SizeDist       Count   TruthSet Calls   TruthSet Recovered   Recovery Rate
HS1011    INS      (50, 100)      6,459   113              74                   65.49%
                   (101, 500)     6,277   2,850            2,383                83.61%
                   (501, 1000)    673     305              241                  79.02%
                   (1000, 2000)   103     259              192                  74.13%
          DEL      (50, 100)      5,405   25               19                   76.00%
                   (101, 500)     4,067   3,159            2,582                81.73%
                   (501, 1000)    600     226              170                  75.22%
                   (1000, 2000)   536     46               15                   32.61%
NA12878   INS      (50, 100)      8,833   .                .                    .
                   (101, 500)     8,460   .                .                    .
                   (501, 1000)    676     .                .                    .
                   (1000, 2000)   66      .                .                    .
          DEL      (50, 100)      5,010   2                2                    100.00%
                   (101, 500)     3,930   1,484            1,446                97.44%
                   (501, 1000)    509     201              182                  90.55%
                   (1000, 2000)   466     197              185                  93.91%
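The Recovery Rate column is consistent with TruthSet Recovered divided by TruthSet Calls. A minimal check of that arithmetic, using rows copied from the table (the function name `recovery_rate` is only illustrative):

```python
def recovery_rate(truth_calls: int, recovered: int) -> float:
    """Percentage of truth-set calls that were recovered."""
    if truth_calls <= 0:
        raise ValueError("truth_calls must be positive")
    return 100.0 * recovered / truth_calls


# (sample, SV type, size bin) -> (TruthSet Calls, TruthSet Recovered)
rows = {
    ("HS1011", "INS", "(50, 100)"): (113, 74),
    ("HS1011", "DEL", "(1000, 2000)"): (46, 15),
    ("NA12878", "DEL", "(101, 500)"): (1484, 1446),
}

rates = {key: round(recovery_rate(*counts), 2) for key, counts in rows.items()}
for key, rate in rates.items():
    print(key, f"{rate:.2f}%")
```

Rows shown as "." in the table have no truth-set calls in that bin, so no rate is defined for them.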

Page 6: Sept2016 sv pb_honey

AJ Trio Deletions: Trio Discovery

Do discovery in Trio
→ Filter Proband to altZMWs >= 10
→ Merge Trio With 50bp Bookends Distance
→ Remove loci with any sample represented more than once
→ Force Call Missing in Parents

          Discovery  Filter Proband  50bp    Single Sample  Present in Proband  Missing in Parents    Total Proband
                     altZMWs >=10    Merge   Filter         and Parent(s)       Discovery but Forced  with Parent Support
Proband   10,753     8,137           7,785   7,305          6,175               886                   7,061
Father    7,994      .               7,727   7,300          4,784               663                   5,447
Mother    7,448      .               7,217   6,813          4,636               651                   5,287
Total     26,195     23,579          11,896  11,369         6,175               886                   7,061
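The filtering and merging steps of the trio workflow can be sketched as follows. This is a minimal illustration, not the code used for the slides: it assumes "50bp Bookends Distance" means that calls on the same chromosome are grouped when the gap between them is at most 50 bp, and the `Call` record and all function names are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple


class Call(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int
    alt_zmws: int  # number of ZMWs supporting the alternate allele


def filter_proband(calls: List[Call], proband: str = "proband",
                   min_alt: int = 10) -> List[Call]:
    """Drop proband calls with fewer than min_alt alt ZMWs; keep parents as-is."""
    return [c for c in calls if c.sample != proband or c.alt_zmws >= min_alt]


def bookend_merge(calls: List[Call], max_dist: int = 50) -> List[List[Call]]:
    """Group calls whose start lies within max_dist of the running cluster end."""
    clusters: List[List[Call]] = []
    cluster_end = 0
    for c in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        current = clusters[-1] if clusters else None
        if current and current[0].chrom == c.chrom and c.start - cluster_end <= max_dist:
            current.append(c)
            cluster_end = max(cluster_end, c.end)
        else:
            clusters.append([c])
            cluster_end = c.end
    return clusters


def single_sample_filter(clusters: List[List[Call]]) -> List[List[Call]]:
    """Remove loci where any sample is represented more than once."""
    return [cl for cl in clusters
            if max(Counter(c.sample for c in cl).values()) == 1]


# Made-up calls for illustration only.
calls = [
    Call("proband", "chr1", 100, 200, 12),
    Call("father", "chr1", 230, 320, 5),     # 30 bp from the proband call
    Call("proband", "chr1", 1000, 1100, 4),  # fails altZMWs >= 10
    Call("mother", "chr1", 1020, 1100, 8),
    Call("proband", "chr1", 5000, 5100, 15),
    Call("proband", "chr1", 5120, 5200, 11),  # same sample twice at one locus
    Call("proband", "chr2", 100, 200, 20),    # proband only: force call in parents
]
clusters = bookend_merge(filter_proband(calls))
kept = single_sample_filter(clusters)
to_force = [cl for cl in kept if {c.sample for c in cl} == {"proband"}]
```

Loci left in `to_force` correspond to the "Missing in Parents Discovery but Forced" column: the proband has a call, the parents do not, and the parents are then force called at that locus.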

Page 7: Sept2016 sv pb_honey

Honey Force Calling

Candidate Regions →
  Identify Matching Spots Reads Near Region
  Identify Matching Tails Reads Near Region
  Identify ‘Reference’ Supporting Reads Spanning Region
→ Output Evidence

● A Candidate Region is an SV’s location, type, size.
● Reads are fetched within Region ±BUFFER.
● Matching Reads are those having variant of the same type within ±SIZE and ±DISTANCE.
● Reference supporting Reads span Region and show no variant evidence.
● Looking for a minimum of one read.
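The force-calling rules above can be sketched as a small classifier. This is a hedged illustration under stated assumptions, not PBHoney's implementation: the `Region` and `ReadEvidence` records are hypothetical, ±SIZE and ±DISTANCE are treated as absolute tolerances in bp, and the default values for `buffer`, `size_tol` and `dist_tol` are invented because the slides do not give them.

```python
from typing import Dict, List, NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL" or "INS"
    size: int


class ReadEvidence(NamedTuple):
    name: str
    chrom: str
    aln_start: int
    aln_end: int
    svtype: Optional[str]  # None when the read shows no variant evidence
    sv_pos: Optional[int]
    sv_size: Optional[int]


def force_call(region: Region, reads: List[ReadEvidence], buffer: int = 1000,
               size_tol: int = 100, dist_tol: int = 100) -> Dict[str, object]:
    """Split reads near a candidate region into alt- and reference-supporting."""
    alt: List[str] = []
    ref: List[str] = []
    for r in reads:
        # Fetch step: keep only reads overlapping Region +/- buffer.
        if r.chrom != region.chrom:
            continue
        if r.aln_end < region.start - buffer or r.aln_start > region.end + buffer:
            continue
        if r.svtype is None:
            # Reference support: spans the whole region, no variant evidence.
            if r.aln_start <= region.start and r.aln_end >= region.end:
                ref.append(r.name)
        elif (r.svtype == region.svtype
              and abs(r.sv_size - region.size) <= size_tol
              and abs(r.sv_pos - region.start) <= dist_tol):
            alt.append(r.name)
    # The slides look for a minimum of one matching read.
    return {"alt": alt, "ref": ref, "supported": len(alt) >= 1}


# Made-up evidence for illustration only.
region = Region("chr1", 10_000, 10_300, "DEL", 300)
reads = [
    ReadEvidence("r1", "chr1", 8_000, 12_000, "DEL", 10_020, 290),  # matching
    ReadEvidence("r2", "chr1", 9_000, 11_000, None, None, None),    # reference
    ReadEvidence("r3", "chr1", 9_500, 10_100, None, None, None),    # does not span
    ReadEvidence("r4", "chr1", 8_000, 12_000, "INS", 10_010, 300),  # wrong type
    ReadEvidence("r5", "chr1", 50_000, 60_000, "DEL", 55_000, 300), # out of range
]
evidence = force_call(region, reads)
```

In this sketch Spots and Tails reads are not distinguished. The slide treats them as separate lookups, which could be modelled by running the same classification over two read sets.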

Page 8: Sept2016 sv pb_honey

AJ Trio Dels: Proband Discovery, Parent Force Calling

Do discovery in Proband
→ Filter Proband to altZMWs >= 10
→ Force in Parents

          Discovery  Filter Proband  Forced in  Forced in  Forced in
                     altZMWs >= 10   Father     Mother     Parent(s)
Proband   10,753     8,137           6,268      6,206      7,565

Page 9: Sept2016 sv pb_honey

AJ Trio Insertions: Trio Discovery

          Discovery  Filter Proband  50bp    Single Sample  Present in Proband  Missing in Parents    Total Proband
                     altZMWs >=10    Merge   Filter         and Parent(s)       Discovery but Forced  with Parent Support
Proband   24,585     13,134          12,324  11,317         7,266               2,986                 10,252
Father    11,758     .               11,236  10,303         5,632               2,322                 7,954
Mother    10,633     .               10,146  9,253          5,344               2,308                 7,652
Total     46,976     35,525          20,227  19,051         7,266               2,986                 10,252

Do discovery in Trio
→ Filter Proband to altZMWs >= 10
→ Merge Trio With 50bp Bookends Distance
→ Remove loci with any sample represented more than once
→ Force Call Missing in Parents

Page 10: Sept2016 sv pb_honey

AJ Trio Ins: Proband Discovery, Parents Force Calling

          Discovery  Filter Proband  Forced in  Forced in  Forced in
                     altZMWs >= 10   Father     Mother     Parent(s)
Proband   24,585     13,134          10,245     10,139     11,839

Do discovery in Proband
→ Filter Proband to altZMWs >= 10
→ Force in Parents

Page 11: Sept2016 sv pb_honey

Next-Gen Sequencing Informatics Group @ HGSC

● Bioinformatics Core for the Human Genome Sequencing Center
● Primary and Secondary Analysis for Production Pipelines
  ○ Illumina Fleet (X Ten, 2000/2500), PacBio (RS II and Sequel)
  ○ Research and CAP/CLIA
  ○ WGS, WES, Custom Capture, Clinical Panels
● Structural Variation
● Annotation
● Hadoop Data Warehouse
● EMR/EHR Integration
● 11 Members and Growing!

CHARGE