30
For Research Use Only. Not for use in diagnostics procedures. © Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved. Structural Variant Detection in SMRT Link 5 with pbsv Aaron Wenger 2017-06-27

Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

For Research Use Only. Not for use in diagnostics procedures. © Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved.

Structural Variant Detection in

SMRT Link 5 with pbsv

Aaron Wenger 2017-06-27

Page 2: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

STRUCTURAL VARIANT = DIFFERENCE ≥50 BP

Insertion Duplication

Inversion Tandem Repeat Translocation

Deletion

Page 3: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

VARIATION BETWEEN TWO HUMAN GENOMES

Huddleston et al. (2017) Genome Research 27(5):677-85.

vs.

5×106

5 Mb 3 Mb 10 Mb

variants

basepairs

affected

SNVs

4×105

structural variantsindels

2×104

Page 4: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME

4,000

20,000

Short reads

PacBio

repeats + GC-rich +

large insertions

Huddleston et al. (2017) Genome Research 27(5):677-85.

Seo et al. (2016) Nature 538:243-7.

Sudmant et al. (2016) Nature 526:75-81.

Page 5: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

SEQUENCING + ANALYSIS

Li and Durbin (2009) Bioinformatics 25:1754-60.

McKenna et al. (2010) Genome Research 20:1297-303.

Structural

Variants

BWASNVs +

Indels

Short

reads

?pbsv

Page 6: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

3 COMPONENTS TO PBSV

pbsv command line utility for top-level commands

pbsvutil command line utility for detailed commands

SMRT Link web interface

Page 7: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

TOP-LEVEL PBSV COMMANDS

pbsv generate-config [-h] [-o sv.cfg]

(optional) Generate a configuration file to specify options for other stages.

pbsv align [-h] [--cfg_fn sv.cfg]

ref.fa subreads.bam ref.align.bam

Map reads to a reference genome with a “structural variant aware” aligner.

pbsv call [-h] [--cfg_fn sv.cfg]

ref.fa ref.align.bam ref.sv.bed|vcf

Call structural variants from aligned reads.

Page 8: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

TOP-LEVEL PBSV COMMANDS

pbsv generate-config [-h] [-o sv.cfg]

(optional) Generate a configuration file to specify options for other stages.

pbsv align [-h] [--cfg_fn sv.cfg]

ref.fa subreads.bam ref.align.bam

Map reads to a reference genome with a “structural variant aware” aligner.

pbsv call [-h] [--cfg_fn sv.cfg]

ref.fa ref.align.bam ref.sv.bed|vcf

Call structural variants from aligned reads.

Page 9: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV ALIGN UTILIZES NGM-LR

Rescheneder, Sedlazeck, and Schatz. https://github.com/philres/ngmlr/.

gap size

pe

na

lty

sequencing errors

(frequent & independent)

structural variants

(infrequent & correlated)

pbsvutil ngmlr

Page 10: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV ALIGN UTILIZES NGM-LR

NGM-LRBWA

gap size

penalty

gap size

penalty

sequencing errors

structural variants

sequencing errors

structural variants

pbsvutil ngmlr

Page 11: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

X

PBSV ALIGN CHAINS CO-LINEAR ALIGNMENTS

Reference

Read

ZY

XZ

W W

W

pbsvutil chain

X ZYW W

Page 12: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

TOP-LEVEL PBSV COMMANDS

pbsv generate-config [-h] [-o sv.cfg]

(optional) Generate a configuration file to specify options for other stages.

pbsv align [-h] [--cfg_fn sv.cfg]

ref.fa subreads.bam ref.align.bam

Map reads to a reference genome with a “structural variant aware” aligner.

pbsv call [-h] [--cfg_fn sv.cfg]

ref.fa ref.align.bam ref.sv.bed|vcf

Call structural variants from aligned reads.

Page 13: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

Page 14: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

Page 15: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

Page 16: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

329 bp

deletion

63 bp

insertion

Page 17: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

heterozygous

(4 of 10)

heterozygous

(1 of 10)

329 bp

deletion

63 bp

insertion

Page 18: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

Alu-

heterozygous

(4 of 10)

heterozygous

(1 of 10)

329 bp

deletion

63 bp

insertion

Page 19: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV CALL: STAGED STRUCTURAL VARIANT CALLER

FIND SV

SIGNATURES

CIGAR D & I

≥ 50 bp

CLUSTER SV

SIGNATURES

nearby with similar

sequence

SUMMARIZE

INTO SV

consensus of

supporting reads

GENOTYPE

SV

supporting reads /

covering reads

ANNOTATE SV

Alu, LINE, SVA,

tandem repeat

FILTER SV

≥ 2 and ≥ 20%

reads support

Alu-

heterozygous

(4 of 10)

heterozygous

(1 of 10)

329 bp

deletion

63 bp

insertion

Page 20: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

Page 21: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

Page 22: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

chr1

904490

ACGCGGCCGCCTCCTCCTCCGAACGTGGCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGAACGCGGCCGCCTCCTCCTCCGA

A

PASS

IMPRECISE;SVTYPE=DEL;END=904587;SVLEN=-97;SVANN=TANDEM

GT:AD:DP

0/1:9:15

Page 23: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

chr1 904490 904587 Deletion -97 . GT:AD:DP 0/1:9:15 SVANN=TANDEM

Page 24: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

Page 25: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

Page 26: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

Page 27: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PBSV: SMRT LINK STRUCTURAL VARIANT CALLER

SMRT Analysis

Page 28: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

3 COMPONENTS TO PBSV

pbsv command line utility for top-level commands

pbsvutil command line utility for detailed commands

SMRT Link web interface

Page 29: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

PacBio

ACKNOWLEDGMENTS

Schatz LabMichael Schatz

Philipp Rescheneder

Fritz Sedlazeck

gap size

penalty

convexerrorsindels

NGM-LR

Yuan Li

Chris Dunn

Ben Lerch

Jim Drake

Nat Echols

Aaron Klammer

Mary Budagyan

Page 30: Structural Variant Detection in SMRT Link 5 with pbsv€¦ · STRUCTURAL VARIANTS DETECTED IN A HUMAN GENOME 4,000 20,000 Short reads PacBio repeats + GC-rich + large insertions Huddleston

For Research Use Only. Not for use in diagnostics procedures. © Copyright 2017 by Pacific Biosciences of California, Inc. All rights reserved. Pacific Biosciences, the Pacific Biosciences logo,

PacBio, SMRT, SMRTbell, Iso-Seq, and Sequel are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science. NGS-go and NGSengine are trademarks of GenDx.

FEMTO Pulse and Fragment Analyzer are trademarks of Advanced Analytical Technologies.

All other trademarks are the sole property of their respective owners.

www.pacb.com