SV Detection via Anchored Assembly: How can we best call structural variants? Becky Drees, Jeremy Bruestle, Cheinan Marks

Aug2014 spiral genetics anchored assembly


Page 1: Aug2014 spiral genetics anchored assembly

SV Detection via Anchored Assembly

How can we best call structural variants?

Becky Drees, Jeremy Bruestle, Cheinan Marks

Page 2: Aug2014 spiral genetics anchored assembly

Please do not distribute without permission.

SV Detection via Anchored Assembly

• Brief Description of Anchored Assembly Method

• Testing vs GIAB Variant Set & Validated SV Sets

• How Do We Describe SVs from Detected Breakpoints?

Page 3: Aug2014 spiral genetics anchored assembly


Input data

• Any species with a draft genome

• Existing NGS data

• No special library prep

• ~20x per ploidy

Page 4: Aug2014 spiral genetics anchored assembly

Step 1: Read Correction

A* error correction

[Figure: K-mer Quality Score Distribution. x-axis: Total K-mer Quality Score (0 to 1200); y-axis: K-mer Count (0 to 5000)]

• Similar to Euler or Quake

• Corrects the read without using reference information

• Reduces error from 1% to 0.01%
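A minimal sketch of reference-free read correction, in the spirit of the k-mer spectrum methods named above. It does not implement the A* search from the slide, and it uses raw k-mer counts rather than quality-weighted scores. The function names, `k`, and `min_count` are illustrative assumptions.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def untrusted_positions(read, counts, k, min_count):
    """Start offsets of k-mers seen fewer than min_count times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]

def correct_read(read, counts, k, min_count):
    """Try single-base substitutions; keep the first one that makes
    every k-mer in the read trusted. No reference is consulted."""
    if not untrusted_positions(read, counts, k, min_count):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if not untrusted_positions(candidate, counts, k, min_count):
                return candidate
    return read  # no single substitution fixes the read; leave it unchanged

# Toy data (hypothetical): five clean copies and one read with a G->C error.
reads = ["ACGTACGTTGCA"] * 5 + ["ACGTACCTTGCA"]
counts = count_kmers(reads, k=4)
fixed = correct_read("ACGTACCTTGCA", counts, k=4, min_count=2)
```

A k-mer that appears only once is treated as a likely sequencing error, which is why the sketch needs enough coverage for true k-mers to repeat.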

Page 5: Aug2014 spiral genetics anchored assembly


Step 2: Remove Reference Matches


• Remove reads that are an exact match to reference

• Significantly reduces the complexity of the graph

• Reduces required memory usage (40 GB for a whole human genome)
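The filtering step can be sketched as follows. Checking the reverse strand as well is an assumption, as are the function names. A plain substring test stands in for whatever index a genome-scale implementation would need.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]

def drop_reference_matches(reads, reference):
    """Keep only reads that do not match the reference exactly
    on either strand (strand handling is an assumption here)."""
    kept = []
    for read in reads:
        if read in reference or reverse_complement(read) in reference:
            continue  # exact match: carries no variant information
        kept.append(read)
    return kept

# Toy data (hypothetical): the third read carries a mismatch and survives.
reference = "AACCGGTTACGT"
remaining = drop_reference_matches(["CCGGTT", "GTAACC", "CCGATT"], reference)
```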

Page 6: Aug2014 spiral genetics anchored assembly

Step 3: Read Overlap Graph

• Construct a read overlap graph with the remaining reads

• Provides more context than a k-mer-based de Bruijn graph

[Figure: Read overlap assembly. Reads R1, R2, R3, R5, R6, R7, R8, R9 joined by edges with numeric labels 7, 8, and 9]
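A minimal sketch of building a read overlap graph from suffix-prefix overlaps. The all-pairs comparison, the `min_overlap` parameter (assumed to be at least 1), and the toy reads are illustrative assumptions. Strand handling and the efficient indexing a real assembler would need are left out.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if that overlap is shorter than min_overlap (min_overlap >= 1)."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def build_overlap_graph(reads, min_overlap):
    """Map each read name to {successor name: overlap length}."""
    graph = {name: {} for name in reads}
    for a_name, a_seq in reads.items():
        for b_name, b_seq in reads.items():
            if a_name == b_name:
                continue
            n = overlap_length(a_seq, b_seq, min_overlap)
            if n:
                graph[a_name][b_name] = n
    return graph

# Toy reads (hypothetical sequences and names).
reads = {"R1": "AATGACTTAG", "R2": "GACTTAGATA", "R3": "CTTAGATAAC"}
graph = build_overlap_graph(reads, min_overlap=7)
# graph == {"R1": {"R2": 7}, "R2": {"R3": 8}, "R3": {}}
```

Each edge keeps the whole read on both sides, which is one way to read the claim that an overlap graph retains more context than fixed-length k-mers.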

Page 7: Aug2014 spiral genetics anchored assembly

Step 4: Anchoring

• Anchor assemblies to reference coordinates

• Provide breakpoint information while keeping reference bias low

[Figure: Anchoring]
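One way to picture anchoring: match only the two ends of an assembled sequence to the reference and leave the middle unconstrained. The sketch below assumes exact, unique end matches on the forward strand. `anchor_len` and the function names are hypothetical, not the method from the slides.

```python
def find_all(text, pattern):
    """All 0-based start positions of pattern in text (overlaps allowed)."""
    positions, start = [], text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions

def anchor_contig(contig, reference, anchor_len):
    """Return (left_end, right_start) in 0-based reference coordinates,
    or None when either end is missing or ambiguous."""
    left = find_all(reference, contig[:anchor_len])
    right = find_all(reference, contig[-anchor_len:])
    if len(left) != 1 or len(right) != 1:
        return None
    left_end = left[0] + anchor_len  # first reference base after the left anchor
    right_start = right[0]           # first reference base of the right anchor
    return left_end, right_start

# Toy example (hypothetical): the contig lacks reference bases 8..11 ("GGCA").
reference = "TTGACCATGGCAAGTCCGTA"
breakpoints = anchor_contig("GACCATAGTCCG", reference, anchor_len=6)
# breakpoints == (8, 12)
```

Only the anchors are tied to the reference, so the sequence between them comes from the reads alone. That is one plausible reading of keeping reference bias low.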

Page 8: Aug2014 spiral genetics anchored assembly

Step 5: Variant Validation

• Assemble variant sequence from read overlap graph

• Computes minimal cost variation (similar to Smith-Waterman)

• Calls variants and applies QC to remove likely false positives

[Figure: Variant validation. Reads R2 to R6 aligned against the Reference and Assembled sequences]
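The minimal-cost step can be illustrated with a small dynamic-programming alignment. This is a global alignment with unit costs (Needleman-Wunsch style), not local Smith-Waterman. The cost scheme and the edit tuple layout are assumptions made for illustration.

```python
def align(reference, assembled):
    """Global alignment with unit costs.
    Returns (cost, edits); each edit is (op, ref_pos, ref_base, alt_base)
    with op in {"SUB", "DEL", "INS"} and 0-based reference positions."""
    n, m = len(reference), len(assembled)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != assembled[j - 1]
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner, preferring the diagonal.
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = reference[i - 1] != assembled[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                if mismatch:
                    edits.append(("SUB", i - 1, reference[i - 1], assembled[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(("DEL", i - 1, reference[i - 1], ""))
            i -= 1
        else:
            edits.append(("INS", i, "", assembled[j - 1]))
            j -= 1
    edits.reverse()
    return cost[n][m], edits

snp = align("GATTACA", "GATCACA")  # (1, [("SUB", 3, "T", "C")])
deletion = align("ACGTT", "ACTT")  # (1, [("DEL", 2, "G", "")])
```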

Page 9: Aug2014 spiral genetics anchored assembly

NA12878 SNP Detection vs GIAB

Anchored Assembly only: 13,307

Genome in a Bottle only: 144,463

Shared: 2,596,897

Sensitivity: 95%

Precision: 99.5%
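The two percentages can be checked from the three counts on this slide. The check treats 2,596,897 as the calls shared by both sets, and the result agrees with the quoted percentages after rounding. Variable and function names are illustrative.

```python
def sensitivity(shared, truth_only):
    """Fraction of truth-set calls that were also detected."""
    return shared / (shared + truth_only)

def precision(shared, caller_only):
    """Fraction of detected calls that are also in the truth set."""
    return shared / (shared + caller_only)

shared = 2_596_897   # called by both Anchored Assembly and GIAB
aa_only = 13_307     # Anchored Assembly only
giab_only = 144_463  # Genome in a Bottle only

print(f"Sensitivity: {sensitivity(shared, giab_only):.1%}")  # Sensitivity: 94.7%
print(f"Precision: {precision(shared, aa_only):.1%}")        # Precision: 99.5%
```

The computed sensitivity of 94.7% matches the 95% on the slide after rounding to a whole percent.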

Page 10: Aug2014 spiral genetics anchored assembly


NA12878 Indel Detection vs GIAB

Page 11: Aug2014 spiral genetics anchored assembly

NA12878 SV Insertions

Mills et al. Eichler Lab, U. Washington, Sanger validated

Chr.  Mills       Marks under Pindel 50x, AA 50x, AA 200x (column positions lost)
1     247579917
2     2576951     n n
2     78558069    n n n
2     187143096   n
2     191002548   n n n
3     43972635    n n n
3     100737223   n n n
3     100868475   n n n
3     195823764   n n n
5     78035993    n n n
7     1528948     n n n
7     2089876
8     22717662    n n n
9     97387403    n
9     137361862   n
12    103954170   n n
13    76345722    n n n
13    113760939
13    114103496   n n
15    26060663    n n
15    92686723    n
17    39240782
17    77134774    n
18    74794821    n n
18    76182038    n n n
19    1278240     n n n
19    2247173     n n n
20    55992535    n n
21    39080014    n n
X     94894756    n n

Page 12: Aug2014 spiral genetics anchored assembly


NA12878 SV Deletions

Page 13: Aug2014 spiral genetics anchored assembly


How to describe SVs from breakpoints?

As breakend records:

#CHROM  POS      ID     REF  ALT           QUAL  FILTER  INFO                                         FORMAT    SAMPLE
1       1500000  bnd_A  T    T[1:1501108[  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_B;AID=1234  DP:ED:OV  26:72:89
1       1501108  bnd_B  G    ]1:1500000]G  100   PASS    DP=26;NS=1;SVTYPE=BND;MATEID=bnd_A;AID=1234  DP:ED:OV  26:72:89

As SV events:
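One possible reading of the paired breakend records above as a single event is sketched below. In VCF breakend notation, `T[1:1501108[` joins the sequence to the right of position 1501108 after the T at 1500000, and the mate `]1:1500000]G` mirrors that join. Together they are consistent with a deletion of the bases in between. The function names and the returned dictionary layout are assumptions. Only this simple same-chromosome deletion pattern is recognized.

```python
import re

# Breakend ALT forms: t[p[, t]p], ]p]t, [p[t  (p is chrom:pos)
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTN]*)(?P<b1>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[ACGTN]*)$"
)

def parse_bnd_alt(alt):
    """Split a breakend ALT string into mate position and orientation."""
    m = BND_ALT.match(alt)
    if m is None or m["b1"] != m["b2"]:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return {"mate_chrom": m["chrom"], "mate_pos": int(m["pos"]),
            "bracket": m["b1"], "ref_first": bool(m["lead"])}

def describe_pair(chrom, pos_a, alt_a, pos_b, alt_b):
    """Describe two mated breakends as a deletion when they fit that pattern;
    otherwise leave them as plain breakends."""
    a, b = parse_bnd_alt(alt_a), parse_bnd_alt(alt_b)
    is_deletion = (
        a["mate_chrom"] == chrom and b["mate_chrom"] == chrom
        and a["mate_pos"] == pos_b and b["mate_pos"] == pos_a
        and pos_b - pos_a > 1
        and a["bracket"] == "[" and a["ref_first"]
        and b["bracket"] == "]" and not b["ref_first"]
    )
    if is_deletion:
        return {"svtype": "DEL", "start": pos_a + 1, "end": pos_b - 1,
                "length": pos_b - pos_a - 1}
    return {"svtype": "BND"}

event = describe_pair("1", 1500000, "T[1:1501108[", 1501108, "]1:1500000]G")
# event == {"svtype": "DEL", "start": 1500001, "end": 1501107, "length": 1107}
```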

Page 14: Aug2014 spiral genetics anchored assembly


How to describe SVs from breakpoints?

[Figure: CHR 1 with breakends bnd_K, bnd_L, bnd_M, bnd_N and coordinates 190000, 200000, 200231, 197000]

• Different events can produce similar breakpoints

• Multiple breakpoints can represent a single rearrangement event

Assembled breakpoints can reveal variation that is hard to categorize

Page 15: Aug2014 spiral genetics anchored assembly


How to describe SVs from breakpoints?

A single breakpoint can contain multiple sequence changes:

• Inserted sequence at deletion breakpoints

• Deleted or duplicated sequence at insert breakpoints

• Deleted or duplicated sequence at inversion breakpoints

[Figure: CHR 1 inversion with coordinates 1700000, 1700100, 1704100, 1704250; labels: inverted sequence, deleted sequence, duplicated sequence]

Page 16: Aug2014 spiral genetics anchored assembly


How to describe SVs from breakpoints?

Many assemblies anchor to multiple genome locations:

• Variation in duplicated genome regions

• Variation in repetitive elements

• Transposons

[Figure: CHR 1 with an Alu element; labels: anchors to multiple places, unique anchor]
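A small sketch of how an anchor might be classified as unique or multi-mapping, assuming exact matches on the forward strand only. The labels and function names are hypothetical.

```python
def anchor_positions(anchor, reference):
    """All 0-based start positions where the anchor matches exactly."""
    positions, start = [], reference.find(anchor)
    while start != -1:
        positions.append(start)
        start = reference.find(anchor, start + 1)
    return positions

def classify_anchor(anchor, reference):
    """Label an anchor by how many reference placements it has."""
    hits = anchor_positions(anchor, reference)
    if not hits:
        return "unanchored", hits
    if len(hits) == 1:
        return "unique", hits
    return "multiple", hits

# Toy reference (hypothetical) with "GGCC" repeated at two locations.
reference = "GGCCTAAGGCCTTTACGATC"
repeat_end = classify_anchor("GGCC", reference)    # ("multiple", [0, 7])
unique_end = classify_anchor("ACGATC", reference)  # ("unique", [14])
```

An assembly with one unique end and one repeated end, as in the Alu figure, can still be placed by its unique anchor even though the other end has several candidate positions.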

Page 17: Aug2014 spiral genetics anchored assembly


Contact

• More information

• Trial on own data

[email protected] [email protected]

[email protected]

Page 18: Aug2014 spiral genetics anchored assembly


Questions?

Page 19: Aug2014 spiral genetics anchored assembly


Anchored Assembly SNP Distribution

Page 20: Aug2014 spiral genetics anchored assembly


Anchored Assembly SV Distribution