Improving and validating the Atlantic Cod genome assembly using error-corrected as well as raw PacBio reads

Lex Nederbragt, NSC and CEES, @lexnederbragt


My talk at the PacBio European Usergroup Meeting, November 20th, 2013


Page 1: Improving and validating the Atlantic Cod genome assembly using PacBio

Improving and validating the Atlantic Cod genome assembly using error-corrected as well as raw PacBio reads

Lex Nederbragt, NSC and CEES

@lexnederbragt

Page 2: Improving and validating the Atlantic Cod genome assembly using PacBio

Acknowledgements

University of Oslo

Sequencing team NSC

Ole Kristian Tøressen

Kjetill Jakobsen

Sissel Jentoft

Cod genome group

Jason Miller, JCVI

Pacific Biosciences

Page 3: Improving and validating the Atlantic Cod genome assembly using PacBio

The Atlantic cod genome project

Page 4: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: the genome

850 million bases (Mbp)

Heterozygote

‘Wild-caught’

Page 5: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: phase 1

(Sanger sequencing)

454 sequencing

Page 6: Improving and validating the Atlantic Cod genome assembly using PacBio

N50

50% of the genome is in contigs at least as large as the N50 value

Courtesy of Michael Schatz, CSHL

[Figure: worked N50 example on a 1000 bp genome; labels "Sum" (values 400, 445, 490, 520) and "N50"]
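As a rough illustration of the N50 definition on this slide, the sketch below sorts contigs from longest to shortest and adds up their lengths until half the target size is reached. The contig lengths are hypothetical, chosen only so that the running sums pass through the 400, 445, 490 and 520 that appear on the slide. The function name `n50` and its `genome_size` parameter are illustrative and not taken from any assembler.

```python
def n50(contig_lengths, genome_size=None):
    """Return the contig length at which the running sum reaches half the target.

    With genome_size=None the target is half of the summed contig lengths
    (the usual N50). With genome_size given, the target is half the genome
    size, which matches the "50% of the genome" wording (often called NG50).
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    running_sum = 0
    for length in sorted(contig_lengths, reverse=True):
        running_sum += length
        if 2 * running_sum >= total:
            return length
    return 0  # the contigs never reach half of the target size


# Hypothetical contig lengths (bp) for a 1000 bp genome; not data from the talk.
example_contigs = [300, 100, 45, 45, 30, 20, 15, 15, 10]
example_n50 = n50(example_contigs, genome_size=1000)
print(example_n50)
```

Measured against the assembly size instead of the genome size, `n50(example_contigs)` gives a different answer, because these contigs add up to much less than 1000 bp.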

Page 7: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: phase 1

(Sanger sequencing)

454 sequencing

Phase 1 assembly

157 887 sequences

753 Mbp of 830 Mbp

[Figure labels: scaffold, contig, gap]

Scaffold N50 460 kbp

Contig N50 2.8 kbp

Page 8: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: phase 1

6467 scaffolds

35% gap bases
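A figure such as "35% gap bases" can be computed directly from the scaffold sequences. The sketch below assumes gaps are written as runs of `N` in the scaffolds, which is a common convention but is not stated on the slide. The sequences are made up for the example.

```python
def gap_fraction(scaffolds):
    """Return the fraction of scaffold bases that are gap characters (N or n)."""
    total_bases = sum(len(seq) for seq in scaffolds)
    if total_bases == 0:
        return 0.0
    gap_bases = sum(seq.upper().count("N") for seq in scaffolds)
    return gap_bases / total_bases


# Made-up scaffolds: 20 bases in total, 4 of them gap bases.
toy_scaffolds = ["ACGTNNNNACGT", "ACGTACGT"]
toy_gap_fraction = gap_fraction(toy_scaffolds)
print(f"{toy_gap_fraction:.0%} gap bases")
```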

Page 9: Improving and validating the Atlantic Cod genome assembly using PacBio

The causes

Short Tandem Repeats (>20% of gaps)
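To make "short tandem repeat" concrete, here is a minimal sketch that finds runs of a short unit repeated back to back, such as (TG)n. The thresholds (units of 2 to 6 bases, at least 5 copies) are assumptions chosen for the example. They are not the criteria used in the cod analysis.

```python
import re


def find_tandem_repeats(sequence, min_unit=2, max_unit=6, min_copies=5):
    """Return (start, end, unit) for each run of a short unit repeated in tandem.

    Coordinates are 0-based and half-open. The lazy quantifier makes the
    regular expression prefer the shortest repeat unit that fits.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    return [
        (match.start(), match.end(), match.group(1))
        for match in pattern.finditer(sequence.upper())
    ]


# Made-up sequence: 8 copies of TG flanked by non-repetitive bases.
toy_sequence = "ACGGA" + "TG" * 8 + "CCATA"
toy_repeats = find_tandem_repeats(toy_sequence)
print(toy_repeats)
```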

Page 10: Improving and validating the Atlantic Cod genome assembly using PacBio

The causes

[Figure labels: Contig 1, Polymorphic contig 2, Polymorphic contig 3, Contig 4]

Heterozygosity?

Page 11: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: phase 2

New data

Illumina sequencing

Paired end >200x

Mate Pair 5 kb >100x

Improved/new software

Page 12: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: phase 2 goal

23 pseudochromosomes

Below 5% gap bases

Longer contigs

Phase 2 goal

Scaffold N50 1 Mbp

Contig N50 15 kbp

Page 13: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod: phase 2 programs

Zhang et al., PLoS ONE 2011

Page 14: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod phase 2: status

                          Contig N50    Gaps    Scaffold N50
Goal                      15 kbp        <5%     1.5 Mbp
Celera, 454 + Illumina    9 kbp         5%      too short
Newbler, 454              6 kbp         24%     OK

Page 15: Improving and validating the Atlantic Cod genome assembly using PacBio

Enter PacBio

Page 16: Improving and validating the Atlantic Cod genome assembly using PacBio

Large Insert Sizes

Sequencing

Aim for looooong insert sizes

Photo: Tore Oldeide Elgvin

147 SMRT Cells

Chemistry    Coverage    Av. raw length
C2           9.2x        3.0 kb
C2-XL        3.2x        4.6 kb
XL-XL        3.5x        5.1 kb
TOTAL        15.9x

Page 17: Improving and validating the Atlantic Cod genome assembly using PacBio

Error-correction

Celera Assembler merTrim (diagram labels: 27x, 234x)

PacBioToCA (Koren et al.) (diagram labels: 13.7x, 27x)

9x (67%) recovered

Page 18: Improving and validating the Atlantic Cod genome assembly using PacBio

Using PacBio reads

Page 19: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    Celera                   PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 20: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 23: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly improvement: corrected reads

                                N50       Gaps
Goal                            15 kbp    <5%
Celera, 454 reads               9 kbp     5%
+ corrected PacBio + PBJelly    11 kbp    1.5%

Page 24: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 25: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly improvement: raw reads

                          N50       Gaps
Goal                      15 kbp    <5%
Newbler, 454              6 kbp     24%
+ raw PacBio + PBJelly    30 kbp    20%

Page 26: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly improvement: raw reads

                          N50       Gaps
Goal                      15 kbp    <5%
Celera, 454 + Illumina    9 kbp     5%
+ raw PacBio + PBJelly    46 kbp    1.5%

Too good to be true?

Page 27: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 28: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation

Sequence

Page 29: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation

Sequence

Aligned raw PacBio reads

Coverage

Page 30: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation

Sequence

Aligned raw PacBio reads

Coverage

Aligned corrected PacBio reads
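The validation idea on these slides is to align reads back to the assembly and look at coverage along the sequence. The sketch below shows one way to turn alignment intervals into per-base coverage and to list stretches with no read support. The `(start, end)` interval format and the function names are assumptions for illustration, not the output format of blasr.

```python
def per_base_coverage(sequence_length, alignments):
    """Count how many alignments cover each position of a sequence.

    alignments is an iterable of (start, end) pairs, 0-based and half-open.
    """
    changes = [0] * (sequence_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, sequence_length)
        if start < end:
            changes[start] += 1  # coverage rises where an alignment begins
            changes[end] -= 1    # and falls just past where it ends
    coverage = []
    depth = 0
    for change in changes[:sequence_length]:
        depth += change
        coverage.append(depth)
    return coverage


def low_coverage_regions(coverage, min_depth=1):
    """Return (start, end) regions where coverage is below min_depth."""
    regions = []
    region_start = None
    for position, depth in enumerate(coverage):
        if depth < min_depth and region_start is None:
            region_start = position
        elif depth >= min_depth and region_start is not None:
            regions.append((region_start, position))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(coverage)))
    return regions


# Made-up alignments on a 20 bp sequence; positions 10 to 13 have no support.
toy_alignments = [(0, 8), (2, 10), (14, 20)]
toy_coverage = per_base_coverage(20, toy_alignments)
toy_uncovered = low_coverage_regions(toy_coverage)
print(toy_uncovered)
```

A region that no read spans is a candidate gap or misassembly, but it could also simply reflect low sequencing depth, so it is a prompt for closer inspection and not proof of an error.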

Page 31: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation

Raw PacBio reads

Corrected PacBio reads

(TG)n repeat (TG)n repeat

308 bp gap

Newbler scaffold

Page 32: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation

Raw PacBio reads

(AG)n repeat

939 bp gap

Newbler scaffold

Heterozygous region

Page 33: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation

Raw PacBio reads

Celera scaffold

Misassembly?

Page 34: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 35: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly validation: bridgemapper (beta)

structural variation

misassemblies

Split alignments
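A split alignment is a single read whose parts align to different places in the assembly. The slides do not describe how bridgemapper works internally, so the sketch below is not its algorithm. It is only a minimal illustration of flagging such reads, using a hypothetical tuple layout and an arbitrary `max_jump` threshold, and ignoring strand.

```python
from collections import defaultdict


def find_split_reads(alignments, max_jump=1000):
    """Group alignment segments per read and flag inconsistent neighbours.

    Each alignment is a tuple (hypothetical layout):
    (read_id, read_start, read_end, target, target_start, target_end).
    Two consecutive segments of a read are flagged when they hit different
    targets or lie more than max_jump bases apart on the same target.
    """
    segments_by_read = defaultdict(list)
    for alignment in alignments:
        segments_by_read[alignment[0]].append(alignment)

    split_reads = {}
    for read_id, segments in segments_by_read.items():
        segments.sort(key=lambda segment: segment[1])  # order along the read
        for left, right in zip(segments, segments[1:]):
            different_target = left[3] != right[3]
            jump = right[4] - left[5]
            if different_target or abs(jump) > max_jump:
                split_reads.setdefault(read_id, []).append((left, right))
    return split_reads


# Made-up alignments: read1 is contiguous, read2 bridges two scaffolds,
# read3 skips about 2.5 kbp on the same scaffold.
toy_alignments = [
    ("read1", 0, 2000, "scaffoldA", 10000, 12000),
    ("read1", 2000, 4000, "scaffoldA", 12010, 14010),
    ("read2", 0, 1500, "scaffoldA", 5000, 6500),
    ("read2", 1500, 3000, "scaffoldB", 200, 1700),
    ("read3", 0, 1000, "scaffoldA", 20000, 21000),
    ("read3", 1000, 2500, "scaffoldA", 23510, 25010),
]
toy_split = find_split_reads(toy_alignments)
print(sorted(toy_split))
```

Whether a flagged read points to structural variation or to a misassembly cannot be decided from the alignment alone. That is the question the following slides examine case by case.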

Page 36: Improving and validating the Atlantic Cod genome assembly using PacBio

bridgemapper (beta) on E. coli

Positions in the contig color coded

Illumina + velvet

Page 37: Improving and validating the Atlantic Cod genome assembly using PacBio

s05514

bridgemapper (beta) on cod

2510 bp gap

Point to a 2350 bp scaffold

Page 38: Improving and validating the Atlantic Cod genome assembly using PacBio

s08737

bridgemapper (beta) on cod

2145 bp gap

Point to a 3 kbp scaffold

Page 39: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 40: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly with error-corrected reads

                                     Contig N50    Gaps    Scaffolds
Goal                                 15 kbp        <5%
Celera Assembly                      9 kbp         5%      too short
CA + corrected PacBio + 454 mates    8 kbp         2%      very short

1.4 times genome size: underassembled

Page 41: Improving and validating the Atlantic Cod genome assembly using PacBio

The improved Atlantic cod genome: status

http://en.wikipedia.org

Page 42: Improving and validating the Atlantic Cod genome assembly using PacBio

Newbler plus Celera

[Figure labels: scaffold, contig, gap]

Celera: Long contigs, short scaffolds

Slide courtesy of Ole Kristian Tøressen

Page 43: Improving and validating the Atlantic Cod genome assembly using PacBio

Newbler plus Celera

[Figure labels: scaffold, contig, gap]

Celera: Long contigs, short scaffolds

Newbler: Short contigs, long scaffolds

Slide courtesy of Ole Kristian Tøressen

Page 44: Improving and validating the Atlantic Cod genome assembly using PacBio

Newbler plus Celera

[Figure labels: scaffold, contig, gap]

Celera: Long contigs, short scaffolds

Newbler: Short contigs, long scaffolds

Combined: Long contigs, long scaffolds

Slide courtesy of Ole Kristian Tøressen

Page 45: Improving and validating the Atlantic Cod genome assembly using PacBio

Adding PacBio

Using PBJelly

[Figure labels: Contig, Scaffold, PacBio reads, Closed gap, Reduced gap]

Slide courtesy of Ole Kristian Tøressen

Page 46: Improving and validating the Atlantic Cod genome assembly using PacBio

Polishing the assembly

454 and Illumina reads

Slide courtesy of Ole Kristian Tøressen

Contig

Scaffold

Contig N50: 30 - 40 kbp

Scaffold N50: 1 - 1.5 Mbp

Page 47: Improving and validating the Atlantic Cod genome assembly using PacBio

Image by Mathieu Thouvenin http://www.flickr.com/photos/mathoov/4681491052/

Page 48: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera

Page 49: Improving and validating the Atlantic Cod genome assembly using PacBio

PacBio reads for cod

                        Error-corrected reads    Raw reads
Assembly improvement    PBJelly                  PBJelly
Assembly validation     blasr                    blasr, bridgemapper
De novo assembly        Celera                   Celera

Page 50: Improving and validating the Atlantic Cod genome assembly using PacBio

Assembly

                                     Contig N50    Gaps    Scaffolds
Goal                                 15 kbp        <5%
CA + corrected PacBio + 454 mates    8 kbp         2%      very short
CA + raw PacBio reads + 454 mates    38 kbp        <1%     very short

1.6 times genome size: underassembled

Page 51: Improving and validating the Atlantic Cod genome assembly using PacBio

Lessons learned from PacBio reads

Page 52: Improving and validating the Atlantic Cod genome assembly using PacBio

Cod genome

Heterozygous: Large polymorphism (100’s of bases)

Heterozygous: Large indel (100’s of bases)

[Figure labels, three regions: Homozygous]

Page 53: Improving and validating the Atlantic Cod genome assembly using PacBio

Atlantic cod version 2

23 pseudochromosomes

Below 5% gap bases

Longer contigs

New annotation

Page 54: Improving and validating the Atlantic Cod genome assembly using PacBio

From observation to insight

Mathias Bigge, Ricordisamoa, others (wikimedia commons)

We need better programs