FOR RESEARCH USE ONLY. Not for use in diagnostic procedures. Everyday de novo assembly GRC Assembly Workshop at Genome Informatics Deanna M. Church Senior Director of Applications Sep 19, 2016 @deannachurch

Everyday de novo assembly

Page 1: Everyday de novo assembly

FOR RESEARCH USE ONLY. Not for use in diagnostic procedures.

Everyday de novo assembly

GRC Assembly Workshop at Genome Informatics

Deanna M. Church, Senior Director of Applications, Sep 19, 2016

@deannachurch

Page 2: Everyday de novo assembly

Acknowledgements

The entire team at 10x

David Jaffe

Neil Weisenfeld

Vijay Kumar

Preyas Shah

NCBI:

Francoise Thibaud-Nissen

Valerie Schneider

Page 3: Everyday de novo assembly

Disclosures

Employee and Shareholder: 10x Genomics

Shareholder: Personalis

10x Genomics products described are for Research Use Only. Not for use in diagnostic procedures.

Page 4: Everyday de novo assembly

Questions from the organizers

Are new assemblies using the reference?

Can they help make the reference better?

Do they make the reference obsolete?

Page 5: Everyday de novo assembly

Agenda

Why haven’t we always done de novo genome analysis?

Page 6: Everyday de novo assembly

Agenda

Why haven’t we always done de novo genome analysis?

What are Linked-Reads?

Page 7: Everyday de novo assembly

Agenda

Why haven’t we always done de novo genome analysis?

What are Linked-Reads?

How do Linked-Reads enable everyday de novo assembly?

Page 8: Everyday de novo assembly

Why haven’t we always done de novo genome analysis?

Page 9: Everyday de novo assembly

Kelly Howe, Lawrence Berkeley Laboratory

Page 10: Everyday de novo assembly

Reference quality is HARD

DOI: 10.1038/nature03001

Page 11: Everyday de novo assembly

Our actual genome: diploid

Page 12: Everyday de novo assembly

How we represent our genome: haploid

Page 13: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 14: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 15: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 16: Everyday de novo assembly

Current approach: averaging over haplotypes

Page 17: Everyday de novo assembly

Problem: both alleles differ from each other

Page 18: Everyday de novo assembly

What are Linked-Reads?

Page 19: Everyday de novo assembly

Unlinked-Reads: short range information

Page 20: Everyday de novo assembly

Linked-Reads: long range information

Page 21: Everyday de novo assembly

Start with long molecules

NA19240

Page 22: Everyday de novo assembly

Making Linked-Reads

P5, 16 bp BC, R1, Nmer, gDNA Insert

Page 23: Everyday de novo assembly

Making Linked-Reads

Long input molecule

Excess of sequenceable inserts randomly primed off each long molecule

P5, 16 bp BC, R1, Nmer, gDNA Insert

Page 24: Everyday de novo assembly

Making Linked-Reads

Long input molecule (50 Kb)

Excess of sequenceable inserts randomly primed off each long molecule

P5, 16 bp BC, R1, Nmer, gDNA Insert

Standard reference-based analysis recommendations

Long input molecule (50 Kb): 30x sequence, ~35 fragments, ~0.2x coverage

Page 25: Everyday de novo assembly

Making Linked-Reads

Long input molecule (50 Kb)

Excess of sequenceable inserts randomly primed off each long molecule

P5, 16 bp BC, R1, Nmer, gDNA Insert

Supernova analysis recommendations

Long input molecule (50 Kb): 56x sequence, ~65 fragments, ~0.4x coverage
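
The per-molecule coverage figures on the two recommendation slides can be checked with simple arithmetic: fragments per molecule times bases sequenced per fragment, divided by molecule length. The sketch below is a rough check, not the vendor's calculation. The 2 x 150 bp paired-end read length is an assumption (the slides do not state it), and the function name is illustrative only.

```python
def per_molecule_coverage(fragments, molecule_length_bp, bases_per_fragment=300):
    """Approximate read coverage of one long input molecule.

    bases_per_fragment=300 assumes 2 x 150 bp paired-end reads; this is an
    assumption, not a value given in the slides. Overlap between fragments
    is ignored.
    """
    return fragments * bases_per_fragment / molecule_length_bp


# Slide values: 50 Kb molecule, ~35 fragments (30x) and ~65 fragments (56x).
standard = per_molecule_coverage(35, 50_000)
supernova = per_molecule_coverage(65, 50_000)
print(f"standard: {standard:.2f}x per molecule")
print(f"supernova: {supernova:.2f}x per molecule")
```

Under that read-length assumption the results land close to the ~0.2x and ~0.4x quoted on the slides.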

Page 26: Everyday de novo assembly

Synthetic Long Reads: less physical coverage

[Figure labels: A, B, C; sequencing cost; physical coverage]

Page 27: Everyday de novo assembly

Linked-Reads: greater physical coverage

[Figure labels: A, B, C; sequencing cost; physical coverage]

Page 28: Everyday de novo assembly

Example: Molecule vs Read Coverage

150X avg molecule coverage

A given genomic locus will have 150X avg molecule depth, and 30X avg read depth

(150X molecule depth) x (0.2X read/molecule) = 30X read depth

Chr13: BRCA2

>30X avg read coverage
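
The relationship on this slide is a single multiplication, and it can be inverted to ask how much molecule depth a given read depth implies. A minimal sketch in Python follows; the function names are illustrative. The second call combines the 56x and ~0.4x figures from the Supernova recommendation slide, and its result is derived here, not stated in the slides.

```python
def read_depth(molecule_depth, coverage_per_molecule):
    """Average read depth at a locus: molecules covering it times reads per molecule."""
    return molecule_depth * coverage_per_molecule


def molecule_depth(target_read_depth, coverage_per_molecule):
    """Molecule depth implied by a read depth and a per-molecule coverage."""
    return target_read_depth / coverage_per_molecule


# Slide example: 150X molecule depth at 0.2X read coverage per molecule.
print(f"read depth: {read_depth(150, 0.2):.0f}X")

# Derived illustration (not on the slides): 56X reads at ~0.4X per molecule.
print(f"implied molecule depth: {molecule_depth(56, 0.4):.0f}X")
```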

Page 29: Everyday de novo assembly

Generating Linked-Reads

Start with:

HMW gDNA, 100 Kb+ molecules

1.0 ng input DNA = 300 copies of the genome

0.5 ng DNA = 150 copies of the genome, partitioned into >1M GEMs

[Figure labels: DNA, Oil, Barcoded Primer Library, Enzyme, Collect]
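
The "1.0 ng = 300 copies" figure can be reproduced from the mass of one haploid human genome. The sketch below is a back-of-the-envelope check under stated assumptions: about 650 g/mol per base pair and a 3.1 Gb haploid genome, neither of which comes from the slides. The constant and function names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair (g/mol)
HAPLOID_GENOME_BP = 3.1e9  # assumed human haploid genome size (bp)


def haploid_genome_copies(mass_ng, genome_bp=HAPLOID_GENOME_BP):
    """Estimate the number of haploid genome equivalents in a DNA mass given in ng."""
    genome_mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return mass_ng * 1e-9 / genome_mass_g


# Input masses quoted in the slides.
for mass_ng in (1.0, 0.5):
    print(f"{mass_ng} ng -> about {haploid_genome_copies(mass_ng):.0f} haploid copies")
```

With these assumed constants the estimate comes out just under 300 copies per nanogram, consistent with the rounded slide values.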

Page 30: Everyday de novo assembly

How do Linked-Reads enable everyday de novo assembly?

Page 31: Everyday de novo assembly

Assembly made easy

BCL -> Supernova de novo assembly -> FASTA

NA19240, 1200M reads

1 library, 1.25 ng input

1 server, 348 Gb memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 32: Everyday de novo assembly

Assembly made easy

BCL -> Supernova de novo assembly -> FASTA

NA19240, 1200M reads

1 library, 1.25 ng input

1 server (28 cores), 348 Gb memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 33: Everyday de novo assembly

Assembly made easy

Measure                         Value
Number of scaffolds >= 10 Kb    1.17 K
Edge N50                        17.45 Kb
Contig N50                      118.8 Kb
Phase block N50                 9.3 Mb
Scaffold N50                    16.4 Mb

BCL -> Supernova de novo assembly -> FASTA

NA19240, 1200M reads

1 library, 1.25 ng input

1 server (28 cores), 348 Gb memory, 2 days compute

http://www.biorxiv.org/content/early/2016/08/19/070425
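
The contig, scaffold, and phase block figures above are all N50 values. For readers unfamiliar with the statistic, here is a minimal sketch of the standard definition in Python: the length L such that sequences of length L or longer hold at least half of the total bases. The toy lengths are made up for illustration and are unrelated to the NA19240 assembly. Supernova's own reporting may apply additional filters (for example a minimum scaffold length) that this sketch does not model.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and return the length at which
    the running total first reaches half of the overall total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy example: total is 300, so the running sum reaches 150 at the second contig.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```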

Page 34: Everyday de novo assembly

Performance over multiple samples

http://www.biorxiv.org/content/early/2016/08/19/070425

sample   ethnicity  sex  cov (x)  frag  N50 contig (kb)  N50 scaffold (Mb)  N50 phase block (Mb)  gap
NA19238  YRI        F    56       115   114.6            18.7               8                     2.1
NA19240  YRI        F    56       125   118.8            16.4               9.3                   2.3
HG00733  PR         F    56       106   123.6            17.8               3.4                   2.0
HG00512  HAN        M    56       102   113.2            15.4               2.7                   2.2
NA24385  AJ         M    56       120   106.4            15.1               4.2                   2.6
HGP      EUR        M    56       139   120.2            18.6               4.5                   2.5
NA12878  EUR        F    56       92    118.5            16.4               2.8                   2.9

Page 35: Everyday de novo assembly

High quality assembly at lower coverage

[Figure: three panels plotting Contig N50 (kb), Scaffold N50 (Mb), and Phase Block N50 (Mb) against number of reads (millions), from 500 to 1,300]

Page 36: Everyday de novo assembly

De Novo Performance Drastically Improves with Increased DNA Length

[Figure: three panels plotting Contig N50, Scaffold N50 (Mb), and Phase Block N50 against DNA length, from 0 to 60,000]

Page 37: Everyday de novo assembly

Supernova Assembler

stuff

separate assemblies of homologous loci

http://www.biorxiv.org/content/early/2016/08/19/070425

Page 38: Everyday de novo assembly

Assembly architecture = phase blocks

megabubble, megabubble, megabubble

multi-Mb phase blocks

many Mb scaffold

microstructure
• bubbles, often at indeterminate poly-A
• short gaps, often at poly-A

Page 39: Everyday de novo assembly

Assembly assessment

[Figure: percent of GRCh37 100-mers missing per assembly (axis 0 to 25), shown for haploid and diploid assemblies, comparing Supernova (10x) with other methods. Samples in plot order: NA19238, NA19240, HG00733, HG00512, NA24385, HGP, NA12878, YH, NA12878, NA12878, NA12878, NA24385, NA24143]
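
The metric in this figure is the share of reference 100-mers that do not appear in an assembly. The slides do not say how it was computed, so the sketch below is only one plausible reading. It assumes exact matching of distinct k-mers, treats a k-mer and its reverse complement as the same, and takes "diploid" to mean pooling both haplotypes. All function names are hypothetical, and the toy sequences use k=4 for readability. Python sets are fine for a toy case; a whole genome would need a dedicated k-mer counter.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq, k):
    """Yield canonical k-mers of seq, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


def percent_missing(reference_seqs, assembly_seqs, k=100):
    """Percent of distinct reference k-mers absent from the assembly sequences."""
    ref = {km for s in reference_seqs for km in kmers(s, k)}
    if not ref:
        raise ValueError("reference has no valid k-mers")
    asm = {km for s in assembly_seqs for km in kmers(s, k)}
    return 100.0 * len(ref - asm) / len(ref)


# Toy data (made up): one haplotype covers part of the reference, both cover all of it.
reference = ["AAACCCGGGT"]
hap0, hap1 = "AAACCC", "CCCGGGT"
print("haploid:", percent_missing(reference, [hap0], k=4))
print("diploid:", percent_missing(reference, [hap0, hap1], k=4))
```

Because the diploid k-mer set contains the haploid one, the diploid missing percentage can never exceed the haploid value under this definition.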

Page 40: Everyday de novo assembly

Comparison to truth data

Page 41: Everyday de novo assembly

Improving the reference assembly?

GRCh38: chr6 (NC_000006.12)

NA12878, hap0, scaf. 21653 (prev 1.1)

260 Kb of new sequence

Page 42: Everyday de novo assembly

Better genotype reconstruction

chrX:6,219,000-6,220,500 (GRCh38), NLGN4X (neuroligin 4, X-linked)

Page 43: Everyday de novo assembly

Better genotype reconstruction

chrX:6,218,359-6,221,000 (GRCh38)

Page 44: Everyday de novo assembly

Questions from the organizers

Are new assemblies using the reference?

Assembly construction: NO

Assembly analysis: Yes

Supernova: de novo assembly, diploid reconstruction

Page 45: Everyday de novo assembly

Questions from the organizers

Can they help make the reference better?

Yes

Supernova: individual genome reconstruction

Contributing new sequences to population graph

Page 46: Everyday de novo assembly

Questions from the organizers

Do they make the reference obsolete?

NO

Supernova: Not reference assemblies

Better individual genome reconstruction

Page 47: Everyday de novo assembly

Thanks!