GigAssembler

GigAssembler. Genome Assembly: A big picture

Page 1: GigAssembler. Genome Assembly: A big picture

GigAssembler

Page 2: GigAssembler. Genome Assembly: A big picture

Genome Assembly: A big picture

http://www.nature.com/scitable/content/anatomy-of-whole-genome-assembly-20429

Page 3: GigAssembler. Genome Assembly: A big picture

GigAssembler – Preprocessing

1. Decontaminating & Repeat Masking.

2. Aligning mRNAs, ESTs, BAC ends, and paired reads against the initial sequence contigs (psLayout → BLAT).

3. Creating an input directory (folder) structure.

Page 4: GigAssembler. Genome Assembly: A big picture

http://www.triazzle.com; image from http://www.dangilbert.com/port_fun.html

Reference: Jones NC, Pevzner PA, Introduction to Bioinformatics Algorithms, MIT Press

Page 5: GigAssembler. Genome Assembly: A big picture

RepBase + RepeatMasker

Page 6: GigAssembler. Genome Assembly: A big picture

GigAssembler: Build merged sequence contigs (“rafts”)

Page 7: GigAssembler. Genome Assembly: A big picture

Sequencing quality (Phred Score)

Page 8: GigAssembler. Genome Assembly: A big picture

Sequencing quality (Phred Score)

http://en.wikipedia.org/wiki/Phred_quality_score

Base-calling error probability
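The Phred score Q and the base-calling error probability P are related by Q = -10 log10(P). A minimal Python sketch of the conversion in both directions follows; the function names are illustrative and do not come from the slides.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability P for a Phred quality score Q: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score Q for an error probability P: Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Print the error probability for a few commonly quoted scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: P(error) = {phred_to_error_prob(q):.4f}")
```

For example, Q20 corresponds to one expected miscall in 100 bases and Q30 to one in 1,000.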

Page 9: GigAssembler. Genome Assembly: A big picture

GigAssembler: Build merged sequence contigs (“rafts”)

Page 10: GigAssembler. Genome Assembly: A big picture

GigAssembler: Build merged sequence contigs (“rafts”)

Page 11: GigAssembler. Genome Assembly: A big picture

GigAssembler: Build sequenced clone contigs (“barges”)

Page 12: GigAssembler. Genome Assembly: A big picture

GigAssembler: Build a “raft-ordering” graph

Page 13: GigAssembler. Genome Assembly: A big picture

GigAssembler: Build a “raft-ordering” graph

Add information from mRNAs, ESTs, paired plasmid reads, BAC end pairs: building a “bridge”

Different data types get different weights (mRNA ≈ highest).

Conflicts with the graph as constructed so far are rejected.

Build a sequence path through each raft.

Fill the gaps with Ns:

100 Ns between rafts

50,000 Ns between bridged barges
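The bridging and gap-filling steps on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not GigAssembler's own code: the numeric weights are hypothetical (the slide only says mRNA ranks roughly highest), a "conflict" is modeled simply as a bridge that would create a cycle in the ordering graph, and the names `add_bridges` and `join_with_gaps` are illustrative. Only the gap sizes (100 and 50,000) come from the slide.

```python
from collections import defaultdict

RAFT_GAP = 100        # Ns between rafts (from the slide)
BARGE_GAP = 50_000    # Ns between bridged barges (from the slide)


def creates_cycle(graph, src, dst):
    """True if adding src -> dst would close a cycle (dst already reaches src)."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def add_bridges(bridges):
    """Add (weight, before, after) bridges, heaviest first, skipping conflicts.

    ASSUMPTION: a conflict is any bridge that contradicts the order implied
    by the bridges accepted so far, i.e. one that would create a cycle.
    """
    graph = defaultdict(set)
    accepted = []
    for weight, before, after in sorted(bridges, key=lambda b: -b[0]):
        if creates_cycle(graph, before, after):
            continue  # rejected: conflicts with the graph built so far
        graph[before].add(after)
        accepted.append((before, after))
    return accepted


def join_with_gaps(barges):
    """Join rafts with 100 Ns and bridged barges with 50,000 Ns.

    `barges` is a list of barges, each a list of raft sequences in order.
    """
    raft_gap = "N" * RAFT_GAP
    barge_gap = "N" * BARGE_GAP
    return barge_gap.join(raft_gap.join(rafts) for rafts in barges)


# Hypothetical bridges: the lowest-weight one contradicts the other two.
print(add_bridges([(3, "r1", "r2"), (2, "r2", "r3"), (1, "r3", "r1")]))

# Two rafts in the first barge, one raft in the second.
print(len(join_with_gaps([["ACGT", "GGCC"], ["TTAA"]])))
```

Processing bridges from heaviest to lightest means that when two pieces of evidence disagree, the more trusted data type wins and the weaker bridge is the one dropped.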

Page 14: GigAssembler. Genome Assembly: A big picture

Bellman-Ford algorithm

http://compprog.wordpress.com/2007/11/29/one-source-shortest-path-the-bellman-ford-algorithm/

Page 15: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 16: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 17: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7. Distance labels: START, Inf., Inf., Inf., Inf.]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 18: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7. Distance labels: 0 (START), Inf., Inf., +7 (→ A), +6 (→ A)]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 19: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7. Distance labels: 0 (START), +7 (→ A), +6 (→ A), +2 (→ B), +4 (→ D)]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 20: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7. Distance labels: 0 (START), +7 (→ A), +2 (→ B), +4 (→ D), +2 (→ C)]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 21: GigAssembler. Genome Assembly: A big picture

[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7. Distance labels: 0 (START), +7 (→ A), +4 (→ D), +2 (→ C), -2 (→ B)]

Find the shortest path to all nodes.

Take every edge and try to relax it (N - 1 times, where N is the number of nodes).

Page 22: GigAssembler. Genome Assembly: A big picture

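[Figure: directed graph on nodes A, B, C, D, E with edge weights +6, +8, +7, +9, +2, +5, -2, -3, -4, +7. Final distance labels: 0 (START), +7 (→ A), +4 (→ D), +2 (→ C), -2 (→ B)]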

Answer: A-D-C-B-E
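The relaxation procedure walked through on these slides can be sketched in Python. The edge endpoints did not survive in the slide text, so the edge list below is an assumption: it uses the ten weights shown on the slide and is chosen so that the run reproduces the slide's labels and answer. The function names are illustrative.

```python
import math

# ASSUMPTION: edge endpoints are not recoverable from the slide text.
# This edge list uses the slide's ten weights and reproduces its labels.
EDGES = [
    ("A", "B", 6), ("A", "D", 7),
    ("B", "C", 5), ("B", "D", 8), ("B", "E", -4),
    ("C", "B", -2),
    ("D", "C", -3), ("D", "E", 9),
    ("E", "A", 2), ("E", "C", 7),
]


def bellman_ford(nodes, edges, source):
    """Return (dist, pred) for single-source shortest paths from `source`."""
    dist = {n: math.inf for n in nodes}
    pred = {n: None for n in nodes}
    dist[source] = 0
    # Relax every edge up to N - 1 times, stopping early once nothing changes.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                changed = True
        if not changed:
            break
    # One extra pass: any further improvement means a negative-weight cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist, pred


def path_to(pred, target):
    """Follow predecessor links back from `target` to the source."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return "-".join(reversed(path))


dist, pred = bellman_ford("ABCDE", EDGES, "A")
print(dist)
print(path_to(pred, "E"))  # A-D-C-B-E
```

Under this assumed edge list the final distances are A = 0, B = +2, C = +4, D = +7, E = -2, and following the predecessor links back from E gives the path A-D-C-B-E shown in the answer.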

Page 23: GigAssembler. Genome Assembly: A big picture

Next-generation sequencing

Page 24: GigAssembler. Genome Assembly: A big picture

Illumina (Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008)

Page 25: GigAssembler. Genome Assembly: A big picture

Illumina (Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008)

Page 26: GigAssembler. Genome Assembly: A big picture

Roche/454 (Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008)

Page 27: GigAssembler. Genome Assembly: A big picture

SOLiD (Mardis ER, Annu. Rev. Genomics Hum. Genet., 2008)

Page 28: GigAssembler. Genome Assembly: A big picture

An example of single-molecule DNA sequencing, from Helicos (approx. 1 billion reads per run)

Pushkarev, D., N.F. Neff, and S.R. Quake. Nat Biotechnol (2009) 27, 847-50

Harris, T.D., et al. Science (2008) 320, 106-9

Page 29: GigAssembler. Genome Assembly: A big picture

Mapping program

Trapnell C, Salzberg SL, Nat. Biotech., 2009

Page 30: GigAssembler. Genome Assembly: A big picture

Two strategies in mapping

Trapnell C, Salzberg SL, Nat. Biotech., 2009