
De Novo Repeat Classification and Fragment Assembly


Pusan National University, Interdisciplinary Program of Bioinformatics

De Novo Repeat Classification and Fragment Assembly

Woo-yeon Kim, first-year Master's student


Programs Related to Repeats

Repeat annotation (library-based):
• RepeatMasker (A.F.A. Smit and P. Green, unpubl.)
• MaskerAid (Bedell et al. 2000)
• Limitation: no de novo compilation of repeats

Repeat analysis:
• RepeatMatch (Delcher et al. 1999)
• REPuter (Kurtz et al. 2000, 2001)
• RECON, RepeatFinder, LTR_STRUC
• Limitation: no compact overview or summary of each repeat family


Genome Research. Received January 27, 2004; accepted in revised form June 29, 2004.


CONTENTS

• Introduction
• Concepts
• Methods
  • De Bruijn Graphs & A-Bruijn Graphs
  • RepeatGluer Algorithm
  • Constructing A-Bruijn Graphs Without the Similarity Matrix
  • Fragment Assembly
  • FragmentGluer Algorithm
• Results and Discussion


INTRODUCTION

“The problem of automated repeat sequence family classification is inherently messy and ill-defined and does not appear to be amenable to a clean algorithmic attack” – Bao and Eddy (2002)

One of the difficulties in repeat classification is that many repeats represent mosaics of sub-repeats – Bailey et al. 2002

Aims:
• Propose a new approach to repeat classification
• Introduce the FragmentGluer assembler


CONCEPTS


Genomic dot-plot

Figure panels:
• Genomic dot-plot of an imaginary sequence
• An imaginary evolutionary process
• Gluing repeated regions leads to the repeat graph
• The final genome


The idea of our approach

By gluing together the positions that are aligned (the points of the dot-plot), repeats transform into the A-Bruijn graph.


Mosaic repeat organization

Figure panels (BAC from human chromosome Y):
• Repeat pairs found by REPuter, and sub-repeats from our division
• Repeat multigraph
• Repeat graph
• RepeatFinder vs RECON vs REPuter


METHODS


De Bruijn Graphs & A-Bruijn Graphs

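De Bruijn graph of ACTGCTGCC (l = 3): reading the sequence gives the 3-tuples ACT, CTG, TGC, GCT, CTG, TGC, GCC. Gluing identical 3-tuples into one vertex leaves the vertices ACT, CTG, TGC, GCT, GCC, and the repeated CTG and TGC close the cycle CTG → TGC → GCT → CTG.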
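A minimal Python sketch of this construction, assuming the convention the slide's figure appears to use: vertices are the distinct l-tuples of the sequence, and an edge joins each pair of consecutive l-tuples. The function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(seq: str, l: int = 3):
    """Vertices: distinct l-tuples of seq; edges: consecutive l-tuples.

    The sequence is read as a path of its n - l + 1 l-tuples, and identical
    l-tuples are glued into a single vertex. Edge multiplicities are kept, so
    a transition that occurs twice along the sequence is counted twice.
    """
    tuples = [seq[i:i + l] for i in range(len(seq) - l + 1)]
    edges = defaultdict(int)  # (u, v) -> how many times the path goes u -> v
    for u, v in zip(tuples, tuples[1:]):
        edges[(u, v)] += 1
    return set(tuples), dict(edges)


vertices, edges = de_bruijn_graph("ACTGCTGCC", 3)
print(sorted(vertices))       # ['ACT', 'CTG', 'GCC', 'GCT', 'TGC']
print(edges[("CTG", "TGC")])  # 2: the repeated CTG -> TGC transition
```

Keeping edge multiplicities shows where the repeat lies: the doubled CTG → TGC edge is traversed once per copy of the repeated substring.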


De Bruijn Graphs & A-Bruijn Graphs

A-Bruijn Graph: … AT … ACT … ACAT …
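The A-Bruijn graph generalizes this gluing: instead of gluing identical l-tuples, it glues positions i and j of S whenever the similarity matrix A marks them as aligned. A minimal sketch, assuming A is supplied as a list of its set (i, j) cells and using union-find to merge glued positions; the names `a_bruijn_graph` and `aligned_pairs` are hypothetical.

```python
def a_bruijn_graph(n, aligned_pairs):
    """Glue positions 0..n-1 of a sequence wherever A[i][j] = 1.

    aligned_pairs holds the (i, j) cells of the similarity matrix A that are
    set, so the dense n x n matrix is never stored. Each position starts as
    its own vertex; glued positions are merged with union-find, and every
    pair of consecutive positions (i, i + 1) contributes one edge.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in aligned_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    vertex_of = [find(i) for i in range(n)]
    edges = [(vertex_of[i], vertex_of[i + 1]) for i in range(n - 1)]
    return vertex_of, edges


S = "GACTTTACTC"  # ACT occurs at positions 1-3 and 6-8
vertex_of, edges = a_bruijn_graph(len(S), [(1, 6), (2, 7), (3, 8)])
print(vertex_of)            # [0, 1, 2, 3, 4, 5, 1, 2, 3, 9]
print(len(set(vertex_of)))  # 7
```

The two copies of ACT now share vertices 1, 2 and 3, so the edges (1, 2) and (2, 3) each appear twice, once per copy.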


Whirls & Bulges

Both arise because the alignments allow gaps and mismatches.
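To tell the two kinds of short cycle apart, a check like the following can be used. It assumes, as a working definition for illustration only, that a whirl is a short cycle whose edges all point the same way around it (a directed cycle) and a bulge is a short cycle formed by two alternative paths between the same pair of vertices. `classify_short_cycle` is a hypothetical helper, not part of RepeatGluer.

```python
def classify_short_cycle(cycle, edges):
    """Label a short cycle as a 'whirl' (directed) or a 'bulge' (mixed).

    cycle lists vertices v0, v1, ..., vk-1 and closes back on v0; edges is a
    set of directed (u, v) pairs. If both (u, v) and (v, u) exist, the
    forward direction is used.
    """
    directions = []
    k = len(cycle)
    for idx in range(k):
        u, v = cycle[idx], cycle[(idx + 1) % k]
        if (u, v) in edges:
            directions.append(+1)
        elif (v, u) in edges:
            directions.append(-1)
        else:
            raise ValueError(f"{u} and {v} are not adjacent")
    # All edges oriented the same way around the cycle -> directed cycle.
    if all(d == directions[0] for d in directions):
        return "whirl"
    return "bulge"


edges = {("a", "b"), ("b", "c"), ("c", "a"),  # directed triangle a -> b -> c -> a
         ("c", "d"), ("a", "x"), ("x", "d")}  # second path from a to d via x
print(classify_short_cycle(["a", "b", "c"], edges))            # whirl
print(classify_short_cycle(["a", "b", "c", "d", "x"], edges))  # bulge
```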


RepeatGluer Algorithm

1. Construct the A-Bruijn graph
2. Eliminate whirls
3. Remove bulges
4. Erosion: remove all leaves
5. Straighten zigzag paths
6. Form the consensus sequence
7. Output repeat families
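Step 4 (erosion) can be sketched as leaf removal. Assumptions not stated on the slide: a leaf is a vertex with total degree 1, and the hypothetical `rounds` parameter bounds how long a dangling path ("whisker") may be before it is eroded away; `erode` is likewise a hypothetical name.

```python
from collections import Counter


def erode(edges, rounds=1):
    """Remove leaves (vertices of total degree 1) for up to `rounds` passes.

    edges is a list of directed (u, v) pairs and may contain parallel edges.
    Each pass deletes every current leaf together with its single edge.
    """
    edges = list(edges)
    for _ in range(rounds):
        degree = Counter()
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        leaves = {x for x, d in degree.items() if d == 1}
        if not leaves:
            break  # nothing left to erode
        edges = [(u, v) for u, v in edges if u not in leaves and v not in leaves]
    return edges


# A cycle a -> b -> c -> a with a whisker c -> d -> e hanging off it.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
print(erode(edges, rounds=1))  # [('a', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'd')]
print(erode(edges, rounds=2))  # [('a', 'b'), ('b', 'c'), ('c', 'a')]
```

Bounding the number of passes matters: unbounded leaf removal would also erode any long non-cyclic part of the graph, not just short whiskers.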


Constructing A-Bruijn Graphs Without the Similarity Matrix

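Constructing the A-Bruijn graph assumes that S and A are given. Instead, S and a set of substrings {S1, …, St} are enough to construct the A-Bruijn graph of S:
• A set for every pair of consecutive positions in S
• A |Si| × |Sj| matrix for each pair of substrings, instead of the full matrix A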

A snapshot of a “small” area of matrix A

Notation:
• S: a genomic sequence
• n: the length of S
• A: an n × n matrix
• {S1, …, St}: a set of substrings
• |Si|: the length of the string Si
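The saving comes from never touching the n × n matrix: only the |Si| × |Sj| cells for each pair of substrings are examined, and any match found is shifted back to global positions of S. A minimal sketch, assuming substrings are given as hypothetical (start, length) intervals and using exact l-tuple matches as a stand-in for the alignments the method would actually compute; `glued_pairs_from_substrings` is a hypothetical name.

```python
def glued_pairs_from_substrings(S, substrings, l=3):
    """Collect position pairs of S to glue, scanning only |Si| x |Sj| cells.

    substrings holds (start, length) intervals for S1, ..., St. For each pair
    of distinct substrings, cell (a, b) counts as a match when Si and Sj share
    an identical l-tuple starting at a and b. Every position covered by the
    match is mapped back to global coordinates of S.
    """
    pairs = set()
    for x in range(len(substrings)):
        for y in range(x + 1, len(substrings)):
            si_start, si_len = substrings[x]
            sj_start, sj_len = substrings[y]
            si = S[si_start:si_start + si_len]
            sj = S[sj_start:sj_start + sj_len]
            for a in range(len(si) - l + 1):
                for b in range(len(sj) - l + 1):
                    if si[a:a + l] == sj[b:b + l]:
                        for k in range(l):  # glue each aligned position
                            p, q = si_start + a + k, sj_start + b + k
                            if p != q:
                                pairs.add((min(p, q), max(p, q)))
    return pairs


S = "GACTTTACTC"
print(sorted(glued_pairs_from_substrings(S, [(0, 5), (5, 5)])))
# [(1, 6), (2, 7), (3, 8)]
```

The resulting pairs are exactly the set cells of A that an A-Bruijn construction needs, so they can be fed to a gluing step directly.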


Fragment Assembly

Assemblers:
• Phrap (Green 1994)
• Celera assembler (Myers et al. 2000)
• EULER assembler (Pevzner et al. 2001): http://nbcr.sdsc.edu/euler
• Others: ARACHNE, Phusion, CAP, TIGR

Building an accurate assembler: EULER + Phrap = EULER+
• Combines EULER’s accuracy in analyzing repeats with Phrap’s ability to handle low-coverage regions, low-quality reads, and read ends
• Uses less memory than the original EULER
• FragmentGluer algorithm


FragmentGluer Algorithm

1. Construct the A-Bruijn graph of S
2. Eliminate whirls by splitting the composed vertices
3. Remove bulges
4. Erosion: remove all leaves
5. Straighten zigzag paths
6. Thread each read
7. Define the consensus sequence
8. Output repeat families
9. Transform mate-pairs into mate-paths (performed after step 6)
10. Assemble the resulting contigs into scaffolds with the EULER Scaffolding algorithm
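Step 6 (threading each read) can be illustrated on the small graph built from ACTGCTGCC. This sketch simplifies heavily: it threads reads through a de Bruijn graph of exact l-tuples rather than the A-Bruijn graph FragmentGluer builds, and it rejects a read with an error instead of correcting it. `thread_read` is a hypothetical name.

```python
def thread_read(read, edges, l=3):
    """Return the vertices a read passes through, or None if it cannot be threaded."""
    vertices = {x for edge in edges for x in edge}
    path = [read[i:i + l] for i in range(len(read) - l + 1)]
    if any(t not in vertices for t in path):
        return None  # the read contains an l-tuple absent from the graph
    for u, v in zip(path, path[1:]):
        if (u, v) not in edges:
            return None  # consecutive l-tuples are not joined by an edge
    return path


# Edges of the de Bruijn graph of ACTGCTGCC with l = 3
edges = {("ACT", "CTG"), ("CTG", "TGC"), ("TGC", "GCT"),
         ("GCT", "CTG"), ("TGC", "GCC")}
print(thread_read("CTGCTG", edges))  # ['CTG', 'TGC', 'GCT', 'CTG']
print(thread_read("ACTGCA", edges))  # None: GCA is not a vertex
```

A threaded read is a walk in the graph, so a read that spans the whole CTG → TGC → GCT → CTG cycle tells the assembler how many times the repeat is traversed.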


RESULTS AND DISCUSSION


Benchmarking

Results of a study of 518 human chromosome 20 clones.

                              Phrap    ARACHNE   EULER+
Avg. contigs per clone        6.8      13.8      6.2
Avg. coverage                 99.3%    98.6%     98.8%
Misassembled contigs          37       17        7
Missing repeat copies         5        9         4

EULER+ produced the fewest misassembled contigs (7, versus 17 for ARACHNE and 37 for Phrap). EULER+ also had the fewest missing repeat copies (4), ahead of Phrap (5) and ARACHNE (9). Average coverage over the 518 clones was 99.3% for Phrap, 98.8% for EULER+, and 98.6% for ARACHNE. The average number of contigs per clone was lowest for EULER+ (6.2), followed by Phrap (6.8) and ARACHNE (13.8).


More research

• Analysis of the consensus sequences produced by FragmentGluer
• Detecting human endogenous retroviruses (HERVs) de novo as consensus sequences of FragmentGluer