Pusan National University, Interdisciplinary Program of Bioinformatics
De Novo Repeat Classification and Fragment Assembly
First-year master's student: Kim Woo-yeon
Programs Related to Repeats
Repeat Annotation - libraries
  RepeatMasker ( A.F.A. Smit and P. Green, unpubl. )
  MaskerAid ( Bedell et al. 2000 )
  No de novo compilation
Repeat Analysis
  RepeatMatch ( Delcher et al. 1999 )
  REPuter ( Kurtz et al. 2000, 2001 )
  RECON, RepeatFinder, LTR_STRUC
  No compact overview or summary of the repeat family
Genome Research. Received January 27, 2004; accepted in revised form June 29, 2004.
CONTENTS
Introduction
Concepts
Methods
  De Bruijn Graphs & A-Bruijn Graphs
  RepeatGluer Algorithm
  Constructing A-Bruijn Graphs Without the Similarity Matrix
  Fragment Assembly
  FragmentGluer Algorithm
Results and Discussion
INTRODUCTION
“The problem of automated repeat sequence family classification is inherently messy and ill-defined and does not appear to be amenable to a clean algorithmic attack” – Bao and Eddy (2002)
One of the difficulties in repeat classification is that many repeats represent mosaics of sub-repeats – Bailey et al. 2002
Aims
  Proposing a new approach to repeat classification
  FragmentGluer assembler
CONCEPTS
Genomic dot-plot
Genomic dot-plot of an imaginary sequence
An imaginary evolutionary process
Gluing the repeated regions of the final genome leads to the repeat graph
The idea of our approach
By gluing points together, repeats transform into the A-Bruijn graph
Mosaic repeat organization
BAC from human Chromosome Y
  Repeat pairs by REPuter & sub-repeats by our division
  Repeat multigraph
  Repeat graph
  RepeatFinder vs RECON vs REPuter
METHODS
De Bruijn Graphs & A-Bruijn Graphs
De Bruijn Graph: ACTGCTGCC
  3-mers: ACT, CTG, TGC, GCT, GCC
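The De Bruijn graph of the example sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3-mers are treated as vertices and consecutive 3-mers are joined by an edge (other conventions put the k-mers on the edges instead), and the function name `de_bruijn` is hypothetical.

```python
from collections import defaultdict


def de_bruijn(sequence, k=3):
    """Return (vertices, edges) for a k-mer De Bruijn graph of `sequence`.

    Assumed convention: each distinct k-mer is a vertex, and an edge joins
    two k-mers that occur at consecutive positions of the sequence.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    edges = defaultdict(list)
    for left, right in zip(kmers, kmers[1:]):
        edges[left].append(right)  # repeated traversals are kept as parallel edges
    return sorted(set(kmers)), dict(edges)


vertices, edges = de_bruijn("ACTGCTGCC")
print(vertices)
for source, targets in edges.items():
    print(source, "->", targets)
```

The repeat CTGC occurs twice in the sequence, so the vertices CTG and TGC are each visited twice and TGC branches to both GCT and GCC.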
De Bruijn Graphs & A-Bruijn Graphs
A-Bruijn Graph: … AT … ACT … ACAT …
Whirls & Bulges
Gaps & mismatches are allowed
RepeatGluer Algorithm
1. Construct the A-Bruijn graph
2. Eliminate whirls
3. Remove bulges
4. Erosion: remove all leaves
5. Straighten zigzag paths
6. Form the consensus sequence
7. Output repeat families
Constructing A-Bruijn Graphs Without the Similarity Matrix
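The erosion step can be pictured with a small sketch. It assumes an undirected graph stored as a dictionary of neighbour sets and treats a leaf as any vertex with exactly one neighbour; the function name `erode` and the `rounds` parameter are hypothetical, and the real RepeatGluer step may apply additional criteria.

```python
def erode(adjacency, rounds=1):
    """Remove leaves (vertices with exactly one neighbour) from a graph.

    `adjacency` maps each vertex to the set of its neighbours. A copy is
    modified, so the input graph is left untouched.
    """
    graph = {vertex: set(neighbours) for vertex, neighbours in adjacency.items()}
    for _ in range(rounds):
        leaves = [vertex for vertex, neighbours in graph.items() if len(neighbours) == 1]
        if not leaves:
            break
        for leaf in leaves:
            for neighbour in graph.pop(leaf):
                if neighbour in graph:
                    graph[neighbour].discard(leaf)
    return graph


# A triangle b-c-d with one hanging vertex a attached to b.
example = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b", "d"}, "d": {"b", "c"}}
eroded = erode(example)
print(eroded)
```

One round removes the hanging vertex a and leaves the triangle intact.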
Constructing the A-Bruijn graph assumes S and A
S and { S1, …, St } can also be used to construct the A-Bruijn graph of S
  The set covers every pair of consecutive positions in S
  Matrix |Si| x |Sj|
A snapshot of a “small” area of matrix A
S: a genomic sequence
n: the length of S
A: matrix n x n
{ S1, …, St }: a set of substrings
|Si|: the length of the string Si
Fragment Assembly
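The gluing idea can be sketched with a union-find structure. This is a minimal sketch under stated assumptions: matrix A is given sparsely as a list of position pairs (i, j) that it marks as similar, gluing means merging those positions into one vertex, and every pair of consecutive positions in S contributes one edge. The function name `a_bruijn_graph` is hypothetical.

```python
def a_bruijn_graph(n, similar_pairs):
    """Glue positions 0..n-1 of a sequence according to `similar_pairs`.

    Returns (vertex_of, edges): vertex_of[i] is the representative (smallest
    position) of the glued class containing i, and edges lists one edge per
    pair of consecutive positions, so repeats yield parallel edges.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in similar_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Attach the larger root to the smaller so roots stay minimal.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    vertex_of = [find(i) for i in range(n)]
    edges = [(vertex_of[i], vertex_of[i + 1]) for i in range(n - 1)]
    return vertex_of, edges


# ACTGCTGCC: the two copies of CTG (positions 1-3 and 4-6) are glued.
vertex_of, edges = a_bruijn_graph(9, [(1, 4), (2, 5), (3, 6)])
print(vertex_of)
print(edges)
```

In this toy case the glued copies form a cycle through vertices 1, 2, and 3, which is how a repeat shows up in the graph.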
Assemblers
  Phrap ( Green 1994 )
  Celera assembler ( Myers et al. 2000 )
  EULER assembler ( Pevzner et al. 2001 ), http://nbcr.sdsc.edu/euler
  ARACHNE, Phusion, CAP, TIGR
Building an accurate assembler
  EULER + Phrap = EULER+
  EULER’s accuracy in analyzing repeats & Phrap’s ability to handle low-coverage regions, low-quality reads, and read ends
  Less memory than the original EULER
  FragmentGluer algorithm
FragmentGluer Algorithm
1. Construct the A-Bruijn graph of S
2. Eliminate whirls by splitting the composed vertices
3. Remove bulges
4. Erosion procedure by removing all leaves
5. Straighten zigzag paths
6. Thread each read
7. Define the consensus sequence
8. Output repeat families
9. Transform mate-pairs into mate-paths after step 6
10. Assemble the resulting contigs into scaffolds by the EULER Scaffolding algorithm
RESULTS AND DISCUSSION
Benchmarking
Results of a study of 518 human chromosome 20 clones.
Metric                   Phrap     ARACHNE   EULER+
Av. # contigs/clone      6.8       13.8      6.2
Av. coverage             99.30%    98.60%    98.80%
# misassembled contigs   37        17        7
# missing repeats        5         9         4
EULER+ produced the fewest misassembled contigs (7) and also had the fewest missing repeat copies (4), ahead of Phrap (5) and ARACHNE (9). Average coverage over the 518 clones was 99.3% for Phrap, 98.8% for EULER+, and 98.6% for ARACHNE. The average number of contigs per clone was lowest for EULER+ (6.2), followed by Phrap (6.8) and ARACHNE (13.8).
More research
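The comparison can be checked mechanically. The sketch below simply re-enters the benchmark figures from the table and picks the best assembler per metric; the dictionary keys and the helper `best` are hypothetical names chosen for illustration.

```python
# Benchmark figures copied from the table (518 human chromosome 20 clones).
results = {
    "Phrap":   {"contigs_per_clone": 6.8,  "coverage_pct": 99.3, "misassembled": 37, "missing_repeats": 5},
    "ARACHNE": {"contigs_per_clone": 13.8, "coverage_pct": 98.6, "misassembled": 17, "missing_repeats": 9},
    "EULER+":  {"contigs_per_clone": 6.2,  "coverage_pct": 98.8, "misassembled": 7,  "missing_repeats": 4},
}


def best(metric, higher_is_better=False):
    """Return the assembler with the best value for `metric`."""
    pick = max if higher_is_better else min
    return pick(results, key=lambda name: results[name][metric])


for metric in ("contigs_per_clone", "misassembled", "missing_repeats"):
    print(metric, "->", best(metric))
print("coverage_pct ->", best("coverage_pct", higher_is_better=True))
```

Coverage is the only metric here where a higher value is better, which is why it is handled separately.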
The consensus sequence analysis of FragmentGluer
Detecting de novo HERVs as the consensus sequence of FragmentGluer