RNA Assembly Using an Extending Method
Wei Xueliang
2010-04-07
Overview
• Why abandon deBruijn.
• Why abandon Extended deBruijn.
• Introduction to current method.
• Handle the old problem.
• The new problem.
• Todo.
Why abandon deBruijn.
• The de Bruijn graph's (dis)advantages:
– Very fast.
– The coverage distribution and the K value strongly affect the result.
• Key: coverage is not uniformly distributed in RNA assembly.
– There is no single best K value.
Why abandon deBruijn.
• The length of the red part is 27.
[Figure panels: deBruijn graph for K = 28, K = 29, and K = 30]
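The sensitivity to K can be illustrated with a minimal Python sketch. It is a toy example, not the figure from the slides: the two reads and the helper names are hypothetical, nodes are (K-1)-mers, and edges are K-mers. A shared stretch of length 3 merges the two reads into one branching node at K = 4 and disappears at K = 5.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: each k-mer is an edge between two (k-1)-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Nodes with more than one successor, i.e. places where the path is ambiguous."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)

# Two hypothetical reads that share the 3-base stretch "CGT".
reads = ["AACGTCC", "GGCGTTT"]

for k in (4, 5):
    print(k, branching_nodes(de_bruijn(reads, k)))
```

With uneven coverage, a larger K removes such branches in highly covered regions but breaks the graph where reads are sparse, which is why no single K suits every transcript.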
Why abandon deBruijn.
• Key: coverage is not uniformly distributed in RNA assembly.
– There is no single best K value.
• Can we run the program many times with different K values?
• This is not the job of a de novo assembler.
– Time.
– It should provide highly accurate contigs within a limited time.
– Scaffolding programs.
Why abandon Extended deBruijn.
• My Extended de Bruijn method:
– Use two or more K values at the same time.
Why abandon Extended deBruijn.
• Coverage changes faster than I expected, so many K values are needed.
• Converting between different K values is difficult.
• Memory problem for big K: when K > 32, each K-index needs > 50G (with a 10G data set).
• Throw the K away.
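The slides do not say why K = 32 is the threshold. One plausible reason, stated here as an assumption, is 2-bit base packing: 32 bases fill exactly one 64-bit word, so any larger K needs a second word per K-mer. A small Python sketch with hypothetical helper names:

```python
# Assumed 2-bit encoding; the slides do not specify the index layout.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_kmer(kmer):
    """Pack a K-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value

def words_needed(k, word_bits=64):
    """Number of machine words needed to hold one packed K-mer."""
    return (2 * k + word_bits - 1) // word_bits

print(pack_kmer("ACGT"))                  # 0b00011011
print(words_needed(32), words_needed(33))
```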
Introduction to the new method
• Based on Pramila's genome assembly method.
• Start from any tag and do a correction.
• If the correction succeeds, continue.
Introduction to the new method
• Find all tags that overlap by at least 24 bp. (Magic number.)
• Use these overlapping tags to extend the Base, then continue adding more tags.
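The extension step can be sketched in Python as follows. This is a simplified illustration, not the program itself: it appends only the tag with the longest overlap, whereas the slides use all overlapping tags. The function names and the `max_mismatch` parameter are assumptions; `min_overlap` defaults to the magic number 24.

```python
def overlap_length(base, tag, min_overlap=24, max_mismatch=1):
    """Longest suffix of base that matches a prefix of tag with few mismatches, else 0."""
    for n in range(min(len(base), len(tag)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(base[-n:], tag[:n]))
        if mismatches <= max_mismatch:
            return n
    return 0

def extend(base, tags, min_overlap=24, max_mismatch=1):
    """Greedily append the unmatched tail of the best-overlapping unused tag."""
    unused = list(tags)
    while True:
        scored = [(overlap_length(base, t, min_overlap, max_mismatch), t) for t in unused]
        # Keep only tags that overlap enough and still add new bases.
        scored = [(n, t) for n, t in scored if 0 < n < len(t)]
        if not scored:
            return base
        n, best = max(scored)
        base += best[n:]
        unused.remove(best)

# Toy run with a short overlap threshold so the strings stay readable.
print(extend("AACCGGTT", ["GGTTAC", "TTACGA"], min_overlap=4, max_mismatch=0))
```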
Introduction to the new method
• How to find the overlapping tags quickly and with mismatches?
• Index and union:
{Tag3}, {Tag2, Tag3}, {Tag3, Tag4}
Union => {Tag1, Tag2, Tag3, Tag4}
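One common way to realize "index and union" is seed lookup, sketched below in Python. It assumes the 24-base window is split into three exact-match pieces. By the pigeonhole principle, any tag within two mismatches of the window shares at least one piece exactly, so the union of the three lookups contains every such tag. The piece count, names, and data layout are assumptions, not details from the slides.

```python
from collections import defaultdict

WINDOW = 24   # overlap length from the slides
PIECES = 3    # assumption: three lookups, one per piece
SIZE = WINDOW // PIECES

def build_index(tags):
    """Map (piece number, piece sequence) to the ids of tags starting with that piece."""
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for p in range(PIECES):
            index[(p, tag[p * SIZE:(p + 1) * SIZE])].add(tag_id)
    return index

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_overlapping(index, tags, window, max_mismatch=PIECES - 1):
    """Union the per-piece lookups, then verify each candidate against the window."""
    candidates = set()
    for p in range(PIECES):
        candidates |= index.get((p, window[p * SIZE:(p + 1) * SIZE]), set())
    return sorted(t for t in candidates if hamming(tags[t][:WINDOW], window) <= max_mismatch)

window = "ACGTACGT" + "AAAACCCC" + "GGGGTTTT"
tags = [
    window + "AC",                                   # exact match
    "TCGTACGT" + "TAAACCCC" + "GGGGTTTT" + "GG",     # two mismatches
    "TCGTACGT" + "TAAACCCC" + "TGGGTTTT" + "CC",     # three mismatches
    "T" * 26,                                        # unrelated
]
print(find_overlapping(build_index(tags), tags, window))
```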
Introduction to the new method
• How to find the next overlapping tags quickly and with mismatches?
• V1 <= U3
• V2 <= (U1 << 1) + 0
• V3 <= (U2 << 1) + 0
Handle the old problem.
• What if the overlapping part is shorter than 24 bp?
Handle the old problem.
• Check the tags one by one in descending order of overlap length.
Handle the old problem.
           A                   G
Overlap    Count   %           Count   %
60         1       6.67%       1       4.76%
52         3       20.00%      1       4.76%
44         6       40.00%      2       9.52%
36         10      66.67%      10      47.62%
30         11      73.33%      16      76.19%
24         15      100.00%     21      100.00%
Handle the old problem.
           A                   G (High Exp)
Overlap    Count   %           Count   %
56         1       6.67%       5       2.50%
50         3       20.00%      10      5.00%
44         6       40.00%      20      10.00%
36         10      66.67%      120     60.00%
30         11      73.33%      150     75.00%
24         15      100.00%     200     100.00%
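The percentage columns in both tables are each count divided by the final count in its column (the total at overlap 24). The Python sketch below reproduces them. Reading the counts as cumulative numbers of supporting tags at or above each overlap length is an assumption, and the function name is hypothetical.

```python
def cumulative_percent(counts):
    """Express each cumulative count as a percentage of the final count."""
    total = counts[-1]
    return [round(100.0 * c / total, 2) for c in counts]

a_counts = [1, 3, 6, 10, 11, 15]            # column A in both tables
g_counts = [1, 1, 2, 10, 16, 21]            # column G, first table
g_high_counts = [5, 10, 20, 120, 150, 200]  # column G (High Exp), second table

for name, counts in (("A", a_counts), ("G", g_counts), ("G high", g_high_counts)):
    print(name, cumulative_percent(counts))
```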
Handle the old problem.
• Degree of approximation.
Handle the old problem.
• Fewer tips.
• No bubbles.
– Because we do the overlap with mismatches.
– We use whole tags.
The new problem.
• Speed.
• The tail of a tag often has more errors.
– Reverse Extending Problem.
Todo
• Handle the Reverse Extending Problem.
• Speed.
• Finish the comparison between the de Bruijn method (Velvet) and my method.
• Paired End Tag.
• Thank you very much for your attention.