Alain Denise, Bioinformatics, LRI Orsay, UMR CNRS 8623, Université Paris-Sud 11
Algorithms for comparing RNA secondary structures
The multiple roles of RNA
© Ebbe Sloth Andersen
Why RNA?
• Present in all cellular processes
• The only molecule that can act both as a genome and as a catalyst
• Origin of life (?): the RNA world
• Frequent target for antibiotics
© E.Westhof 2005
RNA structure: tRNA
Primary structure: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCCACAGAAUUCGCACCA
[Figure: secondary structure and tertiary structure of the same tRNA]
RNA structure levels
RNA structure ~ a graph of bounded degree containing a (known) Hamiltonian path.
Arc-annotated sequences
General (Tertiary structure)
Crossing (Secondary structure with pseudoknots)
Nested (Secondary structure without pseudoknots)
Plain (Primary structure)
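The four levels can be told apart by looking only at the arcs. The sketch below is a minimal illustration, assuming arcs are given as pairs of 0-based positions; the function name `classify_arcs` and this input format are illustrative choices, not something taken from the slides.

```python
from itertools import combinations

def classify_arcs(arcs):
    """Return 'plain', 'nested', 'crossing' or 'general' for a list of arcs.

    Assumption: each arc is a pair (i, j) of sequence positions.
    """
    arcs = [tuple(sorted(a)) for a in arcs]
    if not arcs:
        return "plain"                      # no arc at all: primary structure
    endpoints = [p for arc in arcs for p in arc]
    if len(set(endpoints)) != len(endpoints):
        return "general"                    # some base carries two arcs
    for (i, j), (k, l) in combinations(arcs, 2):
        if i < k < j < l or k < i < l < j:
            return "crossing"               # two arcs interleave: pseudoknot
    return "nested"                         # arcs are pairwise nested or disjoint
```

For instance, the arcs (0,5) and (1,4) are nested, while (0,3) and (2,5) cross.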
RNA « Bio-Algorithmics »
• Structure prediction (given sequence)
• Design: sequence prediction (given structure)
• Structural pattern-matching
• Comparison of two or several structures
Why compare RNA structures?
• How similar (or different) are they?
→ classification, phylogeny
• Which parts are the most similar between the two structures?
• Is the small one similar to a part of the large one?
→ Comparison score + correspondence between the structures
Editing and alignment
We are given a set of basic operations and a score associated with each of them.
Data: two structures S1 and S2.
• Edit(S1,S2): find a best-scoring sequence of operations that changes S1 into S2.
• Align(S1,S2): find a structure S that contains S1 and S2 as substructures, so as to maximize Score(Edit(S1,S) + Edit(S2,S)).
Example: sequence comparison
Two sequences v = v1v2…vn and w = w1w2…wm
Editing operations: • ins(x,i) • del(x,i) • subs(x,y,i)
CHAT → del(C,1) → HAT → subs(H,R,1) → RAT
(For sequences: editing ~ alignment: CHAT aligned with -RAT)
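For sequences, the best-scoring edit script can be computed with the classical dynamic programming table. The sketch below assumes a hypothetical scoring scheme (0 for keeping a letter, -1 for a substitution, insertion or deletion); the names `seq_edit_score`, `unit_subs` and `INDEL` are illustrative.

```python
INDEL = -1  # hypothetical score of one insertion or deletion

def unit_subs(x, y):
    """Hypothetical substitution score: 0 for identical letters, -1 otherwise."""
    return 0 if x == y else -1

def seq_edit_score(v, w, subs=unit_subs, indel=INDEL):
    """Best score of an edit script turning sequence v into sequence w."""
    n, m = len(v), len(w)
    # S[i][j] = best score for turning v[:i] into w[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + indel          # delete v[i-1]
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + indel          # insert w[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + subs(v[i - 1], w[j - 1]),  # substitute / keep
                S[i - 1][j] + indel,                          # delete
                S[i][j - 1] + indel,                          # insert
            )
    return S[n][m]
```

With this scheme, `seq_edit_score("CHAT", "RAT")` is -2, matching the two operations of the example (one deletion, one substitution).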
Example: tree comparison
Editing vs alignment
Editing: Ins( ), Del( ), Subs( , )
Alignment
Ancestor relations are conserved
The nested case
Secondary structures (without pseudoknots) → tree comparison
Tree comparison
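The passage from a nested secondary structure to an ordered forest can be sketched as follows. This is one common encoding, assumed here for illustration and not necessarily the one used in the talk: the structure is given in dot-bracket notation, each base pair becomes an internal node labeled by its two bases, and each unpaired base becomes a leaf. The function name `dot_bracket_to_forest` is hypothetical.

```python
def dot_bracket_to_forest(seq, structure):
    """Convert a pseudoknot-free structure into an ordered forest.

    A tree is a pair (label, children) where children is a tuple of trees.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    levels = [[]]   # levels[-1] collects the children of the innermost open pair
    opens = []      # positions of the '(' not yet closed
    for i, ch in enumerate(structure):
        if ch == "(":
            opens.append(i)
            levels.append([])
        elif ch == ")":
            if not opens:
                raise ValueError("unbalanced structure")
            j = opens.pop()
            children = tuple(levels.pop())
            levels[-1].append((seq[j] + seq[i], children))  # base-pair node
        elif ch == ".":
            levels[-1].append((seq[i], ()))                 # unpaired base
        else:
            raise ValueError("unexpected character: " + ch)
    if opens:
        raise ValueError("unbalanced structure")
    return tuple(levels[0])
```

For example, the hairpin `GAAAC` with structure `(...)` becomes a single tree whose root `GC` has three leaves `A`.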
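Tree editing algorithm (Zhang, Shasha 1989)
Notation: α(f) is the tree with root α and child forest f; ∘ concatenates trees into a forest.
\[
\mathrm{Score}(\alpha(f), \alpha'(f')) = \max \begin{cases}
\mathrm{Subs}(\alpha, \alpha') + \mathrm{Score}(f, f') \\
\mathrm{Ins}(\alpha') + \mathrm{Score}(\alpha(f), f') \\
\mathrm{Del}(\alpha) + \mathrm{Score}(f, \alpha'(f'))
\end{cases}
\]
\[
\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ \alpha'(f') \circ t'_1 \circ \dots \circ t'_q) = \max \begin{cases}
\mathrm{Score}(\alpha(f), \alpha'(f')) + \mathrm{Score}(t_1 \circ \dots \circ t_p,\ t'_1 \circ \dots \circ t'_q) \\
\mathrm{Ins}(\alpha') + \mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p,\ f' \circ t'_1 \circ \dots \circ t'_q) \\
\mathrm{Del}(\alpha) + \mathrm{Score}(f \circ t_1 \circ \dots \circ t_p,\ \alpha'(f') \circ t'_1 \circ \dots \circ t'_q)
\end{cases}
\]
[Figure: the forest α(f) ∘ t1 ∘ t2 ∘ … ∘ tp]
Complexity: O(n^3 log n) [Klein 1998]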
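The editing recurrences above can be turned directly into a memoized recursion on forests. This is a naive sketch meant for small examples, not the optimized Zhang-Shasha or Klein algorithm. It assumes trees encoded as `(label, children)` tuples, a hypothetical scoring scheme (0 for a match, -1 for anything else), and it writes the first case as Subs + Score(f, f') + Score(rest, rest), which gives the same optimum.

```python
from functools import lru_cache

INS = DEL = -1  # hypothetical scores for inserting / deleting one node

def subs(a, b):
    """Hypothetical substitution score: 0 for equal labels, -1 otherwise."""
    return 0 if a == b else -1

@lru_cache(maxsize=None)
def forest_edit_score(F, G):
    """Best edit score between two ordered forests (tuples of trees)."""
    if not F and not G:
        return 0
    if not G:                        # only deletions remain
        (a, f), T = F[0], F[1:]
        return DEL + forest_edit_score(f + T, G)
    if not F:                        # only insertions remain
        (b, g), U = G[0], G[1:]
        return INS + forest_edit_score(F, g + U)
    (a, f), T = F[0], F[1:]          # leftmost tree of F: root a, children f
    (b, g), U = G[0], G[1:]          # leftmost tree of G: root b, children g
    return max(
        subs(a, b) + forest_edit_score(f, g) + forest_edit_score(T, U),
        INS + forest_edit_score(F, g + U),   # b inserted, its children move up
        DEL + forest_edit_score(f + T, G),   # a deleted, its children move up
    )

def leaf(x):
    return (x, ())

# Two small trees: a(e(b,c), d) and a(b, f(c,d))
T1 = ("a", (("e", (leaf("b"), leaf("c"))), leaf("d")))
T2 = ("a", (leaf("b"), ("f", (leaf("c"), leaf("d")))))
```

On this pair, `forest_edit_score((T1,), (T2,))` is -2: delete the node `e`, then insert the node `f`.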
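Tree alignment algorithm (Jiang, Wang, Zhang 1995)
\[
\mathrm{Score}(\alpha(f) \circ t_1 \circ \dots \circ t_p\ ;\ \alpha'(f') \circ t'_1 \circ \dots \circ t'_q) = \max \begin{cases}
\mathrm{Score}(\alpha(f)\ ;\ \alpha'(f')) + \mathrm{Score}(t_1 \circ \dots \circ t_p\ ;\ t'_1 \circ \dots \circ t'_q) \\
\mathrm{Ins}(\alpha') + \max_i \{ \mathrm{Score}(\alpha(f) \circ \dots \circ t_i\ ;\ f') + \mathrm{Score}(t_{i+1} \circ \dots \circ t_p\ ;\ t'_1 \circ \dots \circ t'_q) \} \\
\mathrm{Del}(\alpha) + \max_j \{ \mathrm{Score}(f\ ;\ \alpha'(f') \circ t'_1 \circ \dots \circ t'_j) + \mathrm{Score}(t_1 \circ \dots \circ t_p\ ;\ t'_{j+1} \circ \dots \circ t'_q) \}
\end{cases}
\]
[Figure: the forest α(f) ∘ t1 ∘ t2 ∘ … ∘ tp]
Complexity: O(n^4)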
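The alignment recurrence can be sketched in the same naive, memoized style. The assumptions are the same as for any small illustration: trees are `(label, children)` tuples, the scoring scheme is hypothetical (0 for a match, -1 otherwise), and the split index is allowed to start at 0, so that an inserted or deleted node may take no tree at all on the other side. The first case is written as Subs + Score(f; f') + Score(rest; rest).

```python
from functools import lru_cache

INS = DEL = -1  # hypothetical scores for a node aligned with a gap

def subs(a, b):
    """Hypothetical substitution score: 0 for equal labels, -1 otherwise."""
    return 0 if a == b else -1

@lru_cache(maxsize=None)
def align_score(F, G):
    """Best alignment score between two ordered forests (tuples of trees)."""
    if not F and not G:
        return 0
    candidates = []
    if F and G:
        (a, f), T = F[0], F[1:]
        (b, g), U = G[0], G[1:]
        # both leftmost roots are aligned together
        candidates.append(subs(a, b) + align_score(f, g) + align_score(T, U))
    if G:
        (b, g), U = G[0], G[1:]
        # b faces a gap; its children g are aligned with the first i trees of F
        for i in range(len(F) + 1):
            candidates.append(INS + align_score(F[:i], g) + align_score(F[i:], U))
    if F:
        (a, f), T = F[0], F[1:]
        # a faces a gap; its children f are aligned with the first j trees of G
        for j in range(len(G) + 1):
            candidates.append(DEL + align_score(f, G[:j]) + align_score(T, G[j:]))
    return max(candidates)

def leaf(x):
    return (x, ())

# Two small trees: a(e(b,c), d) and a(b, f(c,d))
T1 = ("a", (("e", (leaf("b"), leaf("c"))), leaf("d")))
T2 = ("a", (leaf("b"), ("f", (leaf("c"), leaf("d")))))
```

On this pair, `align_score((T1,), (T2,))` is -4, whereas the best edit script (delete `e`, insert `f`) uses only two operations: no common supertree can keep both `e` and `f` as ancestors of `c` without duplicating a node.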
Editing vs alignment
Editing:
\[
\mathrm{Score}([\alpha(f), t_1, \dots, t_p],\ [\alpha'(f'), t'_1, \dots, t'_q]) = \max \begin{cases}
\dots \\
\mathrm{Ins}(\alpha') + \mathrm{Score}([\alpha(f), t_1, \dots, t_p],\ [f', t'_1, \dots, t'_q]) \\
\dots
\end{cases}
\]
Alignment:
\[
\mathrm{Score}([\alpha(f), t_1, \dots, t_p],\ [\alpha'(f'), t'_1, \dots, t'_q]) = \max \begin{cases}
\dots \\
\mathrm{Ins}(\alpha') + \max_i \{ \mathrm{Score}([\alpha(f), \dots, t_i],\ f') + \mathrm{Score}([t_{i+1}, \dots, t_p],\ [t'_1, \dots, t'_q]) \} \\
\dots
\end{cases}
\]
[Figure: the same two recurrences drawn on trees, with the split between positions i and i+1 marked]
Can be inserted anywhere
Complexity
Editing [Zhang, Shasha 1989, Klein 1998]
• Worst case: O(n^4) [Zhang-Shasha 1989], O(n^3 log n) [Klein 1998, Dulucq-Touzet 2003]
• On average: O(n^3) [Dulucq-Tichit 2003]
Alignment [Jiang, Wang, Zhang 1995]
• Worst case: O(n^4)
Editing operations: problem
[Figure: helix A-U, U-A, G-C, C-U (sequence AUGG…UCAU) versus helix A-U, U U, G-C, C-U (sequence AUGG…UCUU)]
With tree operations, this change costs 3 operations: Delete( ), Insert( ), Insert( ).
Editing operations on RNA
Operations on bases:
• Substitution: A → C
• Deletion / Insertion: A → -
Operations on arcs:
• Arc-substitution: C-G → U-A
• Arc-deletion / Arc-insertion: C-G → (nothing)
• Arc-breaking and its inverse (new): C-G → C G
• Arc-altering and its inverse (new): C-G → C -
A first solution
[Figure: the helices A-U, U-A, G-C, C-U and A-U, U A, G-C, C-U (both with sequence AUGG…UCAU) encoded as trees whose leaves are the bases A, U, G, C, U, A, C, U]
But this implies some constraints on the scores. For example:
Arc-deletion = Arc-breaking + 2 × Base-deletion
Höchsmann, Töller, Giegerich, Kurtz 2003 (RNAforester)
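The constraint can be read off the tree encoding. The sketch below assumes that a base pair is encoded as an internal node `P` whose children are the two paired bases, so that deleting the `P` node alone breaks the arc, while deleting the arc with its bases means deleting three nodes. The score values and names are hypothetical.

```python
ARC_BREAKING = -2    # hypothetical score for deleting a pairing node "P"
BASE_DELETION = -1   # hypothetical score for deleting a base (leaf)

def node_deletion_score(label):
    """Score of deleting one node, depending on whether it is a pairing node."""
    return ARC_BREAKING if label == "P" else BASE_DELETION

def delete_subtree_score(tree):
    """Score of deleting a node together with all its descendants."""
    label, children = tree
    return node_deletion_score(label) + sum(delete_subtree_score(c) for c in children)

# A C-G base pair in the assumed encoding: P(C, G)
pair = ("P", (("C", ()), ("G", ())))
```

Here `delete_subtree_score(pair)` necessarily equals `ARC_BREAKING + 2 * BASE_DELETION`: with this encoding, arc-deletion cannot be scored independently.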
Complexity of the editing problem
Complexity of Edit(row, column); the columns are General, Crossing, Nested, Plain:
• General: NP-complete
• Crossing: NP-complete
• Nested: NP-complete (Nested column), O(nm^3) (Plain column)
• Plain: O(nm / log n)
References:
• Jiang, Lin, Ma, Zhang 2002
• Blin, Fertin, Rusu, Sinoquet 2003
• Crochemore, Landau, Ziv-Ukelson 2002
If 2 Score(Arc-altering) = Score(Arc-breaking) + Score(Arc-removing), then there is an algorithm in O(n^3 m) for Edit(crossing, nested) and Edit(nested, nested).
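Whether a given scoring scheme falls into this special case is a one-line check. The sketch below assumes the three arc scores are stored in a small record; the class name `ArcScores`, its field names and the tolerance are illustrative choices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArcScores:
    """Hypothetical container for the three arc-operation scores."""
    arc_altering: float
    arc_breaking: float
    arc_removing: float

    def satisfies_condition(self, tol=1e-9):
        """True if 2 * arc_altering = arc_breaking + arc_removing (up to tol)."""
        return abs(2 * self.arc_altering
                   - (self.arc_breaking + self.arc_removing)) <= tol
```

For example, the scores (-1.5, -1, -2) satisfy the condition, while (-1, -1, -2) do not.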
Complexity of secondary structure comparison
            Tree operations                                 RNA operations
Editing     O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]    NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   O(n^4) [Jiang, Wang, Zhang 1995]                ?
Secondary structure alignment
Sequences: ABCDEFG and ABBDFFG
Editing:
A-BCD-EFG
ABB-DF-FG
Alignment:
AB---CDEFG
ABBDF---FG
New editing operations on trees
• Arc-breaking and its inverse: C-G → C G
• Arc-altering and its inverse: C-G → C -
Alignment algorithm (1/5 to 5/5)
[Figure sequence: the five steps of the alignment algorithm, drawn on a forest f and a tree t]
Complexity of secondary structure comparison
            Tree operations                                 RNA operations
Editing     O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]    NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   O(n^4) [Jiang, Wang, Zhang 1995]                O(n^4) [Herrbach, Denise, Dulucq, Touzet 2005]
Complexity of the alignment problem for the other structure levels: [Blin, Touzet 2006]
Example: two tRNAs
[Figure: aligned tRNAs of Homo sapiens and Bacillus subtilis. Drawing: Tulip (David Auber et al., LaBRI)]
Legend: Base-subs / Arc-subs; Deletions / Insertions; Arc-breaking; Arc-altering
And in real life?
Alignment of RNases P
[Figures: alignment of RNases P]
To do…
• Biological validation:
  • Test on real data
  • Comparison with other software (RNAForester, MiGal [J. Allali, M.F. Sagot])
  • Combined approaches ([J. Allali, A. Ouangraoua, P. Ferraro])
  • Parameters: substitution matrices, etc.
  • Statistical evaluation of results
→ Relevant algorithms and parameters
→ Useful and user-friendly programs
• Sequence/structure alignment
• Multiple alignment
• …
Credits
• Julien Allali
• David Auber
• Serge Dulucq
• Claire Herrbach
• Rym Kachouri
• Yann Ponty
• Michel Termier
• Laurent Tichit
• Hélène Touzet
• Eric Westhof