Algorithms for Two Versions of LCS Problem
for Indeterminate Strings
Goal of this paper…
• Study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings
• Present efficient algorithms to solve them
5-9 Nov 2007 IWOCA 2007 2
Longest Common Subsequence
• Given two sequences:
  - X = CAAGCTAAGCTAC
  - Y = TCAAGTAGAAC
• Common subsequence: a subsequence common to both X and Y.
• LCS: a common subsequence of maximum length.
LCS-Example
X = C A A G C T A A G C T   (positions 1-11)
Y = C C G T A T             (positions 1-6)
A common subsequence: CCT (length = 3)
LCS-Example
X = C A A G C T A A G C G T   (positions 1-12)
Y = C C G T A T               (positions 1-6)
A longest common subsequence: CCTAT (length = 5)
Another LCS: CGTAT (length = 5)
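The example above can be checked with the classic quadratic dynamic program. The following is a minimal sketch in Python; the function name `lcs` and the traceback order are illustrative choices, not taken from the slides, and the sketch returns just one of possibly several longest common subsequences.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y (O(|x|*|y|) DP)."""
    n, m = len(x), len(y)
    # t[i][j] = LCS length of the prefixes x[:i] and y[:j]
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    # Walk back from t[n][m] to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif t[i - 1][j] >= t[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


result = lcs("CAAGCTAAGCGT", "CCGTAT")
print(len(result))  # length of an LCS for the slide's X and Y
```

For the X and Y of the example, the returned string has length 5, in agreement with the slide.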
CLCS: A Relatively New Variant
• Given X, Y and a constraint string Z, find a longest common subsequence of X and Y that contains Z as a subsequence.
X = T C C A C A   (positions 1-6)
Y = A C C A A G   (positions 1-6)
Z = A C
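To make the constraint concrete, here is a small Python sketch of a cubic dynamic program for CLCS, taking the usual definition that the answer must be a common subsequence of X and Y containing Z as a subsequence. The name `clcs_length` and the table layout are assumptions made for illustration; this is not claimed to be the algorithm the authors build on.

```python
def clcs_length(x, y, z):
    """Length of a longest common subsequence of x and y that contains z
    as a subsequence, or None if no such subsequence exists."""
    NEG = float("-inf")  # marks "constraint cannot be satisfied"
    n, m, r = len(x), len(y), len(z)
    # f[i][j][k]: best length using x[:i] and y[:j] while covering z[:k]
    f = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            f[i][j][0] = 0  # base value; overwritten below when i, j > 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(r + 1):
                # Option 1: drop the last letter of x or of y.
                best = max(f[i - 1][j][k], f[i][j - 1][k])
                if x[i - 1] == y[j - 1]:
                    # Option 2: use the match without advancing in z.
                    best = max(best, f[i - 1][j - 1][k] + 1)
                    # Option 3: use the match to cover z[k-1] as well.
                    if k > 0 and x[i - 1] == z[k - 1]:
                        best = max(best, f[i - 1][j - 1][k - 1] + 1)
                f[i][j][k] = best
    ans = f[n][m][r]
    return None if ans == NEG else int(ans)


print(clcs_length("TCCACA", "ACCAAG", ""))    # unconstrained: plain LCS length
print(clcs_length("TCCACA", "ACCAAG", "AC"))  # constrained by Z = AC
```

With an empty Z the function reduces to plain LCS, so the two calls show how much the constraint Z = AC costs on the slide's X and Y.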
Different Setting…
• We study LCS and CLCS for indeterminate strings (i-strings)
• We call the two problems ILCS and CILCS respectively
i-strings…
• Let Σ = {A, C, G, T}
• Then we can get 2^4 - 1 = 15 non-empty sets of letters.
• At each position of an i-string we have one of those sets.
i-strings
Σ = {A, C, G, T}
• One letter: {A}, {C}, {G}, {T}
• Two letters: {A,C}, {A,G}, {A,T}, {C,G}, {C,T}, {G,T}
• Three letters: {A,C,G}, {A,C,T}, {A,G,T}, {C,G,T}
• Four letters: {A,C,G,T}
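The count 2^4 - 1 = 15 can be confirmed by enumerating the non-empty subsets of Σ. A short Python sketch follows; the helper name `nonempty_subsets` is hypothetical.

```python
from itertools import combinations


def nonempty_subsets(alphabet):
    """List every non-empty subset of the alphabet, smallest sets first."""
    letters = sorted(alphabet)
    return [
        frozenset(combo)
        for size in range(1, len(letters) + 1)
        for combo in combinations(letters, size)
    ]


subsets = nonempty_subsets("ACGT")
print(len(subsets))  # number of possible letter sets per i-string position
```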
i-strings
[Figure: an example i-string X with positions 1-7; some positions hold more than one letter]
i-strings: Equality/Match
[Figure: i-string X with positions 1-7 and a shorter i-string Y]
• X[3] = Y[1]. Why? Because X[3] ∩ Y[1] = {A} ≠ Ø
• Y = X[1..3]
• Y = X[3..5]
• Y = X[4..6]
• Interestingly, X[1..3] ≠ X[3..5]!
i-strings: Equality/Match
[Figure: the same i-strings X and Y, now with the match written as =d]
• X[3] =d Y[1]. Why? Because X[3] ∩ Y[1] = {A} ≠ Ø
• Y =d X[1..3]
• Y =d X[3..5]
• Y =d X[4..6]
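The matching rule (two positions match when their letter sets intersect) and its lack of transitivity can be illustrated with a few lines of Python. The i-strings `u`, `v` and `w` below are made-up data, not the X and Y of the slide, and the function names are hypothetical.

```python
def pos_match(a, b):
    """Two i-string positions match when their letter sets share a letter."""
    return not a.isdisjoint(b)


def istr_match(u, v):
    """Two i-strings match when they have equal length and match position-wise."""
    return len(u) == len(v) and all(pos_match(a, b) for a, b in zip(u, v))


# Hypothetical one-position i-strings, chosen to expose non-transitivity.
u = [frozenset("A")]
v = [frozenset("AC")]
w = [frozenset("C")]

print(istr_match(u, v), istr_match(v, w), istr_match(u, w))
```

Here `v` matches both `u` and `w`, while `u` and `w` do not match each other, which is the same effect as Y matching both X[1..3] and X[3..5] although those two substrings do not match.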
ILCS
[Figure: example i-strings X and Y, positions numbered 1-7, over the letters A, B, C, D, F]
CILCS
[Figure: example i-strings X and Y, positions numbered 1-7, over the letters A, B, C, D, F, together with a constraint Z]
Z = B D D
Motivation…
• Motivations for LCS and CLCS are well known.
• But why indeterminate strings?
• Indeterminate strings are ubiquitous in biological motifs.
• And both LCS and CLCS get their motivation from bioinformatics.
Naive Algorithms
• Using the existing LCS and CLCS algorithms we can solve ILCS and CILCS easily.
Naive ILCS Algorithm
• We use the basic and well-known O(n^2) DP solution (Wagner & Fischer) to LCS, with the equality test replaced by the indeterminate match =d:
  T[i, j] = 0                            if i = 0 or j = 0
  T[i, j] = T[i-1, j-1] + 1              if X[i] =d Y[j]
  T[i, j] = max(T[i-1, j], T[i, j-1])    otherwise
  where T[i, j] is the length of the solution for X[1..i] and Y[1..j].
Naive ILCS Algorithm…
• We assume a sorted order among the letters in the sets of the i-strings.
• Then, intersection can be done in O(|Σ|) time.
• So the total running time is O(|Σ|n^2).
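The naive approach can be sketched as follows in Python, assuming each position of an i-string is stored as a sorted tuple of letters. The names `intersects` and `ilcs_length_naive`, and the sample data, are hypothetical. The merge-style walk in `intersects` is what gives the O(|Σ|) cost per comparison.

```python
def intersects(a, b):
    """Check whether two sorted tuples of letters share a letter.
    One pass over both tuples, so at most len(a) + len(b) steps."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


def ilcs_length_naive(x, y):
    """LCS-style DP where 'equal' means 'letter sets intersect'."""
    n, m = len(x), len(y)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if intersects(x[i - 1], y[j - 1]):  # O(|alphabet|) test per cell
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[n][m]


# Made-up i-strings; each position is a sorted tuple of letters.
x = [("A",), ("B", "D"), ("D",)]
y = [("A", "C"), ("D",), ("F",)]
print(ilcs_length_naive(x, y))
```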
Our Goal
• Our goal is to get a better running time than O(|Σ|n^2).
Our Strategy
• We want to facilitate an O(1) time evaluation for =d, i.e. indeterminate equality.
• To achieve that we do some preprocessing on the input i-strings.
• Then we employ existing LCS algorithms.
Preprocessing 1 for ILCS
• We compute the following table:
• With the above table, the indeterminate equality can be evaluated in O(1).
Computation of Table
[Figure: i-strings X and Y, positions 1-4, over Σ = {A, C, G, T}, each shown with a 0/1 table; row index at right]
X:
1 0 1 1   1
0 0 1 0   2
0 1 0 0   3
0 0 1 0   4
Y:
1 0 1 0   1
0 1 1 0   2
0 0 0 1   3
1 0 0 1   4
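One way to read 0/1 rows like these is as membership indicators: one bit per letter of Σ, set when that letter occurs at a position. That reading is an assumption, since the slide's figure did not survive extraction. Under it, each position can be packed into an integer bitmask, and two positions match exactly when the bitwise AND of their masks is non-zero. The helpers `encode` and `row` and the sample i-string are hypothetical.

```python
def encode(istring, alphabet):
    """Map each position (a set of letters) to an int whose bit k is set
    iff alphabet[k] occurs at that position."""
    bit = {c: 1 << k for k, c in enumerate(alphabet)}
    masks = []
    for letters in istring:
        mask = 0
        for c in letters:
            mask |= bit[c]
        masks.append(mask)
    return masks


def row(mask, alphabet):
    """Render a mask as a 0/1 row in alphabet order."""
    return [(mask >> k) & 1 for k in range(len(alphabet))]


# Made-up i-string over the alphabet ACGT.
masks = encode([{"A", "G", "T"}, {"G"}], "ACGT")
print([row(m, "ACGT") for m in masks])
print((masks[0] & masks[1]) != 0)  # do the two positions share a letter?
```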
Complete Algorithm
• With Table I, we can evaluate =d in O(1).
• So, the DP requires O(n^2)!
• But how much does it cost to compute Table I?
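As a rough illustration of the overall scheme (not necessarily the construction the authors use for Table I), the sketch below precomputes a boolean match table from bitmask-encoded positions and then runs the quadratic DP with O(1) lookups. It assumes each position is already given as an integer mask with one bit per letter, and that |Σ| fits in a machine word so that each AND is a constant-time operation. All names and the sample masks are hypothetical.

```python
def build_match_table(x_masks, y_masks):
    """table[i][j] is True iff position i of X and position j of Y share a letter."""
    return [[(a & b) != 0 for b in y_masks] for a in x_masks]


def ilcs_length(x_masks, y_masks):
    """Quadratic DP for ILCS using the precomputed match table."""
    table = build_match_table(x_masks, y_masks)  # preprocessing step
    n, m = len(x_masks), len(y_masks)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if table[i - 1][j - 1]:  # O(1) indeterminate-equality test
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[n][m]


# Made-up masks over a 4-letter alphabet (bit 0 = first letter, and so on).
x_masks = [0b0001, 0b1010, 0b1000]
y_masks = [0b0011, 0b1000, 0b0100]
print(ilcs_length(x_masks, y_masks))
```

In this sketch the table costs one AND per pair of positions, so the preprocessing is itself quadratic in n; whether it can be made cheaper is the question the slide raises.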
Thank You