Algorithms for Two Versions of LCS


Algorithms for Two Versions of LCS Problem for Indeterminate Strings. IWOCA 2007, 5-9 Nov 2007. Goal of this paper: study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings, and present efficient algorithms to solve them.

Page 1: Algorithms for Two Versions of LCS

Algorithms for Two Versions of LCS Problem for Indeterminate Strings

Page 2: Algorithms for Two Versions of LCS

Goal of this paper…

• Study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings
• Present efficient algorithms to solve them

5-9 Nov 2007 IWOCA 2007 2

Page 3: Algorithms for Two Versions of LCS

Longest Common Subsequence

• Given two sequences:
  - X = CAAGCTAAGCTAC
  - Y = TCAAGTAGAAC
• Common subsequence: a subsequence common to both X and Y.
• LCS: a common subsequence of maximum length.

Page 4: Algorithms for Two Versions of LCS

LCS-Example

     1 2 3 4 5 6 7 8 9 10 11
X =  C A A G C T A A G C T
Y =  C C G T A T
     1 2 3 4 5 6

A common subseq: CCT (Length = 3)

Page 6: Algorithms for Two Versions of LCS

LCS-Example

     1 2 3 4 5 6 7 8 9 10 11 12
X =  C A A G C T A A G C G T
Y =  C C G T A T
     1 2 3 4 5 6

A longest common subseq: CCTAT (Length = 5)
Another LCS: CGTAT (Length = 5)
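The example above can be checked with the textbook dynamic program for LCS. The following is a minimal sketch, not code from the talk; the function name `lcs` and the table name `t` are illustrative choices.

```python
def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y (classic O(|x||y|) DP)."""
    n, m = len(x), len(y)
    # t[i][j] = length of an LCS of the prefixes x[:i] and y[:j]
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif t[i - 1][j] >= t[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


X = "CAAGCTAAGCGT"
Y = "CCGTAT"
result = lcs(X, Y)
print(result, len(result))  # the length is 5; which LCS is returned depends on tie-breaking
```

Several different LCSs of the same length can exist, as the slide shows with CCTAT and CGTAT; the traceback returns just one of them.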

Page 7: Algorithms for Two Versions of LCS

CLCS: A Relatively New Variant

• Given X, Y and a constraint string Z, find a longest common subsequence of X and Y that contains Z as a subsequence.

     1 2 3 4 5 6
X =  T C C A C A
Y =  A C C A A G
Z =  A C

Page 8: Algorithms for Two Versions of LCS

Different Setting…

• We study LCS and CLCS for indeterminate strings (i-strings)
• We call the two problems ILCS and CILCS, respectively

Page 9: Algorithms for Two Versions of LCS

i-strings…

• Let Σ = {A, C, G, T}
• Then we can get 2^4 - 1 = 15 non-empty sets of letters.
• At each position of an i-string we have one of those sets.
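The count of 15 can be reproduced by enumerating the non-empty subsets of Σ directly. This is a small illustrative sketch; the names `SIGMA` and `subsets` are assumptions made for the example.

```python
from itertools import combinations

SIGMA = "ACGT"

# All non-empty subsets of SIGMA, listed by size: 1 letter, 2 letters, and so on.
subsets = [
    frozenset(letters)
    for size in range(1, len(SIGMA) + 1)
    for letters in combinations(SIGMA, size)
]

print(len(subsets))  # 2**4 - 1 = 15

# An i-string is then a sequence in which every position holds one such set.
example_istring = [frozenset("A"), frozenset("CT"), frozenset("ACG")]
```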

Page 10: Algorithms for Two Versions of LCS

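i-strings

The 15 non-empty subsets of Σ = {A, C, G, T}:

• 4 letters: {A, C, G, T}
• 3 letters: {A, C, G}, {A, C, T}, {A, G, T}, {C, G, T}
• 2 letters: {A, C}, {A, G}, {A, T}, {C, G}, {C, T}, {G, T}
• 1 letter: {A}, {C}, {G}, {T}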

Page 11: Algorithms for Two Versions of LCS

i-strings

[Figure: an example i-string X over positions 1-7, with a set of letters at each position; the individual sets are not recoverable from the extracted text.]

Page 12: Algorithms for Two Versions of LCS

i-strings: Equality/Match

[Figure: i-strings X (positions 1-7) and Y; the individual letter sets are not recoverable from the extracted text.]

• X[3] = Y[1]. Why? Because X[3] ∩ Y[1] = {A} ≠ Ø
• Y = X[1..3]
• Y = X[3..5]
• Y = X[4..6]
• Interestingly, X[1..3] ≠ X[3..5]!

Page 13: Algorithms for Two Versions of LCS

i-strings: Equality/Match

[Figure: the same i-strings X and Y as on the previous slide.]

• X[3] =d Y[1]. Why? Because X[3] ∩ Y[1] = {A} ≠ Ø
• Y =d X[1..3]
• Y =d X[3..5]
• Y =d X[4..6]
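The match rule on this slide (two positions match when their letter sets intersect) can be sketched as follows. The i-strings used here are hypothetical, since the slide's own X and Y did not survive extraction, and the helper names `istr`, `match` and `imatch` are illustrative.

```python
from typing import FrozenSet, List

IString = List[FrozenSet[str]]


def istr(*positions: str) -> IString:
    """Build an i-string; each argument lists the letters allowed at one position."""
    return [frozenset(p) for p in positions]


def match(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Single positions match (=d) when their letter sets share at least one letter."""
    return not a.isdisjoint(b)


def imatch(u: IString, v: IString) -> bool:
    """Two i-strings match when they have equal length and match position by position."""
    return len(u) == len(v) and all(match(a, b) for a, b in zip(u, v))


# Hypothetical example showing that the relation is not transitive.
X = istr("A", "C")
Y = istr("AC")

print(imatch(Y, X[0:1]))       # True: {A, C} and {A} share A
print(imatch(Y, X[1:2]))       # True: {A, C} and {C} share C
print(imatch(X[0:1], X[1:2]))  # False: {A} and {C} are disjoint
```

This is the same effect the earlier slide points out: Y can match two substrings of X that do not match each other.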

Page 14: Algorithms for Two Versions of LCS

ILCS

[Figure: example i-strings X and Y over positions 1-7, with letters drawn from A, B, C, D, F; the individual sets are not recoverable from the extracted text.]

Page 15: Algorithms for Two Versions of LCS

CILCS

[Figure: the same example i-strings X and Y over positions 1-7, together with the constraint Z = B D D; the individual sets are not recoverable from the extracted text.]

Page 17: Algorithms for Two Versions of LCS

Motivation…

• Motivations for LCS and CLCS are well known.
• But why indeterminate strings?
• Indeterminate strings are ubiquitous in biological motifs.
• And both LCS and CLCS get their motivation from bioinformatics.

Page 18: Algorithms for Two Versions of LCS

Naive Algorithms

• Using the existing LCS and CLCS algorithms, we can solve ILCS and CILCS easily.

Page 20: Algorithms for Two Versions of LCS

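Naive ILCS Algorithm

• We use the basic and well-known O(n^2) DP solution (Wagner & Fischer) to LCS, with the equality test replaced by =d.

[Recurrence shown as an image on the slide; only the =d test survives in the extracted text.]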

Page 21: Algorithms for Two Versions of LCS

Naive ILCS Algorithm…

• We assume a sorted order among the letters in the sets of the i-strings.
• Then, intersection can be done in O(|Σ|) time.
• So the total running time is O(|Σ|n^2).
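The naive approach can be sketched as the usual LCS table with the equality test swapped for an intersection test on sorted letter lists. This is an illustrative reading of the bullets above, not the authors' code; the i-strings X and Y below are a hypothetical example, and the names `intersects` and `ilcs_length` are assumptions.

```python
from typing import Sequence


def intersects(a: Sequence[str], b: Sequence[str]) -> bool:
    """Merge-style scan over two sorted letter sequences: O(|a| + |b|) = O(|Sigma|)."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


def ilcs_length(x: Sequence[Sequence[str]], y: Sequence[Sequence[str]]) -> int:
    """LCS table where positions match when their sets intersect: O(|Sigma| * n * m)."""
    n, m = len(x), len(y)
    # t[i][j] = ILCS length of the prefixes x[:i] and y[:j]
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if intersects(x[i - 1], y[j - 1]):
                t[i][j] = t[i - 1][j - 1] + 1
            else:
                t[i][j] = max(t[i - 1][j], t[i][j - 1])
    return t[n][m]


# Hypothetical i-strings; each position is a sorted string of allowed letters.
X = ["A", "G", "ACT", "A"]
Y = ["AT", "C", "AC", "GT"]
print(ilcs_length(X, Y))  # 3
```

When every set holds a single letter, the routine reduces to the ordinary LCS length.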

Page 22: Algorithms for Two Versions of LCS

Our Goal

• Our goal is to get a better running time than O(|Σ|n^2).

Page 23: Algorithms for Two Versions of LCS

Our Strategy

• We want to facilitate an O(1) time evaluation for =d, i.e. indeterminate equality
• To achieve that, we do some preprocessing on the input i-strings
• Then we employ existing LCS algorithms

Page 24: Algorithms for Two Versions of LCS

Preprocessing 1 for ILCS

• We compute the following table:

[Table definition shown as an image on the slide; not recoverable from the extracted text.]

• With the above table, the indeterminate equality can be evaluated in O(1).

Page 25: Algorithms for Two Versions of LCS

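Computation of Table

Letters of Σ are numbered A = 1, C = 2, G = 3, T = 4.

[Figure: example i-strings X and Y over positions 1-4; the drawing is not recoverable from the extracted text. The two 0/1 matrices from the slide follow, each row ending with its label 1-4.]

1 0 1 1   1
0 0 1 0   2
0 1 0 0   3
0 0 1 0   4

1 0 1 0   1
0 1 1 0   2
0 0 0 1   3
1 0 0 1   4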
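One way to turn the two 0/1 matrices on this slide into a constant-time lookup is sketched below. Several things here are assumptions rather than facts from the slides: that each matrix row is a letter (A = 1, C = 2, G = 3, T = 4) and each column a position, that the first matrix belongs to X and the second to Y, and that the table in question stores whether X[i] ∩ Y[j] ≠ Ø. The bitmask construction is a generic technique and not necessarily the one used in the paper.

```python
from typing import List

SIGMA = "ACGT"  # row k of a matrix is assumed to stand for SIGMA[k]

# The two matrices from the slide (assumed: rows = letters, columns = positions).
MX = [[1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
MY = [[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1], [1, 0, 0, 1]]


def column_masks(m: List[List[int]]) -> List[int]:
    """Pack each column (one position) into an integer bitmask over SIGMA."""
    return [sum(m[k][p] << k for k in range(len(m))) for p in range(len(m[0]))]


def decode(mask: int) -> str:
    """Letters whose bit is set in mask, in SIGMA order."""
    return "".join(SIGMA[k] for k in range(len(SIGMA)) if (mask >> k) & 1)


def build_table(mx: List[List[int]], my: List[List[int]]) -> List[List[bool]]:
    """table[i][j] is True iff position i of X and position j of Y share a letter."""
    xs, ys = column_masks(mx), column_masks(my)
    return [[(a & b) != 0 for b in ys] for a in xs]


table = build_table(MX, MY)
print([decode(v) for v in column_masks(MX)])  # ['A', 'G', 'ACT', 'A']
print([decode(v) for v in column_masks(MY)])  # ['AT', 'C', 'AC', 'GT']
print(table[2][3])  # True: {A, C, T} and {G, T} share T
```

Once `table` is built, each =d test inside the DP is a single array lookup. Building it this way costs one bitwise AND per pair of positions, which is constant time only while |Σ| fits in a machine word.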

Page 26: Algorithms for Two Versions of LCS

Computation of Table

1 0 1 1        1 0 1 0
0 0 1 0        0 1 1 0
0 1 0 0        0 0 0 1
0 0 1 0        1 0 0 1

Page 28: Algorithms for Two Versions of LCS

Complete Algorithm

• With Table I, we can evaluate =d in O(1).
• So, the DP requires O(n^2)!
• But how much time does it take to compute Table I?

Page 31: Algorithms for Two Versions of LCS

Thank You