28
1 Longest Common Subsequence (LCS) Chapter 15 p 390 11/15/13 CS380 Algorithm Design and Analysis

Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

  • Upload
    others

  • View
    19

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

1

Longest Common Subsequence (LCS)

Chapter 15

p 390

11/15/13 CS380 Algorithm Design and Analysis

Page 2: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

2

Longest Common Subsequence

• Problem: Let x1x2...xm and y1y2...yn be two sequences over some alphabet. o We assume they are strings of characters

• Find a longest common subsequence (LCS) of x1x2...xm and y1y2...yn

11/15/13

CS380 Algorithm Design and Analysis

Many other string operations have the same basic structure.

Page 3: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

3

Example

• x1x2x3x4x5x6x7x8= b a c b f f c b

• y1y2y3y4y5y6y7y8y9 = d a b e a b f b c

• Longest Common Subsequence is:

11/15/13

CS380 Algorithm Design and Analysis

A subsequence is a set of characters that appear in left- to-right order, but not necessarily consecutively.

Page 4: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

4

Dynamic Programming

• LCS can be solved using dynamic programming

11/15/13

CS380 Algorithm Design and Analysis

1. Characterize the structure of an optimal solution

2. Recursively define the value of an optimal solution

3. Compute the value of an optimal solution bottom-up

4. Construct an optimal solution from the computed information

Page 5: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

5

Step 1 Characterizing

• Optimal substructure: If z = z1z2...zp is a LCS of x1x2...xm and y1y2...yn, then At least one of these most hold

o xm = yn, and z1z2...zp–1 is an LCS of x1x2...xm–1 and y1y2...yn–1,

o xm != yn, and z1z2...zp is an LCS of x1x2...xm–1 and y1y2...yn,

o xm != yn, and z1z2...zp is an LCS of x1x2...xm and y1y2...yn–1.

11/15/13

CS380 Algorithm Design and Analysis

Page 6: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

6

Step 2: Recursive Solution

11/15/13

CS380 Algorithm Design and Analysis

Page 7: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

p 394

Step 3&4

Page 8: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

8

Example

\ j i \

0 1d

2a

3b

4e

5a

6b

7f

8b

9c

0 0 0 0 0 0 0 0 0 0 0

1 b 0

2 a 0

3 c 0

4 b 0

5 f 0

6 f 0

7 c 0

8 b 011/15/13 CS380 Algorithm Design and Analysis

b,c matrices combined

Page 9: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

p395

Page 10: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

10

String Similarity

• Edit Distanceo Levenshtein

o minimize changes

• Sequence Alignmento Needleman-Wunsch

o maximize similarity by giving weights to types of differences

http://xlinux.nist.gov/dads/HTML/Levenshtein.html

Page 11: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

11

Edit Distance

• How many insertions, deletions, replacements will transform one string into another?o Damerau-Levenshtein includes transpositions

as a special case the → teh

http://en.wikipedia.org/wiki/Levenshtein_distance

Page 12: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

12

Levenshtein

Page 13: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

13

Edit Distance Matrix

ATCGTT vs AGTTAC

i

j

Backtrackingdeletion UPinsertion LEFTmatch/mismatch DIAG

A G T T A C

0 1 2 3 4 5 6

A 1

T 2

C 3

G 4

T 5

T 6

Page 14: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

14

Sequence Alignment

• Similarity based on gaps and mismatches.

• Alignmento matched pairs from both strings

o no crossings

• Generalized form of Levenshteino additional parameters:

gap penalty, δ mismatch cost ( αx,y ; αx,x = 0 )

Kleinberg, Tardos, Algorithm Design, Pearson Addison Wesley, 2006, p 278

http://www.aw-bc.com/info/kleinberg/

Page 15: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

15

Recurrence

• Two strings x1...xm and y1...yn

• In an optimal alignment, M, at least one of the following is true:

o (xm, yn) is in M

o xm is not matched

o yn is not matched

Page 16: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

16

Recurrence

• So, for i and j > 0

• opt(i,j)= min[αxi,yj + opt(i-1,j-1), δ + opt(i-1,j), // xi is not matched

δ + opt(i,j-1) ] // yj is not matched

• (xi,yj) is in an optimal alignment M for this subproblem iff the minimum achieved is achieved by the first of these three values.

Kleinberg, Tardos, Algorithm Design, Pearson Addison Wesley, 2006, p 282

Page 17: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

17

Sequence Alignment Graph

Kleinberg, Tardos, p 283

Page 18: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

18

Recover Alignment

Kleinberg, Tardos, p 284

Page 19: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

19

Sequence Alignment (space efficient)

• Hirschberg - 1975o value: need the current and previous column

Kleinberg, Tardos, p 285

Page 20: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

20

Actual Alignment

• How do we recover the actual alignment?o We need the entire matrix?

Page 21: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

21

Algorithm

Kleinberg, Tardos, p 288

Page 22: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

22

Kleinberg, Tardos, p 289

Page 23: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

23

String Matching / Searching

• Naive

• Horspool1

• Boyer-Moore1

• Rabin Karp2

• Knuth Morris Pratt2

1 Levitin, Introduction to The Design and Analysis of Algorithms, 3rd edition, Pearson Addison Wesley, p 259

2 CLRS, p 990 &1002

Not Dynamic Programmingbecause there are notsubproblems.

But precompute a tableto help you solve theproblem.

Page 24: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

Naiveint naiveSearch(string, pattern){ retVal = -1; mismatch = true: for( i=0;i < string.length - pattern.length &&

true == mismatch ; i++) { mismatch = false; for( j =0;j<pattern.length && !mismatch ;j++) { if( string[i] != pattern[j] ) { mismatch = true; } }

if ( !mismatch) { retVal = i; } }

return retVal;}

Page 25: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

25

Horspool

• match the pattern right to left

• on mismatch, shift the pattern smartlyo by 1+ character

• Preprocess string to determine shiftingo build a table for shifts for each valid character

Page 26: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

26

Example

x

String

p a c i f i c

i

String

p a c i f i c

d r i v e

String

g r o v e

p a c i f i c

String

p a c i f i c

pattern movement

character comparisons

need a better string!

Page 27: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

27

Shifting

• t(c) =

the pattern's length, m, if c is not among the first m-1 characters of the pattern

the distance from the rightmost c among the first m-1 characters of the pattern to its last character, otherwise

a b c f i ... p ... x y z

7

p a c i f i c

Page 28: Longest Common Subsequence (LCS)zeus.cs.pacificu.edu/chadd/cs380f13/Lectures/20Lecture.pdf3 Example •x1x2x3x4x5x6x7x8= b a c b f f c b •y1y2y3y4y5y6y7y8y9 = d a b e a b f b c •Longest

http://www.math.uaa.alaska.edu/~afkjm/cs351/handouts/strings.ppt

1 Levitin, Introduction to The Design and Analysis of Algorithms, 3rd edition, Pearson Addison Wesley, p 262