Reverse Factor Algorithm

Advisor: Prof. R. C. T. Lee
Speaker: L. C. Chen

Speeding up two string-matching algorithms, Algorithmica, Vol. 12, 1994, pp. 247-267

CROCHEMORE, M., CZUMAJ, A., GASIENIEC, L., JAROMINEK, S., LECROQ, T., PLANDOWSKI, W. and RYTTER, W.

Rule 1: The Suffix to Prefix Rule

• For a window to have any chance of matching the pattern in some way, there must be a suffix of the window that is equal to a prefix of the pattern.

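Basic Ideas

• Open a window W of size |P| in the text T.

• Find the longest suffix of W that is also a prefix of the pattern P.

Case 1: The longest such suffix is W itself. Match!

Case 2: The longest such suffix is shorter than W. We move W so that the prefix of P is aligned with that suffix, that is, by |P| minus the length of the suffix.

Case 3: If there is no such suffix, we move W by length |P|.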
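The three cases can be sketched in Python as follows. This is a brute-force illustration of the shift rule only, not the automaton-based method used later in the slides. The function names `longest_suffix_prefix` and `window_shift` are hypothetical, and the shift applied after a match (the longest proper suffix of W that is a prefix of P) is an assumption taken from the "Shift by: 7 (8 - 1)" step of the worked example.

```python
def longest_suffix_prefix(window, pattern):
    """Length of the longest suffix of `window` that is also a prefix of `pattern`."""
    for k in range(min(len(window), len(pattern)), 0, -1):
        if window[-k:] == pattern[:k]:
            return k
    return 0


def window_shift(window, pattern):
    """Return (is_match, shift) for a window whose size is len(pattern)."""
    m = len(pattern)
    k = longest_suffix_prefix(window, pattern)
    if k == m:
        # Case 1: the whole window equals P.
        # Assumption: after a match, shift by the longest *proper* suffix-prefix.
        proper = longest_suffix_prefix(window[1:], pattern)
        return True, m - proper
    # Case 2 (k > 0): align the prefix of P with the suffix of W.
    # Case 3 (k == 0): the shift is the full length |P|.
    return False, m - k
```

For the window GCATCGCA and the pattern GCAGAGAG, the longest suffix that is also a prefix is GCA, so `window_shift` returns a shift of 8 - 3 = 5.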

Preprocessing phase

• T=GCATCGCAGAGAGTATACAGTACG

• P=GCAGAGAG

• L(S): the set containing all prefixes of the pattern.

L(S) = {GCAGAGAG, GCAGAGA, GCAGAG, GCAGA, GCAG, GCA, GC, G}

We construct the suffix automaton of the reversal of P.

[Figure: Suffix Automaton, states 0 to 8]
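A minimal sketch of this preprocessing step, assuming the standard online suffix-automaton construction (the slides do not say which construction is used). The names `build_suffix_automaton` and `accepts` are hypothetical. The automaton is built for the reversal of P, so a string read from right to left ends in a terminal state exactly when it belongs to L(S), that is, when it is a prefix of P.

```python
def build_suffix_automaton(s):
    """Return (trans, terminal) for the suffix automaton of s."""
    trans = [{}]      # trans[q][c] -> target state
    link = [-1]       # suffix links
    length = [0]      # length of the longest string reaching each state
    last = 0
    for c in s:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        while p != -1 and c not in trans[p]:
            trans[p][c] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][c]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q by cloning it with a shorter maximal length.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(c) == q:
                    trans[p][c] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Terminal states are those on the suffix-link path from the last state.
    terminal = set()
    p = last
    while p != -1:
        terminal.add(p)
        p = link[p]
    return trans, terminal


def accepts(trans, terminal, word):
    """True if `word` is a suffix of the string the automaton was built from."""
    q = 0
    for c in word:
        if c not in trans[q]:
            return False
        q = trans[q][c]
    return q in terminal


P = "GCAGAGAG"
trans, terminal = build_suffix_automaton(P[::-1])
# Every prefix of P, read backwards, is accepted.
print(all(accepts(trans, terminal, P[:k][::-1]) for k in range(1, len(P) + 1)))
```

A factor of P that is not a prefix, such as AGAG, can be read through the automaton but does not end in a terminal state.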

Preprocessing: Construct a Suffix Tree

PR: the reversal string of P.

Example:
P: GCAGAGAG
PR: GAGAGACG

Suffixes of PR:
GAGAGACG
AGAGACG
GAGACG
AGACG
GACG
ACG
CG
G

[Figure: suffix tree of PR]
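A short sketch of why the tree is built over PR rather than P: each suffix of PR, read backwards, is a prefix of P. The helper names `suffixes` and `prefixes` are hypothetical and only serve this illustration.

```python
def suffixes(s):
    """All non-empty suffixes of s, longest first."""
    return [s[i:] for i in range(len(s))]


def prefixes(s):
    """All non-empty prefixes of s, shortest first."""
    return [s[:k] for k in range(1, len(s) + 1)]


P = "GCAGAGAG"
PR = P[::-1]                      # reversal of P
for suffix in suffixes(PR):
    # Reversing a suffix of PR gives a prefix of P.
    print(suffix, "->", suffix[::-1])
```

Because the window is scanned from right to left, the characters arrive in the order of PR, which is why a structure over the suffixes of PR can recognise the prefixes of P.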

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C A G A G A G

When there is a match, how do we move the window?

[Figure: suffix tree of PR]

T: G C A T C G C A G G C A G T A T A C A G T A C G
P: G C A G A G A G

Find the longest suffix of W that is also a prefix of the pattern.

[Figure: suffix tree of PR]

A Whole Example

• T=GCATCGCAGAGAGTATACAGTACG

• P=GCAGAGAG

First attempt:

T: G C A T C G C A G A G A G T A T A C A G T A C G
P: G C A G A G A G

The longest suffix of W = GCATCGCA that is also a prefix of P is GCA, of length 3.

Shift by: 5 (8 - 3)

Second attempt:

T: G C A T C G C A G A G A G T A T A C A G T A C G
P:           G C A G A G A G

Match! The longest proper suffix of W = GCAGAGAG that is also a prefix of P is G, of length 1.

Shift by: 7 (8 - 1)

Third attempt:

T: G C A T C G C A G A G A G T A T A C A G T A C G
P:                         G C A G A G A G

The longest suffix of W = GTATACAG that is also a prefix of P is G, of length 1.

Shift by: 7 (8 - 1)

After this shift the window would extend past the end of T, so the search stops.
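The attempts above can be reproduced with the sketch below. It is a simplified illustration, not the implementation from the paper: it uses an uncompressed suffix trie of the reversed pattern (quadratic size in |P|) in place of the suffix automaton, and the names `build_suffix_trie`, `reverse_factor_search` and `END` are hypothetical.

```python
END = None  # end-of-suffix marker; never collides with a text character


def build_suffix_trie(s):
    """Uncompressed suffix trie of s as nested dicts."""
    root = {}
    for i in range(len(s)):
        node = root
        for c in s[i:]:
            node = node.setdefault(c, {})
        node[END] = True
    return root


def reverse_factor_search(text, pattern):
    """Return (match_positions, shifts) using right-to-left window scans."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    trie = build_suffix_trie(pattern[::-1])
    matches, shifts = [], []
    pos = 0
    while pos + m <= n:
        node = trie
        scanned = 0   # characters of the window read so far, from the right
        best = 0      # longest proper suffix of the window that is a prefix of P
        while scanned < m:
            c = text[pos + m - 1 - scanned]
            if c not in node:
                break                     # scanned part is no longer a factor of P
            node = node[c]
            scanned += 1
            if END in node:               # scanned suffix is a prefix of P
                if scanned == m:
                    matches.append(pos)   # the whole window equals P
                else:
                    best = scanned
        shift = m - best
        shifts.append(shift)
        pos += shift
    return matches, shifts


print(reverse_factor_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
# ([5], [5, 7, 7])
```

Positions are 0-based, so the match reported at position 5 corresponds to the second attempt, and the shifts 5, 7, 7 correspond to the three attempts in the example.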

Conclusion

• Preprocessing phase is O(m) time, where m = |P|.

• Searching phase is O(mn) time in the worst case, where n = |T|.
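The worst case can be observed by counting how many text characters the search inspects. The sketch below is a brute-force illustration under stated assumptions: factor and prefix tests use Python sets in place of the suffix automaton, and the function name `count_inspections` is hypothetical. When T and P consist of a single repeated letter, every window matches, all m characters are read, and the window moves by only one position.

```python
def count_inspections(text, pattern):
    """Count text characters inspected by right-to-left window scans."""
    m, n = len(pattern), len(text)
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    prefixes = {pattern[:k] for k in range(1, m + 1)}
    inspected, pos = 0, 0
    while pos + m <= n:
        k, best = 0, 0
        while k < m:
            inspected += 1
            # Stop as soon as the scanned suffix is not a factor of P.
            if text[pos + m - k - 1: pos + m] not in factors:
                break
            k += 1
            # Remember the longest proper suffix that is a prefix of P.
            if k < m and text[pos + m - k: pos + m] in prefixes:
                best = k
        pos += m - best
    return inspected


print(count_inspections("a" * 10, "a" * 4))   # 28 = m * (n - m + 1)
print(count_inspections("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))   # 16
```

On the example text from the slides only 16 characters are inspected for n = 24, which shows how far a typical input can be from the O(mn) bound.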
