Approximate On-line Palindrome Recognition, and Applications Amihood Amir Benny Porat

Page 1: Approximate On-line Palindrome Recognition, and Applications

Approximate On-line Palindrome Recognition, and Applications

Amihood Amir, Benny Porat

Page 2: Approximate On-line Palindrome Recognition, and Applications

Moskva River

Page 3: Approximate On-line Palindrome Recognition, and Applications

Confluence of 4 Streams

Palindrome Recognition

Approximate Matching

Interchange Matching

Online Algorithms

CPM 2014

Page 4: Approximate On-line Palindrome Recognition, and Applications

Palindrome Recognition

- Voz'mi-ka slovo ropot, - govoril Cincinnatu ego shurin, ostriak, - i prochti obratno. A? Smeshno poluchaetsia?
Vladimir Nabokov, Invitation to a Beheading (1)

"Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [→ topor: the axe]

A palindrome is a string that is the same whether read from right to left or from left to right.

Examples:
доход [Russian: income]
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!

Page 5: Approximate On-line Palindrome Recognition, and Applications

Palindrome Example

Ibn Ezra: Medieval Jewish philosopher, poet, Biblical commentator, and mathematician.

Was asked: "אבי אל חי שמך למה מלך משיח לא יבא" [My Father, the Living God, why does the king messiah not arrive?]

His response: "דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד" [Know you from your Father that I will not be delayed. I will return to you when the time will come.]

Page 6: Approximate On-line Palindrome Recognition, and Applications

Palindromes in Computer Science

Great programming exercise in CS 101.

Example of a problem that can be solved by a RAM in linear time, but not by a 1-tape Turing machine, which requires quadratic time.

(Can be done in linear time by a 2-tape TM)

Page 7: Approximate On-line Palindrome Recognition, and Applications

Palindrome Concatenation

We may be interested in finding out whether a string is a concatenation of palindromes of length > 1.

Example: ABCCBABBCCBCAACB

Why would we be interested in such a funny problem? We'll soon see.

Exercise: Do this in linear time…

ABCCBABBCCBCAACB
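To make the problem statement concrete, here is a minimal sketch of a straightforward dynamic program that decides whether a string is a concatenation of palindromes of length > 1. It is deliberately simple and runs in quadratic time and space, so it is not the linear-time solution the exercise asks for; the function name is an assumption made for illustration.

```python
def is_palindrome_concatenation(s: str) -> bool:
    """Return True if s splits into palindromes that each have length > 1."""
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            pal[i][j] = s[i] == s[j] and (length <= 2 or pal[i + 1][j - 1])

    # ok[end] is True when the prefix s[:end] is a valid concatenation.
    ok = [False] * (n + 1)
    ok[0] = True
    for end in range(2, n + 1):
        # start <= end - 2 guarantees that the last piece has length >= 2.
        ok[end] = any(ok[start] and pal[start][end - 1] for start in range(end - 1))
    return ok[n]


print(is_palindrome_concatenation("ABCCBABBCCBCAACB"))
```

The example string from the slide splits as ABCCBA, BB, CC, BCAACB.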

Page 8: Approximate On-line Palindrome Recognition, and Applications

Stream 2 - Approximations

As in exact matching, there may be errors. Find the minimum number of errors that, if fixed, will give a string that is a concatenation of palindromes of length > 1.

Example: ABCCBCBBCCBCABCB

For Hamming distance: Amir & Porat [ISAAC 13]: algorithm with running time $O(n^2)$

ABCCBABBCCBCAACB
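The approximate version can be illustrated with a simple quadratic dynamic program over mirrored-pair mismatch counts. This is a sketch under the Hamming-distance model only, and it is not claimed to be the algorithm from the cited paper; the function name and the convention of returning `None` when no decomposition exists are assumptions.

```python
from typing import Optional


def min_errors_to_palindrome_concatenation(s: str) -> Optional[int]:
    """Minimum number of character substitutions that turn s into a
    concatenation of palindromes of length > 1 (None if impossible)."""
    n = len(s)
    # cost[i][j]: number of mirrored pairs inside s[i..j] that disagree.
    # Fixing one character per disagreeing pair makes s[i..j] a palindrome.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            inner = cost[i + 1][j - 1] if length > 2 else 0
            cost[i][j] = inner + (s[i] != s[j])

    inf = float("inf")
    best = [inf] * (n + 1)  # best[end]: cheapest way to cover s[:end]
    best[0] = 0
    for end in range(2, n + 1):
        for start in range(end - 1):  # last piece s[start:end] has length >= 2
            if best[start] != inf:
                best[end] = min(best[end], best[start] + cost[start][end - 1])
    return None if best[n] == inf else best[n]


print(min_errors_to_palindrome_concatenation("ABCA"))
```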

Page 9: Approximate On-line Palindrome Recognition, and Applications

Stream 3 - Reversals

Why is this funny problem interesting?

Sorting by reversals: In the evolutionary process a substring may “detach” and “reconnect” in reverse:

ABCABCDAABCBADCBAADCB

ABCABCDAABCBAD

Page 10: Approximate On-line Palindrome Recognition, and Applications

Sorting by Reversals

What is the minimum number of reversals that, when applied to string A, result in string B?

History:
Introduced: Bafna & Pevzner [95]
NP-hard: Caprara [97]
Approximations: Christie [98]
Berman, Hannenhalli, Karpinski [02]
Hartman [03]

Page 11: Approximate On-line Palindrome Recognition, and Applications

Sorting by Reversals – Polynomial time Relaxations

Signed reversals: Hannenhalli & Pevzner [99]

Kaplan, Shamir, Tarjan [00]

Tannier & Sagot [04] …

Disjointness: Swap Matching, Muthu [96]

Two constraints:

1. The length of the reversed substring is limited to 2.

2. All swaps are disjoint.

Page 12: Approximate On-line Palindrome Recognition, and Applications

Pattern Matching with Disjoint Reversals

• Reversal Distance (RD): The RD between s1 and s2 is the minimum number k such that there exists s2' where HAM(s2, s2') = k and s1 reversal-matches s2'.

S1: A B D E A B C D A
S2: E C B A B A A D A

RD(S1,S2) = 2

Page 13: Approximate On-line Palindrome Recognition, and Applications

Connection between Reversal Matching and Palindrome Matching

Interleave Strings:

S1: A B D E A B C D A
S2: E D B A B A A D C

A C D D C A B A A B E A D B B D A E
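The connection can be checked on the slide's two strings with a small sketch. It assumes both strings have equal length, and the helper names are hypothetical. The sketch interleaves the pairs from left to right, whereas the slide lists the same pairs starting from the last position; either order yields a concatenation of even-length palindromes exactly when the strings match under disjoint reversals.

```python
def interleave(s1: str, s2: str) -> str:
    """Interleave two equal-length strings: s1[0] s2[0] s1[1] s2[1] ..."""
    return "".join(a + b for a, b in zip(s1, s2))


def is_even_palindrome_concatenation(s: str) -> bool:
    """Simple quadratic check: s is a concatenation of even-length palindromes."""
    n = len(s)
    ok = [False] * (n + 1)
    ok[0] = True
    for end in range(2, n + 1, 2):
        for start in range(0, end, 2):
            piece = s[start:end]
            if ok[start] and piece == piece[::-1]:
                ok[end] = True
                break
    return ok[n]


def reversal_matches(s1: str, s2: str) -> bool:
    """Direct check: s2 splits into disjoint blocks whose reversals spell s1."""
    n = len(s1)
    ok = [False] * (n + 1)
    ok[0] = True
    for end in range(1, n + 1):
        ok[end] = any(
            ok[start] and s1[start:end] == s2[start:end][::-1]
            for start in range(end)
        )
    return ok[n]


s1, s2 = "ABDEABCDA", "EDBABAADC"
print(interleave(s1, s2))
print(is_even_palindrome_concatenation(interleave(s1, s2)), reversal_matches(s1, s2))
```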

Page 14: Approximate On-line Palindrome Recognition, and Applications

On-line Input

Suppose that we get the input a byte at a time:

For the palindrome problem: A C D D C A B A A B E A D B B D A E

Page 15: Approximate On-line Palindrome Recognition, and Applications

On-line Input

Suppose that we get the input a byte at a time:

For the reversal problem: AC DD CA BA AB EA DB BD AE

Page 16: Approximate On-line Palindrome Recognition, and Applications

Main Idea – Palindrome Fingerprint

$S = s_0, s_1, s_2, \dots, s_{m-1}$

The Rabin-Karp fingerprint:
$\Phi(S) = r^{1}s_0 + r^{2}s_1 + \dots + r^{m}s_{m-1} \bmod p$

The reversal fingerprint:
$\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \dots + r^{-m}s_{m-1} \bmod p$

If $r^{m+1}\Phi^R(S) = \Phi(S)$ then S is a palindrome w.h.p.

Page 17: Approximate On-line Palindrome Recognition, and Applications

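Palindrome Fingerprint

If $r^{m+1}\Phi^R(S) = \Phi(S)$ then S is a palindrome.

Example: S = A B C B A

$r^{6}\Phi^R(S) = r^{6}\left(\frac{1}{r}A + \frac{1}{r^{2}}B + \frac{1}{r^{3}}C + \frac{1}{r^{4}}B + \frac{1}{r^{5}}A\right) = r^{5}A + r^{4}B + r^{3}C + r^{2}B + rA = \Phi(S)$

$\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \dots + r^{-m}s_{m-1} \bmod p$
$\Phi(S) = r^{1}s_0 + r^{2}s_1 + \dots + r^{m}s_{m-1} \bmod p$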
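The two fingerprints and the palindrome test can be sketched as follows. The modulus (the Mersenne prime 2^61 - 1) and the fixed base are assumptions chosen for illustration; in practice r would be drawn at random so that the test errs only with small probability. Characters are mapped to integers with `ord`, which is also an assumption. The modular inverse uses Python's three-argument `pow` with a negative exponent (Python 3.8+).

```python
P = (1 << 61) - 1  # assumed prime modulus
R = 1_000_003      # assumed base; normally chosen at random in [2, P - 1]


def fingerprints(s: str, r: int = R, p: int = P):
    """Return (phi, phi_rev): sums of r^(i+1)*s_i and r^-(i+1)*s_i mod p."""
    r_inv = pow(r, -1, p)  # modular inverse of r
    phi = phi_rev = 0
    power = inv_power = 1  # r^i and r^-i for the current prefix length i
    for ch in s:
        power = power * r % p
        inv_power = inv_power * r_inv % p
        phi = (phi + power * ord(ch)) % p
        phi_rev = (phi_rev + inv_power * ord(ch)) % p
    return phi, phi_rev


def looks_like_palindrome(s: str, r: int = R, p: int = P) -> bool:
    """Test r^(m+1) * phi_rev == phi (mod p), where m = len(s)."""
    phi, phi_rev = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_rev % p == phi


print(looks_like_palindrome("ABCBA"), looks_like_palindrome("ABCBB"))
```

A true palindrome always passes the test; a non-palindrome passes only if r happens to be a root of the difference polynomial modulo p.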

Page 18: Approximate On-line Palindrome Recognition, and Applications

Simple Online Algorithm for Finding a Palindrome in a Text

$t_1, t_2, t_3, \dots, t_i, t_{i+1}, t_{i+2}, \dots, t_{i+m-1}, t_{i+m}, \dots, t_n$

$\Phi^R = r^{-1}t_i + r^{-2}t_{i+1} + \dots + r^{-m}t_{i+m-1} \bmod p$

$\Phi = r^{1}t_i + r^{2}t_{i+1} + \dots + r^{m}t_{i+m-1} \bmod p$

If $r^{m+1}\Phi^R = \Phi$ then there is a palindrome starting in the i-th position.

If not, then for the next position:

$\Phi = \Phi + r^{m+1}t_{i+m} \bmod p$

$\Phi^R = \Phi^R + r^{-(m+1)}t_{i+m} \bmod p$

Note: This algorithm finds online whether a prefix of the text is a palindrome. For finding online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
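A minimal sketch of the online procedure for even-length palindromes follows. It assumes a greedy strategy: whenever the current segment is an even-length palindrome, cut there and restart the fingerprints. This relies on the known property that removing the shortest even-palindrome prefix never destroys a valid decomposition. The base, modulus, and generator interface are assumptions for illustration.

```python
def online_even_palindrome_concatenation(stream, r=1_000_003, p=(1 << 61) - 1):
    """Yield, after each character, whether the text read so far is
    (with high probability) a concatenation of even-length palindromes."""
    r_inv = pow(r, -1, p)  # modular inverse of r (Python 3.8+)
    phi = phi_rev = 0
    power = inv_power = 1  # r^length and r^-length of the current segment
    length = 0
    for ch in stream:
        length += 1
        power = power * r % p
        inv_power = inv_power * r_inv % p
        # Constant-time update of both fingerprints for the new character.
        phi = (phi + power * ord(ch)) % p
        phi_rev = (phi_rev + inv_power * ord(ch)) % p
        # Palindrome test: r^(length+1) * phi_rev == phi (mod p).
        segment_is_palindrome = ((power * r) % p) * phi_rev % p == phi
        if length % 2 == 0 and segment_is_palindrome:
            # Greedy cut: start a fresh segment after this character.
            phi = phi_rev = 0
            power = inv_power = 1
            length = 0
        yield length == 0


print(list(online_even_palindrome_concatenation("ABBACC")))
```

Each character costs a constant number of modular operations, and only the two running fingerprints and two powers are stored.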

Page 19: Approximate On-line Palindrome Recognition, and Applications

Palindrome with mismatches

Start with 1 mismatch case.

Page 20: Approximate On-line Palindrome Recognition, and Applications

1-Mismatch

$S = s_0, s_1, s_2, \dots, s_{m-1}$

Choose $l$ prime numbers $q_1, \dots, q_l < m$ such that $\prod_{i=1}^{l} q_i > m$

Page 21: Approximate On-line Palindrome Recognition, and Applications

1-Mismatch

$S = s_0, s_1, s_2, \dots, s_{m-1}$

For each $q_i$ construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ is all elements of S whose index is j mod $q_i$.

Examples: $q_1 = 2$, $q_2 = 3$

mod 2:
$S_{2,0} = s_0, s_2, s_4, \dots$
$S_{2,1} = s_1, s_3, s_5, \dots$

mod 3:
$S_{3,0} = s_0, s_3, s_6, \dots$
$S_{3,1} = s_1, s_4, s_7, \dots$
$S_{3,2} = s_2, s_5, s_8, \dots$

Page 22: Approximate On-line Palindrome Recognition, and Applications

Example

$S = s_0, s_1, s_2, s_3, s_4, s_5$

mod 2:
$S_{2,0} = s_0, s_2, s_4$
$S_{2,1} = s_1, s_3, s_5$

mod 3:
$S_{3,0} = s_0, s_3$
$S_{3,1} = s_1, s_4$
$S_{3,2} = s_2, s_5$
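The partition into residue classes is easy to express with slicing. This is a small sketch; the function name is an assumption.

```python
def partition(s: str, q: int) -> list:
    """Return [S_{q,0}, ..., S_{q,q-1}], where S_{q,j} collects the
    characters of s whose index is congruent to j modulo q."""
    return [s[j::q] for j in range(q)]


print(partition("abcdef", 2))
print(partition("abcdef", 3))
```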

Page 23: Approximate On-line Palindrome Recognition, and Applications

1-Mismatch

• We need to compare:

$S = s_0, s_1, s_2, \dots, s_{m-2}, s_{m-1}$
$S^R = s_{m-1}, s_{m-2}, s_{m-3}, \dots, s_1, s_0$

• We prove that in the partition strings:

$S_{q,j} = S^R_{q,(m-1-j) \bmod q}$

Page 24: Approximate On-line Palindrome Recognition, and Applications

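Example

$S = s_0, s_1, s_2, s_3, s_4, s_5$
$S^R = s_5, s_4, s_3, s_2, s_1, s_0$

$S_{2,0} = s_0, s_2, s_4$
$S_{2,1} = s_1, s_3, s_5$
$S_{3,0} = s_0, s_3$
$S_{3,1} = s_1, s_4$
$S_{3,2} = s_2, s_5$

Comparisons:
$S_{2,0} = s_0, s_2, s_4$ with $S^R_{2,1} = s_5, s_3, s_1$
$S_{3,0} = s_0, s_3$ with $S^R_{3,2} = s_5, s_2$
$S_{3,1} = s_1, s_4$ with $S^R_{3,1} = s_4, s_1$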

Page 25: Approximate On-line Palindrome Recognition, and Applications

Exact Matching

Lemma: $S = S^R \Leftrightarrow S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all q and all $0 \le j < q$, where $S^R_{q,k}$ denotes the reversal of $S_{q,k}$.
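The lemma can be checked directly on small inputs. The sketch below compares each residue class with the reversal of its partner class, using plain string comparison instead of fingerprints; the function names are assumptions.

```python
def partition(s: str, q: int) -> list:
    """S_{q,j} for j = 0..q-1: characters at indices congruent to j mod q."""
    return [s[j::q] for j in range(q)]


def matches_reversal_pairs(s: str, q: int) -> bool:
    """True when every S_{q,j} equals the reversal of S_{q,(m-1-j) mod q}."""
    m = len(s)
    parts = partition(s, q)
    return all(parts[j] == parts[(m - 1 - j) % q][::-1] for j in range(q))


print(matches_reversal_pairs("ABCCBA", 2), matches_reversal_pairs("ABCCBA", 3))
print(matches_reversal_pairs("ABCCBB", 2), matches_reversal_pairs("ABCCBB", 3))
```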

Page 26: Approximate On-line Palindrome Recognition, and Applications

1-Mismatch

Lemma: There is exactly one mismatch $\Leftrightarrow$ there is exactly one subpattern in each group that does not match (by the C.R.T.).

Page 27: Approximate On-line Palindrome Recognition, and Applications

Chinese Remainder Theorem

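Let n and m be two relatively prime positive integers. Then:

$a \bmod n = b \bmod n$ and $a \bmod m = b \bmod m$ $\Rightarrow$ $a \bmod nm = b \bmod nm$

In our case: if two indices, i and j, have an error, and only one subsequence in each group is erroneous, then i and j agree modulo every q. Since the product of all q's > m, it means that i = j.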
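The role of the Chinese Remainder Theorem can be illustrated with a short sketch: when the product of the primes exceeds m, the tuple of residues identifies an index below m uniquely. The brute-force lookup and the function names are assumptions for illustration only.

```python
from math import prod


def residue_signature(i: int, primes: list) -> tuple:
    """Residue class of index i in each group, one entry per prime."""
    return tuple(i % q for q in primes)


def locate(signature: tuple, primes: list, m: int) -> list:
    """All indices below m with the given signature (at most one
    when the product of the primes is at least m)."""
    return [i for i in range(m) if residue_signature(i, primes) == signature]


primes = [2, 3, 5]
m = 30
assert prod(primes) >= m
print(locate((1, 2, 3), primes, m))
```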

Page 28: Approximate On-line Palindrome Recognition, and Applications

Complexity

There exists a constant c such that, for any x < m, there are at least $x/\log x$ prime numbers between x and cx.

Therefore, choose $\frac{\log m}{\log\log m}$ prime numbers between $\log m$ and $c \log m$.

Page 29: Approximate On-line Palindrome Recognition, and Applications

Complexity

• For each $q_i$ we compute $2q_i$ different fingerprints.

• Overall space: $\sum_{i=1}^{l} 2q_i = O\left(\frac{\log^2 m}{\log\log m}\right)$

• Each character participates in exactly two fingerprints (the regular and the reverse) for each $q_i$.

• Overall time per character: $2l = O\left(\frac{\log m}{\log\log m}\right)$
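The prime selection and the resulting fingerprint count can be sketched as follows. The sketch assumes m ≥ 2, uses base-2 logarithms, and simply takes consecutive primes starting at log m until their product exceeds m; it does not enforce the upper bound c log m. The trial-division primality test and the function names are assumptions.

```python
import math


def is_prime(x: int) -> bool:
    """Trial division; sufficient for the small primes used here."""
    if x < 2:
        return False
    return all(x % d for d in range(2, math.isqrt(x) + 1))


def choose_primes(m: int) -> list:
    """Consecutive primes starting at ceil(log2 m) whose product exceeds m."""
    q = max(2, math.ceil(math.log2(m)))
    primes, product = [], 1
    while product <= m:
        if is_prime(q):
            primes.append(q)
            product *= q
        q += 1
    return primes


primes = choose_primes(1000)
print(primes)                       # primes q_1..q_l
print(sum(2 * q for q in primes))   # number of fingerprints stored
print(2 * len(primes))              # fingerprints updated per character
```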

Page 30: Approximate On-line Palindrome Recognition, and Applications

Online

All fingerprint calculations can be done online.

At every input character we know the current m, so we can compute the comparisons.

Conclusion: Our algorithm is online.

Page 31: Approximate On-line Palindrome Recognition, and Applications

k-Mismatches

Use group testing…

Page 32: Approximate On-line Palindrome Recognition, and Applications

k-Mismatches

Group Testing

• Given n items with some positive ones, identify all positive ones by a small number of tests.
• Each test is on a subset of items.
• Test outcome is positive iff there is a positive item in the subset.

Page 33: Approximate On-line Palindrome Recognition, and Applications

k-Mismatch

• Group: partition of the text.

• Test: distinguish (using the 1-mismatch algorithm) between:
  – match
  – 1 mismatch
  – more than 1 mismatch

Page 34: Approximate On-line Palindrome Recognition, and Applications

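k-Mismatches

$S = s_0, s_1, s_2, \dots, s_{m-1}$

$q_1, q_2, q_3, \dots, q_l$ s.t. $\prod_{i=1}^{l} q_i > m^k$

mod 2:
$S_{2,0} = s_0, s_2, s_4, \dots$
$S_{2,1} = s_1, s_3, s_5, \dots$

mod 3:
$S_{3,0} = s_0, s_3, s_6, \dots$
$S_{3,1} = s_1, s_4, s_7, \dots$
$S_{3,2} = s_2, s_5, s_8, \dots$

Similar to the 1-mismatch algorithm, just with more prime numbers…

Each $S_{q,j}$ is a group in our group testing.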

Page 35: Approximate On-line Palindrome Recognition, and Applications

Our tests

• We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$.

• Each partition is "tested against" its reversal pair.

Page 36: Approximate On-line Palindrome Recognition, and Applications

Correctness

$S = s_0, s_1, s_2, \dots, s_j, \dots, s_{m-1}$

For any group of k characters $i_1, i_2, \dots, i_k$, there exists a partition where $s_j$ appears alone (by the C.R.T.).

Page 37: Approximate On-line Palindrome Recognition, and Applications

Correctness

$S = s_0, s_1, s_2, \dots, s_j, \dots, s_{m-1}$

If $s_j$ is involved in a mismatch, we will catch it.
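The isolation argument can be demonstrated with a brute-force sketch: given primes whose product exceeds $m^k$, every index j has some prime q for which none of k other indices falls into the same residue class as j. The function name and the small parameters below are assumptions for illustration.

```python
from itertools import combinations
from math import prod


def isolating_prime(j: int, others, primes):
    """Return a prime q such that no index in `others` is congruent
    to j modulo q, or None if no such prime is in the list."""
    for q in primes:
        if all((i - j) % q != 0 for i in others):
            return q
    return None


m, k = 20, 2
primes = [2, 3, 5, 7, 11]
assert prod(primes) > m ** k

# Exhaustively confirm that every index can be isolated from any k others.
for j in range(m):
    rest = [i for i in range(m) if i != j]
    for others in combinations(rest, k):
        assert isolating_prime(j, others, primes) is not None

print(isolating_prime(0, (6, 10), primes))
```

The argument behind the check: each difference |i - j| is below m, so the product of k such differences is below $m^k$ and cannot be divisible by all of the primes at once.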

Page 38: Approximate On-line Palindrome Recognition, and Applications

Complexity

• Overall space:

• Overall time:

m

mkOloglog

log2

2

mmkO

logloglog2

42

Page 39: Approximate On-line Palindrome Recognition, and Applications

Approximate Reversal Distance

Using the algorithm for palindromes with up to k mismatches, the problem can be solved in

time, and

space.

nkknO log2

2knO

Page 40: Approximate On-line Palindrome Recognition, and Applications

Thank you (спасибо)