Approximate On-line Palindrome Recognition,
and Applications
Amihood Amir, Benny Porat
Moskva River
Confluence of 4 Streams
Palindrome Recognition
Approximate Matching
Interchange Matching
Online Algorithms
CPM 2014
Palindrome Recognition
"Voz'mi-ka slovo ropot," govoril Cincinnatu ego shurin, ostriak, "i prochti obratno. A? Smeshno poluchaetsia?"
Vladimir Nabokov, Invitation to a Beheading
"Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [ropot read backwards is topor: the axe]
A palindrome is a string that reads the same from right to left as from left to right.
Examples:
доход [Russian: income]
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!
Palindrome Example
Ibn Ezra: medieval Jewish philosopher, poet, Biblical commentator, and mathematician.
He was asked: "אבי אל חי שמך למה מלך משיח לא יבא" [My Father, the Living God, why does the king messiah not arrive?]
His response: "דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד" [Know from your Father that I will not be delayed. I will return to you when the time comes.]
Palindromes in Computer Science
Great programming exercise in CS 101.
Example of a problem that can be solved by a RAM in linear time, but not by a 1-tape Turing machine.
(Can be done in linear time by a 2-tape TM)
Palindrome Concatenation
We may be interested in finding out whether a string is a concatenation of palindromes of length > 1.
Example: ABCCBABBCCBCAACB = ABCCBA · BB · CC · BCAACB
Why would we be interested in such a funny problem? We'll soon see.
Exercise: Do this in linear time…
Stream 2 - Approximations
As in exact matching, there may be errors. Find the minimum number of errors that, if fixed, will give a string that is a concatenation of palindromes of length > 1.
Example: ABCCBCBBCCBCABCB (two substitutions away from ABCCBABBCCBCAACB)
For Hamming distance: Amir-Porat [ISAAC 13] give an algorithm running in time $O(n^2)$.
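A minimal sketch of the Hamming-distance version as a quadratic dynamic program. This is an illustration under our own assumptions, not necessarily the algorithm of the cited ISAAC 13 paper and not the linear-time solution asked for in the exercise; the function name `min_palindrome_errors` is hypothetical.

```python
from math import inf

def min_palindrome_errors(s):
    """Minimum number of substitutions that turn s into a concatenation
    of palindromes of length > 1 (inf if no such concatenation exists)."""
    n = len(s)
    # cost[i][j]: mismatched mirror pairs inside s[i..j] (inclusive), i.e.
    # the substitutions needed to make that piece a palindrome.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            inner = cost[i + 1][j - 1] if length > 2 else 0
            cost[i][j] = inner + (s[i] != s[j])
    # best[e]: minimum errors for the prefix s[:e].
    best = [inf] * (n + 1)
    best[0] = 0
    for e in range(2, n + 1):
        for b in range(e - 1):  # last piece s[b:e] has length >= 2
            candidate = best[b] + cost[b][e - 1]
            if candidate < best[e]:
                best[e] = candidate
    return best[n]
```

Both tables are filled in $O(n^2)$ time. A result of 0 means the input already is a concatenation of palindromes of length > 1.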
Stream 3 - Reversals
Why is this funny problem interesting?
Sorting by reversals: in the evolutionary process a substring may "detach" and "reconnect" in reverse:
ABCABCDAABCBADCBAADCB
ABCABCDAABCBAD
Sorting by Reversals
What is the minimum number of reversals that, when applied to string A, result in string B?
History:
Introduced: Bafna & Pevzner [95]
NP-hard: Caprara [97]
Approximations: Christie [98]; Berman, Hannenhalli, Karpinski [02]; Hartman [03]
Sorting by Reversals – Polynomial time Relaxations
Signed reversals: Hannenhalli & Pevzner [99]
Kaplan, Shamir, Tarjan [00]
Tannier & Sagot [04] …
Disjointness: Swap Matching, Muthu [96]
Two constraints:
1. The length of the reversed substring is limited to 2.
2. All swaps are disjoint.
• Reversal Distance (RD):
– The RD between s1 and s2 is the minimum number k such that there exists a string s2' where HAM(s1, s2') = k and s2' reversal-matches s2.
S1: A B D E A B C D A
S2: E C B A B A A D A
RD(S1,S2) = 2
Pattern Matching with Disjoint Reversals
Interleave Strings:
S1: A B D E A B C D A
S2: E D B A B A A D C
Connection between Reversal Matching and Palindrome Matching
A C D D C A B A A B E A D B B D A E
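The connection can be sketched in code: interleaving S1 and S2 character by character gives a string that is a concatenation of even-length palindromes exactly when S1 can be turned into S2 by disjoint reversals. The helper names below are hypothetical, and the quadratic check is only an illustration, not the on-line algorithm of the talk.

```python
def interleave(s1, s2):
    """Return s1[0] s2[0] s1[1] s2[1] ... for equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return "".join(a + b for a, b in zip(s1, s2))

def is_even_palindrome_concatenation(t):
    """True if t splits into palindromes of even length (simple DP)."""
    n = len(t)
    ok = [False] * (n + 1)
    ok[0] = True
    for e in range(2, n + 1, 2):
        for b in range(0, e, 2):  # cut points are always at even offsets
            piece = t[b:e]
            if ok[b] and piece == piece[::-1]:
                ok[e] = True
                break
    return ok[n]

def reversal_match(s1, s2):
    """True if disjoint reversals applied to s1 yield s2."""
    return (len(s1) == len(s2)
            and is_even_palindrome_concatenation(interleave(s1, s2)))
```

A palindrome piece covering interleaved positions 2a..2b+1 corresponds to reversing S1[a..b]; a piece of length 2 is a position left unchanged.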
On-line Input
Suppose that we get the input a byte at a time.
For the palindrome problem: A C D D C A B A A B E A D B B D A E
On-line Input
Suppose that we get the input a byte at a time.
For the reversal problem, each arriving symbol is a pair of characters, one from each string:
AC DD CA BA AB EA DB BD AE
Main Idea: Palindrome Fingerprint
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
The Rabin-Karp fingerprint: $\Phi(S) = r^{1}s_0 + r^{2}s_1 + \cdots + r^{m}s_{m-1} \bmod p$
The reversal fingerprint: $\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \cdots + r^{-m}s_{m-1} \bmod p$
If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome w.h.p.
Palindrome Fingerprint
Example: S = A B C B A
$r^{6}\Phi^R(S) = r^{6}\left(\frac{1}{r}A + \frac{1}{r^{2}}B + \frac{1}{r^{3}}C + \frac{1}{r^{4}}B + \frac{1}{r^{5}}A\right) = r^{5}A + r^{4}B + r^{3}C + r^{2}B + rA = \Phi(S)$
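A minimal sketch of the two fingerprints and the palindrome test. The modulus `P`, the base `R`, and the use of `ord` as the character value are assumptions made for the illustration; in practice r would be drawn at random. The modular inverse via `pow(r, -1, p)` needs Python 3.8 or later.

```python
P = (1 << 61) - 1   # assumed prime modulus (a Mersenne prime)
R = 1_000_003       # assumed base; normally chosen at random in [2, P - 1]

def fingerprints(s, r=R, p=P):
    """Return (Phi(S), Phi^R(S)) with Phi = sum r^(i+1) s_i and
    Phi^R = sum r^-(i+1) s_i, both mod p."""
    r_inv = pow(r, -1, p)
    phi = phi_r = 0
    power = inv_power = 1
    for ch in s:
        power = power * r % p
        inv_power = inv_power * r_inv % p
        phi = (phi + power * ord(ch)) % p
        phi_r = (phi_r + inv_power * ord(ch)) % p
    return phi, phi_r

def looks_like_palindrome(s, r=R, p=P):
    """True if r^(m+1) * Phi^R(S) == Phi(S) mod p, where m = len(s)."""
    phi, phi_r = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_r % p == phi
```

A true palindrome always passes the test; a non-palindrome passes only if r happens to be a root of a nonzero polynomial of degree at most m, which is why the answer holds with high probability.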
Simple Online Algorithm for Finding a Palindrome in a Text
$t_1, t_2, t_3, \ldots, t_i, t_{i+1}, \ldots, t_{i+m-1}, t_{i+m}, \ldots, t_n$
$\Phi^R = r^{-1}t_i + r^{-2}t_{i+1} + \cdots + r^{-m}t_{i+m-1} \bmod p$
$\Phi = r^{1}t_i + r^{2}t_{i+1} + \cdots + r^{m}t_{i+m-1} \bmod p$
If $r^{m+1}\Phi^R = \Phi$, there is a palindrome of length m starting in the i-th position.
If not, then for the next position:
$\Phi = \Phi + r^{m+1}t_{i+m} \bmod p$
$\Phi^R = \Phi^R + r^{-(m+1)}t_{i+m} \bmod p$
Note: This algorithm finds online whether the prefix of a text is a palindrome. For finding online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
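The update rule can be sketched as a generator that consumes one character at a time and reports every prefix length whose fingerprints agree. The constants and the name `palindromic_prefix_lengths` are assumptions for the illustration; reported lengths are palindromes with high probability, and no palindromic prefix is ever missed.

```python
P = (1 << 61) - 1   # assumed prime modulus
R = 1_000_003       # assumed base; normally chosen at random

def palindromic_prefix_lengths(stream, r=R, p=P):
    """Yield each m such that the first m characters pass the
    fingerprint test r^(m+1) * Phi^R == Phi (mod p)."""
    r_inv = pow(r, -1, p)
    phi = phi_r = 0
    power = inv_power = 1          # r^m and r^-m for the current m
    m = 0
    for ch in stream:
        m += 1
        power = power * r % p
        inv_power = inv_power * r_inv % p
        phi = (phi + power * ord(ch)) % p          # Phi += r^m * t
        phi_r = (phi_r + inv_power * ord(ch)) % p  # Phi^R += r^-m * t
        if power * r % p * phi_r % p == phi:
            yield m
```

Each arriving character costs a constant number of modular multiplications, and only a constant number of values is kept between characters.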
Palindrome with mismatches
Start with 1 mismatch case.
1-Mismatch
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
Choose $l$ prime numbers $q_1, \ldots, q_l < m$ such that $\prod_{i=1}^{l} q_i > m$.
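1-Mismatch
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
For each $q_i$ construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ consists of all elements of S whose index is $j \bmod q_i$.
Examples: $q_1 = 2$, $q_2 = 3$
mod 2:
$S_{2,0} = s_0, s_2, s_4, \ldots$
$S_{2,1} = s_1, s_3, s_5, \ldots$
mod 3:
$S_{3,0} = s_0, s_3, s_6, \ldots$
$S_{3,1} = s_1, s_4, s_7, \ldots$
$S_{3,2} = s_2, s_5, s_8, \ldots$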
Example
$S = s_0, s_1, s_2, s_3, s_4, s_5$
mod 2:
$S_{2,0} = s_0, s_2, s_4$
$S_{2,1} = s_1, s_3, s_5$
mod 3:
$S_{3,0} = s_0, s_3$
$S_{3,1} = s_1, s_4$
$S_{3,2} = s_2, s_5$
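1-Mismatch
• We need to compare:
$S = s_0, s_1, s_2, \ldots, s_{m-2}, s_{m-1}$
$S^R = s_{m-1}, s_{m-2}, s_{m-3}, \ldots, s_1, s_0$
• We prove that in the partitioned strings:
$S_{q,j} = S^R_{q,(m-1-j) \bmod q}$
where $S^R_{q,j'}$ denotes the subsequence $S_{q,j'}$ read in reverse.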
Example
$S = s_0, s_1, s_2, s_3, s_4, s_5$
$S^R = s_5, s_4, s_3, s_2, s_1, s_0$
mod 2:
$S_{2,0} = s_0, s_2, s_4$ is compared with $S^R_{2,1} = s_5, s_3, s_1$
mod 3:
$S_{3,0} = s_0, s_3$ is compared with $S^R_{3,2} = s_5, s_2$
$S_{3,1} = s_1, s_4$ is compared with $S^R_{3,1} = s_4, s_1$
Exact Matching
Lemma: $S = S^R \iff S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all $q$ and all $0 \le j < q$.
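A small sketch of the lemma, reading $S^R_{q,j'}$ as the subsequence $S_{q,j'}$ taken in reverse (the reading suggested by the example with m = 6). It compares the subsequences directly rather than their fingerprints, and the function names are hypothetical.

```python
def subsequence(s, q, j):
    """S_{q,j}: the elements of s whose index is j mod q."""
    return s[j::q]

def reversal_pair_tests(s, q):
    """For each j in 0..q-1, test S_{q,j} against the reverse of
    S_{q,(m-1-j) mod q}, where m = len(s)."""
    m = len(s)
    return [
        subsequence(s, q, j) == subsequence(s, q, (m - 1 - j) % q)[::-1]
        for j in range(q)
    ]
```

For a palindrome every test succeeds for every q. A single wrong mirror pair (i, m-1-i) makes the tests for j = i mod q and j = (m-1-i) mod q fail; these two tests compare the same pair of subsequences in opposite orders, so they can be treated as one.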
1-Mismatch
Lemma: There is exactly one mismatch if and only if there is exactly one subpattern in each group that does not match.
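Chinese Remainder Theorem (C.R.T.)
Let n and m be two relatively prime positive integers. Then
$a \equiv b \pmod{n}$ and $a \equiv b \pmod{m}$ $\implies$ $a \equiv b \pmod{nm}$.
In our case: suppose two indices, i and j, both have an error, yet only one subsequence is erroneous in every partition. Then $i \equiv j \pmod{q}$ for every chosen prime q, and since the product of all q's is > m, it means that i = j.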
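The C.R.T. argument can be checked by brute force on small inputs: when the product of the primes exceeds m, no two distinct indices below m agree modulo every prime. The helper names are hypothetical and the prime lists are arbitrary small examples.

```python
def residue_vector(i, primes):
    """The residues of index i modulo each chosen prime."""
    return tuple(i % q for q in primes)

def indices_are_separated(m, primes):
    """True if all indices 0..m-1 have pairwise distinct residue vectors."""
    vectors = {residue_vector(i, primes) for i in range(m)}
    return len(vectors) == m
```

With primes 2, 3, 5, 7 (product 210) every index below 100 is separated, whereas with 2, 3, 5 (product 30) the indices 0 and 30 collide.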
Complexity
There exists a constant c such that, for any x < m, there are at least x / log m prime numbers between x and cx.
Therefore, choose prime numbers between log m and c log m.
The number of primes needed is $l = \frac{\log m}{\log\log m}$.
Complexity
• For each $q_i$ we compute $2q_i$ different fingerprints.
• Overall space: $2\sum_{i=1}^{l} q_i = O\left(\frac{\log^2 m}{\log\log m}\right)$
• Each character participates in exactly two fingerprints (the regular and the reverse) for each $q_i$.
• Overall time: $2l = O\left(\frac{\log m}{\log\log m}\right)$ per character
Online
All fingerprint calculations can be done online.
At every input character we know the current length m, which is what the comparisons require.
Conclusion: our algorithm is online.
k-Mismatches
Use group testing…
k-Mismatches: Group Testing
• Given n items with some positive ones, identify all positive ones by a small number of tests.
• Each test is on a subset of items.
• Test outcome is positive iff there is a positive item in the subset.
k-Mismatch
• Group: partition of the text.
• Test: distinguish (using the 1-mismatch algorithm) between:
– match
– 1-mismatch
– more than 1 mismatch
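k-Mismatches
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
Choose primes $q_1, q_2, q_3, \ldots, q_l$ s.t. $\prod_{i=1}^{l} q_i > m^k$.
mod 2:
$S_{2,0} = s_0, s_2, s_4, \ldots$
$S_{2,1} = s_1, s_3, s_5, \ldots$
mod 3:
$S_{3,0} = s_0, s_3, s_6, \ldots$
$S_{3,1} = s_1, s_4, s_7, \ldots$
$S_{3,2} = s_2, s_5, s_8, \ldots$
Similar to the 1-mismatch algorithm, just with more prime numbers…
Each $S_{q,j}$ is a group in our group testing.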
Our tests
• We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$.
• Each partition is "tested against" its reversal pair.
Correctness
$S = s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$
For any group of k characters $i_1, i_2, \ldots, i_k$ there exists a partition where $s_j$ appears alone (by the C.R.T.).
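A brute-force sketch of the isolation property: if the product of the primes exceeds $m^k$, then for any index j and any k other indices below m, some prime q puts j in a residue class containing none of the others. The function name `isolating_prime` and the sample primes are assumptions for the illustration.

```python
def isolating_prime(j, others, primes):
    """Return the first prime q such that no index in `others` is
    congruent to j mod q, or None if every prime has a collision."""
    for q in primes:
        if all((i - j) % q != 0 for i in others):
            return q
    return None
```

The reason it works: a prime fails for some other index i only if it divides |i - j| < m, so the product of all failing primes is below $m^k$ and cannot include every chosen prime.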
Correctness
$S = s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$
If $s_j$ invokes a mismatch we will catch it.
Complexity
• Overall space:
• Overall time:
m
mkOloglog
log2
2
mmkO
logloglog2
Approximate Reversal Distance
Using the palindrome up to k-mismatches algorithm, can be solved in
time, and
space.
nkknO log2
2knO
Thank you