Swaps + Mismatches
Based on Estrella Eizenberg's M.Sc. Thesis
Supervised by Ely Porat
Swaps + Mismatches
A paper on this subject by Amihood Amir, Estrella Eizenberg, Ohad Lipsky and Ely Porat
was submitted to ESA 2004
Problem definition
T: a d b d a c b d a b c a b
P: d a b b a a b c
Mismatches:
Abrahamson 87
K-mismatches: Landau Vishkin 86, Amir Lewenstein Porat 00
Problem definition
T: a d b d a c b d a b c a b
P: d c a b d b a c
Swaps:
Amir Aumann Landau M. Lewenstein N. Lewenstein 97
Cole Hariharan 00
Amir Cole Hariharan Lewenstein Porat 2001
Amir Lewenstein Porat 2000
Problem definition
T: a d b d a c b d a b c a b
P: d c a b b b a c
Minimum distance:
Counting all errors as mismatches: 5 errors
Minimum distance: 3 errors
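The two counts above can be checked with a small dynamic program over the pattern. This is a minimal sketch rather than the algorithm of the talk: the function names are hypothetical, and placing the pattern at text position 4 (0-based offset 3) is an assumption about where the slide aligns it. A swap is taken to exchange two adjacent, different pattern symbols and to cost one error.

```python
def hamming(text, pattern, offset):
    """Number of mismatching positions when pattern sits at text[offset:]."""
    return sum(p != t for p, t in zip(pattern, text[offset:offset + len(pattern)]))


def swap_mismatch_distance(text, pattern, offset):
    """Minimum number of errors (swaps + mismatches) at a 0-based offset."""
    window = text[offset:offset + len(pattern)]
    if len(window) != len(pattern):
        raise ValueError("pattern does not fit at this offset")
    m = len(pattern)
    # best[j] = minimum errors needed to explain the first j pattern symbols
    best = [0] * (m + 1)
    for j in range(1, m + 1):
        # Option 1: symbol j-1 stays in place (one error if it mismatches).
        best[j] = best[j - 1] + (pattern[j - 1] != window[j - 1])
        # Option 2: symbols j-2 and j-1 are swapped (one error for the pair).
        if (j >= 2 and pattern[j - 2] != pattern[j - 1]
                and pattern[j - 2] == window[j - 1]
                and pattern[j - 1] == window[j - 2]):
            best[j] = min(best[j], best[j - 2] + 1)
    return best[m]


T = "adbdacbdabcab"
P = "dcabbbac"
print(hamming(T, P, 3))                 # 5
print(swap_mismatch_distance(T, P, 3))  # 3
```

The dynamic program costs O(m) per alignment, so it only serves as a reference for checking small examples.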
Starting with a simpler problem
Σ = {0,1}
T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1
P: 1 0 0 1 0 1 0
We wish to count only the mismatches that are not part of a swap (we leave the swaps for later).
We call them non-swap mismatches (NSM).
Starting with a simpler problem
Σ = {0,1}
T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1
P: 1 0 0 1 0 1 0
NSM[6]=2
Mismatches[6]=4
Minimum-distance[6]=(Mismatches[6]+NSM[6])/2 = 3 errors
Mismatches: O(n log m)
NSM: ????
Minimum distance: O(???? + n log m)
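The formula for the minimum distance can be derived in one step. The sketch below assumes that every swap accounts for exactly two mismatching positions and that every remaining mismatch is an NSM; the symbol S (number of swaps at position i) is introduced here only for illustration.

```latex
\begin{align*}
\text{Mismatches}[i] &= 2S + \text{NSM}[i] \\
\text{Minimum-distance}[i] &= S + \text{NSM}[i]
  = \frac{\text{Mismatches}[i] - \text{NSM}[i]}{2} + \text{NSM}[i]
  = \frac{\text{Mismatches}[i] + \text{NSM}[i]}{2} \\
\text{Minimum-distance}[6] &= \frac{4 + 2}{2} = 3
\end{align*}
```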
Starting with a simpler problem
T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1
T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *
T2: * * * * * * 1 0 1 * * * * 0 1
Starting with a simpler problem
We do the same for the pattern
P: 1 0 0 1 0 1 0
P1: * * 0 1 0 1 0
P2: 1 0 * * * * *
We give the solution only for the odd positions
(NSM[i] where i is odd)
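The split of T into T1, T2 and of P into P1, P2 can be reproduced with a simple parity rule. This is an inferred rule that happens to match the strings on the slides, not necessarily the definition used in the thesis: a symbol goes to the first string when it agrees with the sequence 0 1 0 1 ... counted from position 1, and to the second string otherwise. The function name `split_by_phase` is hypothetical.

```python
def split_by_phase(s):
    """Split a binary string into two masked strings ('*' marks a hidden symbol).

    A symbol at 0-based index q is kept in the first string when it matches
    the sequence 0101... (symbol value + q is even), otherwise in the second.
    """
    first, second = [], []
    for q, c in enumerate(s):
        if (int(c) + q) % 2 == 0:
            first.append(c)
            second.append("*")
        else:
            first.append("*")
            second.append(c)
    return "".join(first), "".join(second)


T1, T2 = split_by_phase("010101101101001")
P1, P2 = split_by_phase("1001010")
print(T1)  # 010101***1010**
print(T2)  # ******101****01
print(P1)  # **01010
print(P2)  # 10*****
```

Under this rule each maximal alternating segment lands entirely in one of the two strings, and neighbouring segments land in different strings.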
Starting with a simpler problem
P2: 1 0 * * * * *
P1: * * 0 1 0 1 0
T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *
Comparing T1 with P1 produces no errors, neither swaps nor mismatches (the same holds for T2 against P2)
Without loss of generality, we consider only T1 against P2
Starting with a simpler problem
T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *
P2: 1 0 * * * * *
[Figure: P2 placed against T1 at two alignments, one giving an even overlap and one giving an odd overlap]
An odd overlap gives one NSM error
We need to count how many odd overlaps we have
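Counting odd overlaps can be illustrated directly on the running example. The sketch below is a quadratic reference version under stated assumptions, not the convolution-based method of the talk: it intersects every alternating segment of the text with every alternating segment of the shifted pattern, and adds one NSM whenever two segments of opposite phase overlap in an odd number of positions. The names `alternating_segments` and `nsm_at` are hypothetical, and offsets are 0-based.

```python
def alternating_segments(s):
    """Maximal runs whose consecutive symbols differ, as (start, end), end exclusive."""
    segments, start = [], 0
    for i in range(1, len(s) + 1):
        if i == len(s) or s[i] == s[i - 1]:
            segments.append((start, i))
            start = i
    return segments


def nsm_at(text, pattern, offset):
    """Non-swap mismatches when the binary pattern sits at text[offset:]."""
    total = 0
    for ts, te in alternating_segments(text):
        t_phase = (int(text[ts]) + ts) % 2
        for ps, pe in alternating_segments(pattern):
            # Phase of the pattern segment measured in text coordinates.
            p_phase = (int(pattern[ps]) + ps + offset) % 2
            overlap = min(te, pe + offset) - max(ts, ps + offset)
            # Opposite phases mismatch everywhere in the overlap: pairs of
            # positions become swaps, and an odd overlap leaves one NSM.
            if overlap > 0 and t_phase != p_phase:
                total += overlap % 2
    return total


print(nsm_at("010101101101001", "1001010", 5))  # 2
```

Offset 5 corresponds to position 6 in the 1-based numbering of the slides, where NSM[6]=2.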
Simpler problem
We separate the segments into 4 categories:
1. Starting at odd position and ending at odd position (called OO)
2. Starting at odd position and ending at even position (called OE)
3. Starting at even position and ending at odd position (called EO)
4. Starting at even position and ending at even position (called EE)
Simpler problem
[Figure: a segment of T marked 1 1 1 ... 1 against a segment of P marked 1 -1 1 -1 ..., shown at several alignments; the segment ends are labeled O]
The overlap must start with 1
O(n log m) – one convolution
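The marking with 1s and alternating 1, -1 can be tried on a toy example. The sketch below uses a naive sliding dot product, which is quadratic; the O(n log m) bound on the slide would require an FFT-based convolution, which is not shown here. The vectors and the function name `sliding_dot` are made up for illustration.

```python
def sliding_dot(text_marks, pattern_marks):
    """Dot product of pattern_marks with every full-length window of text_marks."""
    n, m = len(text_marks), len(pattern_marks)
    return [sum(text_marks[o + j] * pattern_marks[j] for j in range(m))
            for o in range(n - m + 1)]


# One text segment occupying indices 2..6, marked with 1s (0 elsewhere).
text_marks = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
# One pattern segment marked with alternating signs, starting with +1.
pattern_marks = [1, -1, 1, -1]

print(sliding_dot(text_marks, pattern_marks))  # [0, -1, 0, 0, 1, 0, 1]
```

Whenever the overlap begins on a +1 entry of the pattern, the result is 1 for an odd overlap and 0 for an even overlap. At offset 1 the overlap begins on a -1 entry and the result is -1, which is why the overlap is required to start with 1.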
Simpler problem
[Figure: a segment of T marked 1 1 1 ... 1 against a segment of P marked -1 1 -1 1 ..., shown at several alignments; the segment ends are labeled O]
The overlap must start with 1
O(n log m) – one convolution
Simpler problem
We have dealt with O? against O? and with ?O against ?O
The same method works for E? against E? and ?E against ?E
We are left to deal with:
– OE against EO
– EO against OE
– OO against EE
– EE against OO
OO against EE
[Figure: an OO segment against an EE segment at several alignments, showing even and odd overlaps; T is marked 1 1 1 ... 1, P is marked 1 -1 1 -1 ..., and the resulting values are 0, 1 and -1]
We need to recognize when one segment contains the other
Simpler problem
We can easily tell whether a segment is contained in another segment, or contains one, if we know the segment sizes.
Smaller segments can't contain larger segments
Simpler problem
Then for each segment we split the computation into two parts: against bigger segments and against smaller segments
We do this by computing the answer, each time, for all segments of size 'x'
Simpler problem
The number of different segment sizes is at most O(\sqrt{m})
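One way to see the bound on the number of different sizes is a counting argument. It assumes that the segments in question are disjoint pieces of the pattern, so their total length is at most m; the symbols k and s_j are introduced here only for illustration.

```latex
% k distinct sizes s_1 < s_2 < \dots < s_k, hence s_j \ge j
\[
  \frac{k(k+1)}{2} \;=\; \sum_{j=1}^{k} j \;\le\; \sum_{j=1}^{k} s_j \;\le\; m
  \quad\Longrightarrow\quad
  k \;<\; \sqrt{2m} \;=\; O(\sqrt{m})
\]
```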
What we have
We have an algorithm for the simpler problem that runs in time O(n\sqrt{m}\log m)
We have an algorithm for the binary alphabet that runs in O(n\sqrt{m}\log m)
With several more techniques we develop an algorithm solving the original problem in O(n\sqrt{m}\log m)
Open problem
It is easy to see that our algorithm is at most a factor of O(\sqrt{\log m}) from the optimal algorithm (due to a reduction to counting mismatches)
But one can try to improve the small alphabet case
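The \sqrt{\log m} factor can be read off by comparing the two running times. The sketch assumes that the comparison is with the O(n\sqrt{m \log m}) bound of Abrahamson's algorithm for counting mismatches; that bound is not stated on the slides.

```latex
\[
  \frac{n\sqrt{m}\,\log m}{n\sqrt{m\log m}}
  \;=\; \frac{\log m}{\sqrt{\log m}}
  \;=\; \sqrt{\log m}
\]
```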