Swaps + Mismatches Based on Estrella Eizenberg M.Sc. Thesis Supervised by Ely Porat

Page 1:

Swaps + Mismatches

Based on Estrella Eizenberg's M.Sc. Thesis

Supervised by Ely Porat

Page 2:

Swaps + Mismatches

A paper on this subject by Amihood Amir, Estrella Eizenberg, Ohad Lipsky and Ely Porat was submitted to ESA 2004

Page 3:

Problem definition

T: a d b d a c b d a b c a b

P: d a b b a a b c

Mismatches: Abrahamson 87

K-mismatches: Landau Vishkin 86; Amir Lewenstein Porat 00

Page 4:

Problem definition

T: a d b d a c b d a b c a b

P: d c a b d b a c

Swaps:

Amir Aumann Landau M. Lewenstein N. Lewenstein 97

Cole Hariharan 00

Amir Cole Hariharan Lewenstein Porat 2001

Amir Lewenstein Porat 2000

Page 5:

Problem definition

T: a d b d a c b d a b c a b

P: d c a b b b a c

Minimum distance:

Counting all as mismatches: 5 errors

Minimum distance: 3 errors
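The two counts on this slide can be reproduced with a short sketch. It assumes the pattern is aligned at 0-based text offset 3 (the alignment is lost in this transcript, but this offset reproduces both numbers), that a swap exchanges two adjacent, different pattern characters, and that each character takes part in at most one swap. The function names are illustrative and not taken from the thesis.

```python
def count_mismatches(text, pattern, offset):
    """Number of positions where the pattern differs from the text window."""
    window = text[offset:offset + len(pattern)]
    return sum(p != t for p, t in zip(pattern, window))


def swap_mismatch_distance(text, pattern, offset):
    """Minimum number of swaps plus mismatches at one alignment (simple DP)."""
    window = text[offset:offset + len(pattern)]
    m = len(pattern)
    # best[j] = fewest errors needed to explain the first j pattern characters
    best = [0] + [m + 1] * m
    for j in range(m):
        # Option 1: compare character j directly (free if equal, else 1 mismatch).
        best[j + 1] = min(best[j + 1], best[j] + (pattern[j] != window[j]))
        # Option 2: swap characters j and j+1 of the pattern (costs 1 error).
        if (j + 1 < m and pattern[j] != pattern[j + 1]
                and pattern[j] == window[j + 1]
                and pattern[j + 1] == window[j]):
            best[j + 2] = min(best[j + 2], best[j] + 1)
    return best[m]


T = "adbdacbdabcab"
P = "dcabbbac"
print(count_mismatches(T, P, 3))        # 5
print(swap_mismatch_distance(T, P, 3))  # 3 (two swaps and one mismatch)
```

This checks a single alignment in O(m) time; it is only a reference for the definition, not the fast algorithm the slides build towards.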

Page 6:

Starting with a simpler problem

Σ = {0,1}

T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1

P: 1 0 0 1 0 1 0

We wish to count only the mismatches (we will leave the swaps for later); we call them non-swap mismatches (NSM)

Page 7:

Starting with a simpler problem

Σ = {0,1}

T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1

P: 1 0 0 1 0 1 0

NSM[6] = 2

Mismatches[6] = 4

Minimum-distance[6] = (Mismatches[6] + NSM[6])/2 = 3 errors

Running time: Mismatches in O(n log m), NSM in ????, so the minimum distance in O(???? + n log m)
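The formula follows from counting each swap once in the distance but twice among the mismatches. Writing S[i] for the number of swaps at position i (a symbol introduced here for illustration, not used on the slide):

```latex
\begin{align*}
\mathrm{Mismatches}[i] &= 2\,S[i] + \mathrm{NSM}[i] \\
\text{Minimum-distance}[i] &= S[i] + \mathrm{NSM}[i]
  = \frac{\mathrm{Mismatches}[i] - \mathrm{NSM}[i]}{2} + \mathrm{NSM}[i]
  = \frac{\mathrm{Mismatches}[i] + \mathrm{NSM}[i]}{2}
\end{align*}
```

For i = 6 this gives S[6] = (4 - 2)/2 = 1 swap and a distance of 1 + 2 = 3. The numbers match the example when position 6 is read as 1-based.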

Page 8:

Starting with a simpler problem

T: 0 1 0 1 0 1 1 0 1 1 0 1 0 0 1

T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *

T2: * * * * * * 1 0 1 * * * * 0 1
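The split into T1 and T2 appears to cut T into maximal alternating segments (a new segment starts wherever two equal symbols are adjacent) and to hand the segments out in turn. A minimal sketch under that reading, with hypothetical helper names:

```python
def alternating_segments(s):
    """Return (start, end) pairs, end exclusive, of maximal alternating runs."""
    segments = []
    start = 0
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or where a symbol repeats.
        if i == len(s) or s[i] == s[i - 1]:
            segments.append((start, i))
            start = i
    return segments


def split_by_segments(s):
    """Put segments alternately into two strings, padding with '*'."""
    parts = [["*"] * len(s), ["*"] * len(s)]
    for k, (a, b) in enumerate(alternating_segments(s)):
        parts[k % 2][a:b] = s[a:b]
    return "".join(parts[0]), "".join(parts[1])


T1, T2 = split_by_segments("010101101101001")
print(T1)  # 010101***1010**
print(T2)  # ******101****01
```

Applied to P = 1001010 the same function returns `10*****` and `**01010`, which the slides label P2 and P1 respectively.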

Page 9:

Starting with a simpler problem

We do the same for the pattern

P: 1 0 0 1 0 1 0

P1: * * 0 1 0 1 0

P2: 1 0 * * * * *

We will give a solution only for the odd places (NSM[i] where i is odd)

Page 10:

Starting with a simpler problem

P2: 1 0 * * * * *

P1: * * 0 1 0 1 0

T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *

Comparing T1 with P1 gives no errors, neither swaps nor mismatches (the same holds for T2 against P2)

Without loss of generality we look only at T1 against P2

Page 11:

Starting with a simpler problem

T1: 0 1 0 1 0 1 * * * 1 0 1 0 * *

P2: 1 0 * * * * *

[Figure: P2 placed under T1 at two alignments, one labeled "Even overlap" and one labeled "Odd overlap"; the odd overlap is marked "One NSM err".]

We need to count how many odd overlaps we have
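Why only odd overlaps matter can be checked on small cases. The sketch below assumes an overlap in which an alternating text run meets the complementary alternating pattern run, so every position mismatches; pairing neighbours left to right turns each pair into one swap, and an odd overlap leaves exactly one position unpaired. The helper names are made up for this illustration.

```python
def count_swaps_and_nsm(text_overlap, pattern_overlap):
    """Greedy left-to-right pairing of adjacent mismatches into swaps."""
    swaps = nsm = 0
    j = 0
    n = len(pattern_overlap)
    while j < n:
        if pattern_overlap[j] == text_overlap[j]:
            j += 1                      # match, nothing to count
        elif (j + 1 < n
              and pattern_overlap[j] == text_overlap[j + 1]
              and pattern_overlap[j + 1] == text_overlap[j]):
            swaps += 1                  # two adjacent mismatches form a swap
            j += 2
        else:
            nsm += 1                    # leftover non-swap mismatch
            j += 1
    return swaps, nsm


def alternating(first, length):
    """Binary string of the given length that alternates, starting at `first`."""
    return "".join(str((first + k) % 2) for k in range(length))


for length in range(1, 7):
    t = alternating(0, length)          # 0101...
    p = alternating(1, length)          # 1010...
    print(length, count_swaps_and_nsm(t, p))
```

Each overlap of length L yields L // 2 swaps and L % 2 non-swap mismatches, so the NSM count at an alignment equals the number of odd overlaps of this kind.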

Page 12:

Simpler problem

We separate the segments into 4 categories:

1. Starting at odd position and ending at odd position (called OO)

2. Starting at odd position and ending at even position (called OE)

3. Starting at even position and ending at odd position (called EO)

4. Starting at even position and ending at even position (called EE)
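The four categories depend only on the parity of a segment's first and last position. A small sketch, assuming 1-based inclusive positions (the indexing that matches the NSM[6] example) and using the segment boundaries of T read off T1 and T2; `classify` is an illustrative name:

```python
def classify(start, end):
    """Category of a segment from the parity of its 1-based end points."""
    first = "O" if start % 2 == 1 else "E"
    last = "O" if end % 2 == 1 else "E"
    return first + last


# Segments of T = 0 1 0 1 0 1 | 1 0 1 | 1 0 1 0 | 0 1 (1-based, inclusive).
segments = [(1, 6), (7, 9), (10, 13), (14, 15)]
for start, end in segments:
    print(start, end, classify(start, end))
```

A segment of even length always falls into OE or EO, and a segment of odd length into OO or EE.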

Page 13:

Simpler problem

[Figure: segments of T and P, each labeled O; T is marked with a run of 1s and P with alternating weights 1, -1, 1, -1, ... shown at several shifts.]

The overlap must start with 1

O(n log m): one convolution
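A small sketch of why one convolution suffices, under assumptions not spelled out on the slide: the text segment is encoded as a run of 1s (0 elsewhere) and the pattern segment as weights +1, -1, +1, ... The correlation value at a shift then equals the overlap length modulo 2, provided the overlap begins on a +1 weight. The direct O(nm) loop below stands in for the FFT-based convolution that would give O(n log m); `correlate` is an illustrative name.

```python
def correlate(text_weights, pattern_weights):
    """Sum of pattern_weights[j] * text_weights[s + j] for every shift s."""
    n, m = len(text_weights), len(pattern_weights)
    return [sum(pattern_weights[j] * text_weights[s + j] for j in range(m))
            for s in range(n - m + 1)]


text_weights = [1] * 5 + [0] * 5   # one text segment covering positions 0..4
pattern_weights = [1, -1, 1, -1]   # one pattern segment, alternating signs

print(correlate(text_weights, pattern_weights))
# [0, 0, 1, 0, 1, 0, 0]: overlap lengths 4, 4, 3, 2, 1, 0, 0 taken modulo 2
```

If the overlap began on a -1 weight instead, an odd overlap would sum to -1 rather than 1, which is why the overlap has to start with 1.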

Page 14:

Simpler problem

[Figure: segments of T and P, each labeled O; T is marked with a run of 1s and P with alternating weights -1, 1, -1, 1, ... shown at several shifts.]

The overlap must start with 1

O(n log m): one convolution

Page 15:

Simpler problem

We deal with: O? against O? and with ?O against ?O

The same method works for E? against E? and ?E against ?E

We are left to deal with:
– OE against EO
– EO against OE
– OO against EE
– EE against OO

Page 16:

OO against EE

[Figure: an O-labeled segment of T against E-labeled segments of P, with "Even overlap" and "Odd overlap" cases marked; T is marked with a run of 1s and P with alternating 1, -1 weights, giving the values 0, 1 and -1.]

We need to recognize when one segment contains the other

Page 17:

Simpler problem

We can easily know whether we are contained in another segment, or contain other segments, if we know the segment sizes.

Smaller segments can't contain larger segments

Page 18:

Simpler problem

Then for each segment we divide the computation into two parts: against bigger segments and against smaller segments

We do it by computing the answer each time for all segments of size 'x'

Page 19:

Simpler problem

The number of different segment sizes is O(\sqrt{m})
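The bound on the number of sizes can be seen by a counting argument, assuming the sizes in question are those of the pattern's segments, whose lengths sum to m. If k different sizes occur, the smallest possible total length is reached with sizes 1, 2, ..., k:

```latex
\[
  \frac{k(k+1)}{2} \;=\; 1 + 2 + \dots + k \;\le\; m
  \quad\Longrightarrow\quad
  k \;<\; \sqrt{2m} \;=\; O(\sqrt{m}).
\]
```

With one O(n log m) convolution pass per size, this would account for the \sqrt{m} factor in the running time.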

Page 20:

What we have

We have an algorithm for the simpler problem that runs in time O(n\sqrt{m}\log m)

We have an algorithm for the binary alphabet that runs in O(n\sqrt{m}\log m)

With several more techniques we develop an algorithm solving the original problem in O(n\sqrt{m}\log m)

Page 21:

Open problem

It is easy to see that our algorithm is at most a factor of O(\sqrt{\log m}) from the optimal algorithm (due to the reduction to counting mismatches)

But one can try to improve the small alphabet case