DICTIONARY MATCHING WITH ONE GAP. Amihood Amir, Avivit Levy, Ely Porat and B. Riva Shalom. CPM 2014

Dictionary Matching with One Gap

DESCRIPTION

Dictionary Matching with One Gap. Amihood Amir, Avivit Levy, Ely Porat and B. Riva Shalom. CPM 2014 - Moscow. Outline: the DMG (Dictionary Matching with one Gap) problem, motivation, previous work, the bidirectional suffix trees solution, the lookup table addition.

Page 1: Dictionary Matching  with One Gap

DICTIONARY MATCHING WITH ONE GAP

Amihood Amir, Avivit Levy, Ely Porat and B. Riva Shalom

CPM 2014

Page 2: Dictionary Matching  with One Gap

CPM 2014 - MOSCOW

Page 3: Dictionary Matching  with One Gap

MIND THE GAP!

Page 4: Dictionary Matching  with One Gap

OUTLINE

The DMG (Dictionary Matching with one Gap) Problem
Motivation
Previous Work
Bidirectional Suffix Trees Solution
Lookup Table addition
Open Problems

Page 5: Dictionary Matching  with One Gap

THE DMG PROBLEM

A gapped pattern is a pattern P of the form:

P1{α1,β1} P2{α2,β2} … Pk-1{αk-1,βk-1} Pk

Each Pj is over alphabet Σ; {αj,βj} is a sequence of at least αj and at most βj don't cares (written @).

Example: aba{3,6}cbb stands for
aba@@@cbb
aba@@@@cbb
aba@@@@@cbb
aba@@@@@@cbb
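To make the gap notation concrete, here is a minimal Python sketch that expands a single-gap pattern into its explicit forms, as in the aba{3,6}cbb example. The function names `expand_gapped` and `matches_form` are illustrative and do not come from the slides.

```python
def expand_gapped(first, lo, hi, second, dont_care="@"):
    """Return every explicit form of first{lo,hi}second, shortest gap first."""
    return [first + dont_care * gap + second for gap in range(lo, hi + 1)]


def matches_form(form, candidate, dont_care="@"):
    """True if candidate has the same length as form and agrees with it on
    every position that is not a don't care."""
    return len(form) == len(candidate) and all(
        f == dont_care or f == c for f, c in zip(form, candidate)
    )


forms = expand_gapped("aba", 3, 6, "cbb")
for form in forms:
    print(form)
```

Each expanded form can then be compared against a text window with `matches_form`, which treats `@` as matching any single character.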

Page 6: Dictionary Matching  with One Gap

THE DMG PROBLEM

The DMG problem is:

Preprocess: A dictionary D of d gapped patterns P1,…, Pd over alphabet Σ.

Query: A text T of length n over alphabet Σ.

Output: All locations in T where a dictionary gapped pattern ends.

We focus on DMG with a single gap.

Page 7: Dictionary Matching  with One Gap

EXAMPLE

Dictionary:
P1 = aba{3,6}cbb (P1,1 = aba, P1,2 = cbb)
P2 = ab{3,6}bbac (P2,1 = ab, P2,2 = bbac)
P3 = aa{3,6}ac (P3,1 = aa, P3,2 = ac)

Query text:
position: 1 2 3 4 5 6 7 8 9 10 11
text:     a b a a b a c b b a c

First = ∪(1≤i≤d) { Pi,1 }    Second = ∪(1≤i≤d) { Pi,2 }
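As a baseline for this example, the following brute-force sketch reports every 1-based text location where a dictionary pattern ends. It is not the algorithm of the talk; it simply tries every pattern, gap size and start position. The name `naive_dmg` and the dictionary layout (id mapped to first subpattern, lower gap bound, upper gap bound, second subpattern) are assumptions made for illustration.

```python
def naive_dmg(dictionary, text):
    """Brute-force dictionary matching with one gap.

    dictionary maps a pattern id to (first, lo, hi, second).
    Returns sorted (end_location, pattern_id) pairs, end locations 1-based.
    """
    hits = set()
    for pattern_id, (first, lo, hi, second) in dictionary.items():
        for gap in range(lo, hi + 1):
            total = len(first) + gap + len(second)
            for start in range(len(text) - total + 1):
                second_start = start + len(first) + gap
                if text.startswith(first, start) and text.startswith(
                    second, second_start
                ):
                    # start is 0-based, so start + total is the 1-based end.
                    hits.add((start + total, pattern_id))
    return sorted(hits)


example = {
    1: ("aba", 3, 6, "cbb"),
    2: ("ab", 3, 6, "bbac"),
    3: ("aa", 3, 6, "ac"),
}
result = naive_dmg(example, "abaabacbbac")
print(result)  # [(9, 1), (11, 2), (11, 3)]
```

On the query text, P1 ends at location 9 (gap of 3), while P2 and P3 both end at location 11 (gap of 5 each).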

Page 8: Dictionary Matching  with One Gap

MOTIVATION

Computational biology.

Renewed interest due to cyber security: network intrusion detection systems perform protocol analysis, content searching and content matching to detect harmful software.

Malware may appear in several packets!

Page 9: Dictionary Matching  with One Gap

PREVIOUS WORK

The gapped pattern matching problem has been studied for a few decades, e.g. [Myers, JACM 1992], [Navarro&Raffinot, Algorithmica 2004], [Bille&Thorup, ICALP 2009], [Bille&Thorup, SODA 2010], [Morgante et al., JCB 2005], [Rahman et al., COCOON 2006], [Bille et al., TCS 2012].

The DMG problem has not been studied enough! [Kucherov&Rusinowitch, TCS 1997], [Zhang et al., IPL 2010]: no bounds on the length of the gap.

Page 10: Dictionary Matching  with One Gap

BI-DIRECTIONAL SUFFIX TREES ALGORITHM

Gapped pattern: ab{3,6}bbac

Query: a b a a b a c b b a c

Page 11: Dictionary Matching  with One Gap

BI-DIRECTIONAL SUFFIX TREES ALGORITHM

Idea: view as [Amir et al., JAL 2000]

Gapped patterns:
P1 = aba{3,6}abac
P2 = aba{3,6}bba
P3 = ab{3,6}baa

Query: a b a a b a c b b a c

Use suffix tree TS of Second (to the right of the gap).
Use suffix tree TFR of FirstR, the reversed First subpatterns (to the left of the gap).

Page 12: Dictionary Matching  with One Gap

BI-DIRECTIONAL SUFFIX TREES ALGORITHM

For each text location l:

Insert tl tl+1 … tn into TS (reaching node h) to find the labels on the path to h. This finds every Pi,2 starting at location l.

For f = l-α-1 down to l-β-1: insert tf tf-1 … t1 into TFR (reaching node g) to find the labels on the path to g. This finds every Pi,1 ending at location f.

Output the intersection (for end locations).
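The two-directional scan above can be sketched in Python as follows. This is a simplified illustration, not the construction from the talk: plain tries stand in for the suffix trees TS and TFR, the intersection is a direct set intersection, and all patterns are assumed to share one gap range. The names `build_trie`, `labels_on_path` and `dmg_one_gap` are hypothetical.

```python
def build_trie(words):
    """Build a trie from (pattern_id, word) pairs.

    Each node is a dict mapping a character to a child node; the ids of the
    words that end at a node are stored under the key None.
    """
    root = {}
    for pattern_id, word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node.setdefault(None, set()).add(pattern_id)
    return root


def labels_on_path(trie, chars):
    """Walk down the trie along chars and collect the ids met on the way."""
    found, node = set(), trie
    for ch in chars:
        node = node.get(ch)
        if node is None:
            break
        found |= node.get(None, set())
    return found


def dmg_one_gap(dictionary, lo, hi, text):
    """dictionary maps id -> (first, second); all patterns share gap {lo,hi}.

    Returns sorted (end_location, id) pairs with 1-based end locations.
    """
    second_trie = build_trie((i, s) for i, (_, s) in dictionary.items())
    first_rev_trie = build_trie((i, f[::-1]) for i, (f, _) in dictionary.items())
    n, hits = len(text), set()
    for start in range(n):  # 0-based start of a candidate second subpattern
        after = labels_on_path(second_trie, (text[k] for k in range(start, n)))
        if not after:
            continue
        for gap in range(lo, hi + 1):
            end_first = start - gap - 1  # 0-based end of the first subpattern
            if end_first < 0:
                break
            before = labels_on_path(
                first_rev_trie, (text[k] for k in range(end_first, -1, -1))
            )
            for i in before & after:
                hits.add((start + len(dictionary[i][1]), i))
    return sorted(hits)


patterns = {1: ("aba", "cbb"), 2: ("ab", "bbac"), 3: ("aa", "ac")}
found = dmg_one_gap(patterns, 3, 6, "abaabacbbac")
print(found)  # [(9, 1), (11, 2), (11, 3)]
```

Reading the text backwards from the end of the first subpattern is what lets a single structure over the reversed First set report every first subpattern ending there in one walk.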

Page 13: Dictionary Matching  with One Gap

BI-DIRECTIONAL SUFFIX TREES ALGORITHM - INTERSECTION

Patterns: {(1,4),(2,9),(3,7),…,(6,5),…}

[Figure: TFR with nodes 3, 6, 9, 1 and query node g; TS with nodes 5, 7, 2 and query node h; ranges [1,9] and [2,7]]

Page 14: Dictionary Matching  with One Gap

BI-DIRECTIONAL SUFFIX TREES ALGORITHM (CONTINUED)

Intersection via range queries:

[Figure: grid with points (1,4), (3,7), (6,5), (8,8), (2,9) and query ranges [2,7] and [1,9]]
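The idea of intersecting via range queries can be illustrated with a small orthogonal range-reporting sketch: each pattern is a point (node of its first subpattern, node of its second subpattern), and a query reports the points inside a rectangle. The class `Grid` below is a naive stand-in, not the structure of [Chan et al., SoCG 2011]. Which range on the slide applies to which coordinate is an assumption here: [1,9] is used for the first coordinate and [2,7] for the second.

```python
from bisect import bisect_left, bisect_right


class Grid:
    """Naive 2D range reporting: binary search on x, linear filter on y."""

    def __init__(self, points):
        self.points = sorted(points)
        self.xs = [x for x, _ in self.points]

    def report(self, x_range, y_range):
        """Return the points (x, y) with x in x_range and y in y_range,
        both ranges inclusive."""
        x_lo, x_hi = x_range
        y_lo, y_hi = y_range
        i = bisect_left(self.xs, x_lo)
        j = bisect_right(self.xs, x_hi)
        return [(x, y) for x, y in self.points[i:j] if y_lo <= y <= y_hi]


grid = Grid([(1, 4), (3, 7), (6, 5), (8, 8), (2, 9)])
inside = grid.report((1, 9), (2, 7))
print(inside)  # [(1, 4), (3, 7), (6, 5)]
```

Under that axis assumption the points (8,8) and (2,9) fall outside the rectangle because their second coordinate exceeds 7.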

Page 15: Dictionary Matching  with One Gap

TIME & SPACE

Preprocessing time:
Dictionary segments suffix tree and reverse suffix tree: O(|D|).
Preprocessing grid for range queries: O(d log d) [Chan et al., SoCG 2011].

Preprocessing space:
Dictionary segments suffix tree and reverse suffix tree: O(|D|).
Space for grid: O(d log d) [Chan et al., SoCG 2011].

Page 16: Dictionary Matching  with One Gap

TIME & SPACE

Query time:
For each end text location, we try every gap size: a factor of (β-α).
The number of range queries is the number of vertical paths in a given path: O(log² min{d, log |D|}).
A range query costs O(log log d + occ) [Chan et al., SoCG 2011].
Total: O(n(β-α) log log d log² min{d, log |D|} + occ).

[Figure: TFR with nodes 3, 6, 9, 1 and query node g]
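As a rough sketch of how the query-time terms on this slide combine, write α and β for the lower and upper gap bounds (a notational assumption, since the extracted text lost these symbols). The total is then the product of the per-location factors plus the output size:

```latex
\underbrace{n}_{\text{text locations}}
\cdot \underbrace{(\beta-\alpha)}_{\text{gap sizes}}
\cdot \underbrace{O\!\left(\log^{2}\min\{d,\log|D|\}\right)}_{\text{range queries}}
\cdot \underbrace{O(\log\log d)}_{\text{cost per range query}}
\;+\; occ
\;=\; O\!\left(n(\beta-\alpha)\,\log\log d\,\log^{2}\min\{d,\log|D|\} + occ\right)
```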

Page 17: Dictionary Matching  with One Gap

LOOKUP TABLE ALGORITHM

Idea: Instead of using range queries in a grid to compute the intersection, we use a pre-computed lookup table.

This enables intersection in O(occ) time.

Total query time becomes O(n(β-α) + occ).

Page 18: Dictionary Matching  with One Gap

LOOKUP TABLE ALGORITHM

Inter[g,h] = all i s.t. (Pi,1)R, the reverse of Pi,1, appears on the path from the root of TFR to node g and Pi,2 appears on the path from the root of TS to node h.

P1=(1,4), P2=(2,9), P3=(3,7), P4=(3,2), …, P6=(6,5), P7=(9,6)

Inter[3,5] = {4}

[Figure: TFR with nodes 3, 6, 9, 1 and query node g; TS with nodes 5, 7, 2 and query node h]

Page 19: Dictionary Matching  with One Gap

LOOKUP TABLE ALGORITHM

Inter[3,5] = {4}
Inter[3,7] = {3,4}

Page 20: Dictionary Matching  with One Gap

LOOKUP TABLE ALGORITHM

Inter[3,5] = {4}
Inter[3,7] = {3,4}
Inter[6,7] = {3,4,6}

Page 21: Dictionary Matching  with One Gap

LOOKUP TABLE ALGORITHM

Inter[3,5] = {4}
Inter[3,7] = {3,4}
Inter[6,7] = {3,4,6}
Inter[9,7] = {3,4,6}

Page 22: Dictionary Matching  with One Gap

LOOKUP TABLE ALG.

P1=(1,4), P2=(2,9), P3=(3,7), P4=(3,2), …, P6=(6,5), P7=(9,6)

Inter[3,5] = {4}
Inter[3,7] = {3,4}
Inter[6,7] = {3,4,6}

[Figure: the trees TFR and TS together with the entries of the lookup table]

Page 23: Dictionary Matching  with One Gap

LOOKUP TABLE ALGORITHM

Preprocessing:
Time: The table can be computed using dynamic programming in time O(d² ovr + |D|), where ovr is the number of subpatterns that include another subpattern as a prefix or suffix.
Space: O(d² + |D|).

Query time: O(n(β-α) + occ).
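One way to picture a dynamic-programming computation of the Inter table is the recurrence Inter[g,h] = Inter[parent(g),h] ∪ Inter[g,parent(h)] ∪ {i : Pi = (g,h)}. The sketch below is a memoized illustration of that recurrence, not the procedure from the talk, and it makes no claim to the stated time bound. The parent pointers are an assumption chosen to agree with the Inter values shown on the slides (the tree figure did not survive extraction), and the elided pattern P5 is simply left out.

```python
from functools import lru_cache


def build_inter(parent_first, parent_second, patterns):
    """Return a memoized function inter(g, h).

    parent_first / parent_second map a node to its parent, or to None when
    the node hangs directly below the root. patterns maps id -> (g, h).
    """
    exact = {}
    for pattern_id, pair in patterns.items():
        exact.setdefault(pair, set()).add(pattern_id)

    @lru_cache(maxsize=None)
    def inter(g, h):
        if g is None or h is None:  # walked past the top of one of the trees
            return frozenset()
        return (
            inter(parent_first[g], h)
            | inter(g, parent_second[h])
            | frozenset(exact.get((g, h), ()))
        )

    return inter


# Hypothetical tree shapes: chain 3 -> 6 -> 9 in TFR and 2 -> 5 -> 7 in TS.
parent_first = {1: None, 2: None, 3: None, 6: 3, 9: 6}
parent_second = {2: None, 5: 2, 7: 5, 4: None, 9: None, 6: None}
patterns = {1: (1, 4), 2: (2, 9), 3: (3, 7), 4: (3, 2), 6: (6, 5), 7: (9, 6)}

inter = build_inter(parent_first, parent_second, patterns)
print(sorted(inter(3, 5)), sorted(inter(3, 7)), sorted(inter(6, 7)))
```

With the table in hand, a query at nodes g and h reads its answer directly, which is where the O(occ) intersection cost comes from.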

Page 24: Dictionary Matching  with One Gap

OUR RESULTS

Bi-directional suffix trees & range queries:
Preprocessing time: O(d log d + |D|).
Space: O(d log d + |D|).
Query time: O(n(β-α) log log d log²(min{d, log |D|}) + occ).

Bi-directional suffix trees & lookup table:
Preprocessing time: O(d² ovr + |D|).
Space: O(d² + |D|).
Query time: O(n(β-α) + occ).

Page 25: Dictionary Matching  with One Gap

OPEN PROBLEMS

Generalizing to k gaps

Reducing the dependency on the size

Scalability to different gap bounds in the dictionary

Online algorithm

Page 26: Dictionary Matching  with One Gap

THANK YOU!