Approximate Point Set Pattern Matching on Sequences and Planes Tomoaki Suga, Shinichi Shimozono*...


Approximate Point Set Pattern Matching on Sequences and Planes

Tomoaki Suga, Shinichi Shimozono*

Kyushu Inst. of Tech., Fukuoka, Japan

Point Set Pattern Matching

Text: A set of points in, e.g., a plane

Pattern: A small set of points

Task: Find an occurrence of the pattern as a subset

[Figure: a TEXT point set and a PATTERN point set]

Approximate Point Set Matching in Practice: Example

Analysis of 2D electrophoresis images: a set of spots on a gel media plane

Searching digital music scores by melody: ringer melodies, Internet content, online “Kara-Oke”

Literature

Exact matching in d dimensions: geometric algorithm by P. J. de Rezende & D. T. Lee, '95

Translation, scaling, and rotation in O(nm^d)

Allowing local distortions: heuristic and hardness by Akutsu et al., '99 … NP-hard even in 1D matching

Approximate matching of point sequences: no skips, O(nm) time by V. Makinen '01

Allowing substitution in O(nm^3) time

Extension to 2-dimensional matching is NP-hard

Our Results

Approximate point set pattern matching in 1D: the pattern matches as a subset; extends Makinen et al.

Simple, fast algorithm for the O(nm^2) task, under a reasonable assumption on sequences in practice

Algorithm guaranteeing O(nm) time: linear in the text size, by average-constant-time minimum queries

Four-Russians speed-up: an observation connected to string matching

2D approximate point set pattern matching with a polynomial-time algorithm

1D Matching as a Target

As a basis of practical problems: the axes of 2D electrophoresis images are independent

Points in higher dimensions that have a primary axis (sort order) … e.g., 3D structure of proteins

Musical score search: pitch error (tone deafness) is usually fatal

Exact matching in rhythm/timing is impractical, but timing is indispensable for distinguishing melodies

Point Set Matching in 1D

Text and Pattern: Strictly increasing sequences of integers

$$T = (t_1, \ldots, t_m), \qquad P = (p_1, \ldots, p_n)$$

An Occurrence of the Pattern: A subsequence of the text

$$T' = (t_{l(1)}, \ldots, t_{l(n)})$$

Edit Distance for Point Set Approximate Matching

Distance between two sequences of the same size:

$$d(P, Q) = \sum_{i=2}^{n} \left| (p_i - p_{i-1}) - (q_i - q_{i-1}) \right|$$

[Figure: sequences P and Q]
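As a minimal sketch of the distance above, the following Python function sums the absolute differences between consecutive gaps. The name `gap_distance` is a hypothetical choice for illustration and does not come from the slides.

```python
def gap_distance(p, q):
    """Sum over i of |(p[i] - p[i-1]) - (q[i] - q[i-1])| for equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("sequences must have the same length")
    return sum(
        abs((p[i] - p[i - 1]) - (q[i] - q[i - 1]))
        for i in range(1, len(p))
    )


# Gaps of P are (2, 3); gaps of Q are (2, 4); they differ by 1 in total.
print(gap_distance([0, 2, 5], [10, 12, 16]))  # 1
# A pure shift of the same sequence has distance 0.
print(gap_distance([0, 2, 5], [7, 9, 12]))    # 0
```

Because only gaps are compared, the distance does not change when either sequence is shifted as a whole.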

Approximate Matching and Recurrence

D(i,j) = distance between the first i points of the pattern and its best occurrence in the text ending at t_j

$$D(i, j) = \min_{i-1 \le k < j} \left\{ D(i-1, k) + \left| (p_i - p_{i-1}) - (t_j - t_k) \right| \right\}$$

First term, D(i-1, k): distance for the prefix shorter by one point

Second term: difference between the last two gaps

D(n,m) can be obtained by tabular computation … in O(nm^2) time
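The tabular computation can be sketched as below, assuming 0-based lists and the base case D(1, j) = 0 (a single pattern point matches any text point at no cost). The function name `naive_table` and the choice to take the minimum of the last row as the best match are assumptions for illustration, not details stated on the slides.

```python
import math


def naive_table(p, t):
    """Fill D by the recurrence directly; O(n * m^2) for pattern size n, text size m."""
    n, m = len(p), len(t)
    D = [[math.inf] * m for _ in range(n)]
    D[0] = [0] * m  # assumed base case: one point matches anywhere
    for i in range(1, n):
        gap = p[i] - p[i - 1]
        for j in range(i, m):
            # k ranges over text positions that can hold pattern point i-1
            D[i][j] = min(
                D[i - 1][k] + abs(gap - (t[j] - t[k]))
                for k in range(i - 1, j)
            )
    return D


D = naive_table([0, 2, 5], [1, 3, 4, 6, 9])
print(min(D[-1]))  # 0, e.g. the occurrence (1, 3, 6) has gaps (2, 3)
```

The innermost loop over k is the third level of iteration, which is what makes the direct computation cubic-like in practice.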

“Finite Resolution” Assumption on a Class of Sequences

The ratio of distances between two contiguous points is limited

Spots observed as stains on a small gel media plane

450 ticks per second in typical MIDI sequences

The modified algorithm runs in O(nm) time if the sequences have finite resolution

The third (innermost) iteration can be finished in constant time…

[Figure: Pattern and Text]
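The slides do not give a formal definition of finite resolution. One possible reading, shown here only as an assumption, is that the largest gap between contiguous points is at most a constant multiple of the smallest gap. The helper `gap_ratio` below is a hypothetical name used to measure that ratio on a sequence.

```python
def gap_ratio(seq):
    """Ratio of the largest to the smallest gap between contiguous points."""
    gaps = [b - a for a, b in zip(seq, seq[1:])]
    if not gaps or min(gaps) <= 0:
        raise ValueError("need a strictly increasing sequence of at least two points")
    return max(gaps) / min(gaps)


# Note onsets in ticks: gaps are 120, 120, 240, 480.
print(gap_ratio([0, 120, 240, 480, 960]))  # 4.0
```

Under this reading, a bounded ratio means that only a bounded number of text points can fall inside an interval as long as one pattern gap.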

A Row Can Be Divided into a “Positive” Part & a “Negative” Part

Values in the “Negative” part always decrease, so their absolute values always increase; only the “Ex-Minimum” can be a candidate

Only a constant number of “Positive” cells exist if the sequences have finite resolution … O(nm) time

“Negative” part: $(p_i - p_{i-1}) - (t_j - t_k) < 0$

[Figure: rows i-1 and i of the table up to column j; as k moves left, $t_j - t_k \ge 0$ goes from small to large]
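One way to read the positive/negative split is sketched below; it is an interpretation under stated assumptions, not necessarily the authors' exact procedure. In the negative part the cost equals (D(i-1,k) - t_k) + t_j - gap, so a running minimum of D(i-1,k) - t_k suffices there, while the positive cells are scanned explicitly. The name `finite_resolution_table` and the base case D(1, j) = 0 are assumptions.

```python
import math


def finite_resolution_table(p, t):
    """Fill D using a running minimum for the negative part of each row."""
    n, m = len(p), len(t)
    D = [[math.inf] * m for _ in range(n)]
    D[0] = [0] * m  # assumed base case
    for i in range(1, n):
        gap = p[i] - p[i - 1]
        prev = D[i - 1]
        b = i - 1           # first index k still in the positive part
        neg_min = math.inf  # min of prev[k] - t[k] over the negative part (k < b)
        for j in range(i, m):
            # k turns negative once t[j] - t[k] exceeds the pattern gap
            while b < j and t[j] - t[b] > gap:
                neg_min = min(neg_min, prev[b] - t[b])
                b += 1
            best = neg_min + t[j] - gap
            # positive cells: few of them when the sequences have finite resolution
            for k in range(b, j):
                best = min(best, prev[k] + gap - (t[j] - t[k]))
            D[i][j] = best
    return D


print(finite_resolution_table([0, 2, 5], [1, 3, 4, 6, 9])[-1])  # [inf, inf, 2, 0, 0]
```

The boundary `b` only moves right within a row, so the work per cell is proportional to the number of positive cells.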

Guaranteed O(nm)-time Algorithm

Uses a “deque” simulating the right-most path of the Cartesian tree [Gabow, et al., 1985]

Maintains to-be-minimum indices in the “Positive” part

The minimum is available in amortized constant time

Constant time on average for one iteration … O(nm) time

[Figure: deque over indices k up to j, with the minimum at the front. Remove an index if it turned negative; pop all larger ones; push the latest index]
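The deque steps named above (push the latest index, pop all larger ones, remove an index once it turns negative) can be sketched as a sliding-window minimum over the keys D(i-1,k) + t_k. This is a sketch under assumptions: the name `deque_table`, the base case D(1, j) = 0, and the use of Python's `collections.deque` are illustrative choices, not the authors' code.

```python
import math
from collections import deque


def deque_table(p, t):
    """Fill D with amortized constant-time minimum queries per cell."""
    n, m = len(p), len(t)
    D = [[math.inf] * m for _ in range(n)]
    D[0] = [0] * m  # assumed base case
    for i in range(1, n):
        gap = p[i] - p[i - 1]
        prev = D[i - 1]
        b = i - 1           # first index k still in the positive part
        neg_min = math.inf  # min of prev[k] - t[k] over the negative part
        window = deque()    # positive indices, keys prev[k] + t[k] increasing
        for j in range(i, m):
            k = j - 1
            key = prev[k] + t[k]
            # pop all larger (or equal) ones, then push the latest index
            while window and prev[window[-1]] + t[window[-1]] >= key:
                window.pop()
            window.append(k)
            # remove indices that turned negative
            while b < j and t[j] - t[b] > gap:
                neg_min = min(neg_min, prev[b] - t[b])
                if window and window[0] == b:
                    window.popleft()
                b += 1
            best = neg_min + t[j] - gap
            if window:
                front = window[0]  # minimum of the positive part
                best = min(best, prev[front] + t[front] + gap - t[j])
            D[i][j] = best
    return D


print(deque_table([0, 2, 5], [1, 3, 4, 6, 9])[-1])  # [inf, inf, 2, 0, 0]
```

Each index is pushed and popped at most once per row, which gives the amortized constant time per cell regardless of how many positive cells there are.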

Computational Results on Real/Synthesized* MIDI Sequences

The simple algorithm expecting “Finite Resolution” is faster than the O(nm)-time algorithm

Pattern size = 11; time (sec.) for filling up the table

Text Size   Naïve DP   Fin. Res.   Cartesian
3086        1.12       0.01        0.01
*18328      197        0.03        0.05
*37741      883        0.05        0.09
*386801     ---        0.58        0.94

Solaris 9 x86/Intel Pentium 4 800MHz

Four-Russians Speed-up for Point Sequences with Finite Resolution

Idea from Arlazarov et al.: fill tabular cells with pre-computed values

O(nm/log n + n log n) time in the unit-cost RAM model

As one might expect, the finite resolution assumption makes point sequences behave like strings

Approximate Point Set Pattern Matching on the Plane: Hardness Results

Akutsu et al. (’95), allowing local distortions: NP-hard, even in 1D matching

V. Makinen & E. Ukkonen ('01), an extension of 1D: NP-hard; deciding the order of points in matching is hard

Q. Is there any non-trivial 2D approximate point set matching computable in polynomial time?

Extending 1D Definition to Approximate Matching on the Plane

Regard a set as sequences with two orders

Divide recursively by axis-parallel lines

[Figure: point sets P and Q]

Recurrence for Edit Distance

Divide P and Q into two arbitrary parts, by either a horizontal or a vertical line

If $|P[i,j]| = |Q[k,l]| = 1$ then $d(i,j;k,l) = 0$,

if $|P[i,j]| \ne |Q[k,l]|$ then $d(i,j;k,l) = \infty$, and

otherwise $d(i,j;k,l)$ is the minimum of two alternatives, marked R and T on the slide, each consisting of a distance $d$ on sub-parts plus a term of the form $|(p - p') - (q - q')|$

[Equation: the subscripts and superscripts of the two alternatives inside the min are not recoverable from the extracted slide]

How Pattern Matching Proceeds

The × points of a pattern should be aligned onto the ○ points of a text by cutting and moving the bounding box

Polynomial-time Algorithm for 2D Approximate Point Set Matching

Finds the best partition/direction by DP-like recursion

Results are stored in a cache for quadruples [i, j; k, l] … O(n^2 m^2) space

O(n^2 m^4) time with pattern size n and text size m

Remarks & Future Work

Consider scaling in 1D: tempo must be considered in musical sequence search

Looking for more applications: 1D approximate matching for secondary structure search of proteins
