Parallel String Matching Algorithm(s) Using Associative Processors
Original work by Mary Esenwein and Dr. Johnnie Baker
Presented by Shannon Steinfadt
April 18, 2007
2
String Matching Problem
Also known as pattern matching or string searching
Useful in many applications, such as text editing, information retrieval, DNA analysis, and homeland security
3
What are we doing?
Given a pattern and some text, find out whether the pattern is in the text
Is pattern AB in the text ABAA? If so, where?
Pattern: AB
Text: ABAA
4
What’s the notation?
P is a pattern string of length m
T is a text string of length n, usually n ≥ m
5
Goal of String Matching
To find all occurrences of a pattern string in the text string
Locate all positions i in T such that T[i+j-1] = P[j] for all j, 1 ≤ j ≤ m
Why use P[j]? How does it relate to T[i+j-1]?
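As a minimal sequential sketch of this definition (it is not one of the ASC algorithms), the Python function below tries every candidate position i and keeps those where T[i+j-1] = P[j] holds for every j. P[j] is the j-th pattern character, and T[i+j-1] is the text character it lines up with when the pattern is placed at position i. The name `find_occurrences` is an assumption made for illustration; positions are 1-based as on the slide, so the code subtracts 1 when indexing Python strings.

```python
def find_occurrences(text, pattern):
    """Return all 1-based positions i at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(1, n - m + 2):            # candidate starts 1 .. n-m+1
        # T[i+j-1] == P[j] for all j in 1..m, shifted to 0-based indexing
        if all(text[i + j - 2] == pattern[j - 1] for j in range(1, m + 1)):
            positions.append(i)
    return positions


print(find_occurrences("ABAA", "AB"))          # [1]
print(find_occurrences("ABBBABBBABA", "BBA"))  # [3, 7]
```

This is the naïve O(mn) approach; it is only meant to make the index arithmetic concrete.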
6
Pattern Variations
An exact pattern
A “Don’t Care” character (*) in the pattern
  Flexibility in matching
  * indicates character(s) of the text that are irrelevant to the matching process
7
General “Don’t Care” Character (*) Characteristics
A “don’t care” can stand for:
  Single character of text
  Multiple consecutive text characters
  No characters
  Combination of the above three
Example: Pattern AB*CD could match ABBCD, ABBBBBCD, or ABCD (* is null)
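The general “don’t care” behaviour can be checked on a sequential machine with Python's standard `re` module, as a rough sketch. This swaps in regular expressions and is not the ASC algorithm: each `*` is translated to `.*` (zero or more arbitrary characters), and the helper name `dont_care_to_regex` is an assumption for illustration.

```python
import re


def dont_care_to_regex(pattern):
    """Translate a pattern with '*' don't cares into a compiled regex."""
    # Escape the literal pieces so only '*' gets a special meaning.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


matcher = dont_care_to_regex("AB*CD")
for candidate in ["ABBCD", "ABBBBBCD", "ABCD", "ABCE"]:
    print(candidate, bool(matcher.fullmatch(candidate)))
```

The first three candidates are the ones from the slide; `ABCE` is an added negative case.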
8
String Matching using ASC
Three parallel algorithms using associative computing (using a 1-D mesh)
  String matching for exact match
  String matching with fixed length “don’t care”
    i.e., exactly 1 character
  String matching with variable length “don’t care”
    a “don’t care” can have any length or be null
9
ASC Exact Match Algorithm
for (j = patt_length - 1; j >= 0; j--) {
    Responders are text[$] == patt_string[j] and counter[$] == patt_counter;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;
Pattern: BBA
Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3

Text[$]  Match[$]  Counter[$]
@        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
A        0         0
11
Final State of Exact Match Algorithm
Pattern: BBA
Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index

Text[$]  Match[$]  Counter[$]
@        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         2
B        0         1
A        0         0

Match[$] = 1 marks the two cells where an occurrence of BBA begins.
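The exact-match loop and its trace can be reproduced with a sequential simulation, sketched below under a few assumptions: the cells are modelled as Python lists, cell 0 holds the `@` sentinel shown in the table, and the function name `asc_exact_match` is made up for illustration. On a real associative processor every cell evaluates the responder test at the same time; here the responder set is collected first and then updated, which gives the same result.

```python
def asc_exact_match(text, pattern):
    """Sequentially simulate the ASC exact-match loop (cell 0 holds '@')."""
    cells = "@" + text
    counter = [0] * len(cells)
    match = [0] * len(cells)
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        # Select all responders first, as the parallel search would.
        responders = [c for c in range(len(cells))
                      if cells[c] == pattern[j] and counter[c] == patt_counter]
        for c in responders:
            if c > 0:
                # counter[$] + 1 is stored in the preceding cell
                counter[c - 1] = patt_counter + 1
        patt_counter += 1
    # When the pattern has been processed: flag the cell after each full count.
    for c in range(len(cells) - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return counter, match


counter, match = asc_exact_match("ABBBABBBABA", "BBA")
print(counter)  # [1, 0, 3, 2, 1, 0, 3, 2, 1, 2, 1, 0]
print(match)    # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

The two printed lists agree with the Counter[$] and Match[$] columns of the final-state table.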
13
Algorithm for unit length "don't cares" using ASC
for (j = patt_length - 1; j >= 0; j--) {
    if (pattern[j] == '*')
        Responders are counter[$] == patt_counter;
    else  // pattern[j] is not the "don't care" character
        Responders are text[$] == pattern[j] and counter[$] == patt_counter;
    If no Responders are detected, exit;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;
14
ASC Exact Match Algorithm (again)
for (j = patt_length - 1; j >= 0; j--) {
    Responders are text[$] == patt_string[j] and counter[$] == patt_counter;
    Responders add 1 to counter[$] and store result in counter[$] of preceding cell;
    patt_counter++;
}
/* When pattern has been processed */
Responders are counter[$] == patt_length;
Responders set match[$] = 1 in next cell;
Pattern: B*A
Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3

Text[$]  Match[$]  Counter[$]
@        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
A        0         0
16
Final State of Exact Match Algorithm
Pattern: B*A
Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index

Text[$]  Match[$]  Counter[$]
@        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         2
B        0         1
A        0         0

Match[$] = 1 marks the two cells where an occurrence of B*A begins.
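The unit-length “don’t care” loop and the B*A trace can be simulated sequentially in the same spirit. This is a sketch under stated assumptions: cell 0 holds the `@` sentinel, the function name `asc_unit_dont_care` is hypothetical, and the parallel responder selection is modelled by building the responder list before any counter is updated.

```python
def asc_unit_dont_care(text, pattern):
    """Simulate the unit-length don't-care loop; '*' matches exactly one character."""
    cells = "@" + text
    counter = [0] * len(cells)
    match = [0] * len(cells)
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        if pattern[j] == "*":
            # Any text character is acceptable at a don't-care position.
            responders = [c for c in range(len(cells))
                          if counter[c] == patt_counter]
        else:
            responders = [c for c in range(len(cells))
                          if cells[c] == pattern[j] and counter[c] == patt_counter]
        if not responders:
            return counter, match            # early exit: no match is possible
        for c in responders:
            if c > 0:
                counter[c - 1] = patt_counter + 1
        patt_counter += 1
    for c in range(len(cells) - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return counter, match


counter, match = asc_unit_dont_care("ABBBABBBABA", "B*A")
print(counter)  # [1, 0, 3, 2, 1, 0, 3, 2, 1, 2, 1, 0]
print(match)    # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

For this text, B*A ends in the same state as BBA, which is consistent with the slides reusing the same final-state table.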
18
VLDC Algorithm (added)
Works on each “segment” of the pattern broken up by the * character
  AB*BB*A has three segments
Consecutive ** characters are not necessary and not allowed
This VLDC algorithm is unique
  Provides information to find all continuation points of all matches following each “*”
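The segmentation step can be illustrated with a short Python sketch. The helper name `split_segments` is an assumption; it rejects consecutive `*` characters, as the slide requires, and drops the empty pieces produced by a leading or trailing `*`.

```python
def split_segments(pattern):
    """Split a VLDC pattern into its literal segments."""
    if "**" in pattern:
        raise ValueError("consecutive '*' characters are not allowed")
    # A leading or trailing '*' yields an empty piece, which is dropped.
    return [segment for segment in pattern.split("*") if segment]


print(split_segments("AB*BB*A"))  # ['AB', 'BB', 'A']
```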
19
VLDC ALGORITHM USING ASC
int patt_length = m;
int maxcell = n + 2;
/* Special handling for '*' at end of pattern */
if (pattern[m-1] == '*') {
    Responders are cell index > 1;
    Responders set segment$[0] = 1;
    patt_counter = 1;
    k = 1; /* Reset initial segment index */
}
while ((patt_length -= patt_counter) > 0 && maxcell > 0) {
    patt_counter = 0;
    for (I = patt_length - 1; I >= 0 && pattern[I] != '*'; I--) {
        Responders are text$ == pattern[I] and counter$ == patt_counter and cell index < maxcell;
        Responders add 1 to counter$ and store result in counter$ of preceding cell;
        patt_counter++;
    }
    Responders are counter$ == patt_counter;
20
VLDC continued
    Responders set segment$[k] = patt_counter in next cell;
    Responders are segment$[k] > 0;
    maxcell = maximum cell index value of Responders, else if no Responders maxcell = 0;
    All cells become Responders and set counter$ = 0;
    patt_counter++; k++
}
/* When pattern has been processed */
Responders are segment$[--k] > 0;
Responders set match$ = 1;
/* Special handling for '*' at start of pattern */
if (pattern[0] == '*') {
    Responders are cell index < maxcell and cell index > 1;
    Responders set match$ = 1;
}
Pattern: AB*BB*A
Text: ABBBABBBABA
After third pattern segment in VLDC Algorithm

Cell  T$  M$  C$  S0$  S1$  S2$
1     @   0   0   0    0    0
2     A   0   0   1    0    0
3     B   0   0   0    0    0
4     B   0   0   0    0    0
5     B   0   0   0    0    0
6     A   0   0   1    0    0
7     B   0   0   0    0    0
8     B   0   0   0    0    0
9     B   0   0   0    0    0
10    A   0   0   1    0    0
11    B   0   0   0    0    0
12    A   0   0   1    0    0

Maxcell: 13, then 12
Patt_counter: 0, 1, 2
Pattern: AB*BB*A
Text: ABBBABBBABA
After second pattern segment in VLDC Algorithm

Cell  T$  M$  Counter$  S0$  S1$  S2$
1     @   0   0         0    0    0
2     A   0   0         1    0    0
3     B   0   0         0    2    0
4     B   0   0         0    2    0
5     B   0   0         0    0    0
6     A   0   0         1    0    0
7     B   0   0         0    2    0
8     B   0   0         0    2    0
9     B   0   0         0    0    0
10    A   0   0         1    0    0
11    B   0   0         0    0    0
12    A   0   0         1    0    0

Maxcell: 13, 12, then 8 (used to keep pattern segments in order, i.e., AB occurs before BB)
Patt_counter: 0, 1, 2 for the third segment, then 0, 1, 2, 3 for the second
Pattern: AB*BB*A
Text: ABBBABBBABA
After first pattern segment in VLDC Algorithm

Cell  T$  M$  Counter$  S0$  S1$  S2$
1     @   0   0         0    0    0
2     A   0   0         1    0    2
3     B   0   0         0    2    0
4     B   0   0         0    2    0
5     B   0   0         0    0    0
6     A   0   0         1    0    2
7     B   0   0         0    2    0
8     B   0   0         0    2    0
9     B   0   0         0    0    0
10    A   0   0         1    0    0
11    B   0   0         0    0    0
12    A   0   0         1    0    0

Maxcell: 13, 12, 8, then 6 (used to keep pattern segments in order, i.e., AB occurs before BB)
Patt_counter: 0, 1, 2 for the third segment, then 0, 1, 2, 3 for each of the second and first
Pattern: AB*BB*A
Text: ABBBABBBABA
Final State in VLDC Algorithm

Cell  T$  M$  Counter$  S0$  S1$  S2$  Responder$
1     @   0   0         0    0    0
2     A   1   0         1    0    2    Y
3     B   0   0         0    2    0
4     B   0   0         0    2    0
5     B   0   0         0    0    0
6     A   1   0         1    0    2    Y
7     B   0   0         0    2    0
8     B   0   0         0    2    0
9     B   0   0         0    0    0
10    A   0   0         1    0    0
11    B   0   0         0    0    0
12    A   0   0         1    0    0

Maxcell: 13, 12, 8, then 6 (used to keep pattern segments in order, i.e., AB occurs before BB)
Patt_counter: 0, 1, 2 for the third segment, then 0, 1, 2, 3 for each of the second and first
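The VLDC pseudocode and the four trace slides can be reproduced with a sequential simulation, sketched below. Several things here are assumptions rather than part of the original: the function name `asc_vldc`, the list-based cell model (cells are 1-indexed, cell 1 holds `@`, index 0 is unused padding), and the initial values `patt_counter = 0` and `k = 0`, which the pseudocode leaves implicit. As the slides require, the pattern is assumed to be non-empty and free of consecutive `*` characters.

```python
def asc_vldc(text, pattern):
    """Sequentially simulate the VLDC pseudocode; returns (match, segment)."""
    m, n = len(pattern), len(text)
    cells = [None, "@"] + list(text)          # cells[0] is unused padding
    last = n + 1                              # index of the last cell
    counter = [0] * (last + 1)
    match = [0] * (last + 1)
    segment = [[0] * (last + 1) for _ in range(pattern.count("*") + 1)]
    patt_length, maxcell = m, n + 2
    patt_counter, k = 0, 0                    # assumed initial values
    if pattern[m - 1] == "*":                 # '*' at end of pattern
        for c in range(2, last + 1):
            segment[0][c] = 1
        patt_counter, k = 1, 1
    while True:
        patt_length -= patt_counter
        if patt_length <= 0 or maxcell <= 0:
            break
        patt_counter = 0
        i = patt_length - 1
        while i >= 0 and pattern[i] != "*":   # scan one segment right to left
            responders = [c for c in range(1, last + 1)
                          if cells[c] == pattern[i]
                          and counter[c] == patt_counter and c < maxcell]
            for c in responders:
                if c > 1:
                    counter[c - 1] = patt_counter + 1
            patt_counter += 1
            i -= 1
        for c in [c for c in range(1, last) if counter[c] == patt_counter]:
            segment[k][c + 1] = patt_counter  # the segment starts in the next cell
        starts = [c for c in range(1, last + 1) if segment[k][c] > 0]
        maxcell = max(starts) if starts else 0
        counter = [0] * (last + 1)
        patt_counter += 1                     # also skip the '*' before this segment
        k += 1
    k -= 1
    for c in range(1, last + 1):              # when pattern has been processed
        if segment[k][c] > 0:
            match[c] = 1
    if pattern[0] == "*":                     # '*' at start of pattern
        for c in range(2, min(maxcell, last + 1)):
            match[c] = 1
    return match, segment


match, segment = asc_vldc("ABBBABBBABA", "AB*BB*A")
print([c for c, flag in enumerate(match) if flag])  # [2, 6]
for s in segment:
    print({c: v for c, v in enumerate(s) if v})
```

For the example, the nonzero entries come out as S0$ = 1 at cells 2, 6, 10, 12; S1$ = 2 at cells 3, 4, 7, 8; and S2$ = 2 at cells 2, 6, matching the final-state table.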
25
Finding All Continuation Points
Match starts where M$ = 1
Match to any pattern segment begins where S$[x] == segment length, i.e., where any S$[x] > 0
Continuation of match is in S$[x-1], at any cell/PE whose index is >= (S$[x]’s cell/PE index + segment size of S$[x])
Pattern: AB*BB*A
Text: ABBBABBBABA
Using the Final State in VLDC Algorithm

Cell  T$  M$  C$  S0$  S1$  S2$
1     @   0   0   0    0    0
2     A   1   0   1    0    2
3     B   0   0   0    2    0
4     B   0   0   0    2    0
5     B   0   0   0    0    0
6     A   1   0   1    0    2
7     B   0   0   0    2    0
8     B   0   0   0    2    0
9     B   0   0   0    0    0
10    A   0   0   1    0    0
11    B   0   0   0    0    0
12    A   0   0   1    0    0
•Start with index 2, where there’s a match (M$ = 1)
•Work from S2$ down and left: count down 2 values and move into S1$, count down 2 values and move to S0$
•That produces: 2 4 6 ABBBA
•Any index >= 4 in S1$ whose value is > 0 will also produce a correct match
•2 7 10 ABBBABBBA
•2 8 10 ABBBABBBA
Some of the additional matches are:
2 4 10 ABBBABBBA
2 4 12 ABBBABBBABA
2 8 12 ABBBABBBABA
6 8 10 ABBBA
6 8 12 ABBBABA
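The walk described above can be sketched as a small enumeration in Python. The function name `continuation_chains` and the dictionary layout are assumptions for illustration; the data are the nonzero S2$, S1$, and S0$ entries read off the final-state table, listed in pattern order (AB, BB, A). Each chain picks one start cell per segment, with every segment starting at or after the cell where the previous one ends.

```python
def continuation_chains(segment_starts):
    """Enumerate every chain of segment start cells, first pattern segment first.

    segment_starts[d] maps a start cell to the length of pattern segment d.
    """
    chains = []

    def extend(chain, min_cell, depth):
        if depth == len(segment_starts):
            chains.append(tuple(chain))
            return
        for cell, length in sorted(segment_starts[depth].items()):
            if cell >= min_cell:              # must start after the previous segment
                extend(chain + [cell], cell + length, depth + 1)

    extend([], 0, 0)
    return chains


# Nonzero entries of the final state: S2$ (AB), S1$ (BB), S0$ (A).
final_state = [
    {2: 2, 6: 2},
    {3: 2, 4: 2, 7: 2, 8: 2},
    {2: 1, 6: 1, 10: 1, 12: 1},
]
text = "ABBBABBBABA"
for chain in continuation_chains(final_state):
    end_cell = chain[-1] + final_state[-1][chain[-1]]  # one past the last matched cell
    print(chain, text[chain[0] - 2:end_cell - 2])       # cell c holds text[c - 2]
```

For this example the enumeration yields nine chains, including the ones listed above, such as (2, 4, 6) for ABBBA and (6, 8, 12) for ABBBABA.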
27
Existing Algorithms
Sequential Algorithms
  Naïve algorithm: O(mn)
  Knuth, Morris, & Pratt, or Boyer-Moore: O(m+n)
Parallel Algorithms
  A PRAM exact string matching: O(n)
  On a reconfigurable mesh: O(1) on n(n-m+1) PEs
  On a SIMD hypercube (limited to {0,1}): O(lg n) on n/lg n PEs
  On a neural network: O(1) on nm PEs
  ASC algorithms: O(m) time on O(n) PEs
28
Question to consider
The “don’t care” character allows non-matching for an arbitrary length. This is discussed on slide 13. Instead, consider “*” to allow a non-match for two characters, and make the necessary changes to the trace on slides 15-16.