Knuth–Morris–Pratt algorithm
KMP Algorithm
Knuth-Morris-Pratt
String Searching Problem
Input: A word string W and a text string S
Check if W exists as a substring of S, and if it does then return its location.
Output: The position in S at which W is found
Brute Force
S: A B C A B C A B A B A C
W: C A B A
Worst Case of Brute Force
S: A A A A A A A A A A A A
W: A A A C
If |S|=n and |W|=m, the brute-force algorithm runs in O(mn) time in the worst case.
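The brute-force scan can be sketched as follows. This is an illustrative sketch rather than code from the slides: the function name `brute_force_search` and the comparison counter are assumptions added to make the O(mn) behaviour visible on the two inputs shown above.

```python
def brute_force_search(s, w):
    """Return (first position of w in s or -1, number of character comparisons)."""
    n, m = len(s), len(w)
    comparisons = 0
    for start in range(n - m + 1):      # try every alignment of W against S
        j = 0
        while j < m:
            comparisons += 1
            if s[start + j] != w[j]:    # mismatch: give up on this alignment
                break
            j += 1
        if j == m:                      # all m characters matched
            return start, comparisons
    return -1, comparisons


print(brute_force_search("ABCABCABABAC", "CABA"))  # (5, 12)
print(brute_force_search("A" * 12, "AAAC"))        # (-1, 36)
```

In the worst-case input every one of the n-m+1 = 9 alignments costs m = 4 comparisons, which is where the 36 comes from.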
Better Algorithms
Backward Algorithm
Boyer and Moore Algorithm
Colussi Algorithm
Crochemore and Perrin Algorithm
Galil and Giancarlo Algorithm
Galil and Seiferas Algorithm
Horspool Algorithm
Knuth, Morris and Pratt Algorithm
KMP Skip Algorithm
Max-Suffix Matching Algorithm
Morris and Pratt Algorithm
Quick Search Algorithm
Raita Algorithm
Reverse Factor Algorithm
Reverse Colussi Algorithm
Self Max-Suffix Algorithm
Simon Algorithm
Skip Search Algorithm
Smith Algorithm
Tuned Boyer and Moore Algorithm
Two Way Algorithm
Uniqueness Algorithm
Wide Window Algorithm
Zhu and Takaoka Algorithm
KMP
Linear time
Avoids re-examining characters of S that have already been matched, i.e. backtracking in S never occurs
Time: O(m+n)
Space: O(m) for the table T, or O(m+n) counting the input strings
KMP
Differs from brute force by always keeping track of the information that it gains from previous comparisons
A failure function, or partial match table (T), is computed from W; it tells us how much of the last comparison can be reused if it fails
T[i] = the length of the longest prefix of W that is also a proper suffix of W[0..i]
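To make the definition of T concrete, here is a deliberately naive sketch that applies it literally. The name `failure_table_naive` is an assumption for illustration. It runs in roughly cubic time and is not how KMP builds the table; it only shows what each entry means.

```python
def failure_table_naive(w):
    """T[i] = length of the longest prefix of w that is also a proper suffix of w[0..i]."""
    table = []
    for i in range(len(w)):
        part = w[: i + 1]                   # this is W[0..i]
        best = 0
        # "Proper" suffix: strictly shorter than W[0..i], so lengths 1..i only.
        for length in range(1, i + 1):
            if part[:length] == part[-length:]:
                best = length
        table.append(best)
    return table


print(failure_table_naive("ABCABA"))  # [0, 0, 0, 1, 2, 1]
```

For example, T[4] = 2 because "AB" is both a prefix of W and a proper suffix of "ABCAB".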
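KMP
T shows how much of the beginning of W matches the portion of S immediately preceding the failed comparison.
S: . . A B C A B C A B A .
W:     A B C A B A              (mismatch: the last A of W against C in S)
W:           A B C A B A        (W after shifting right by 3)
No need to repeat these comparisons (the overlapping A B)
Resume comparing here (at the C in S where the mismatch occurred)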
Sliding Window Approach
Nearly all exact string matching algorithms use the sliding window approach
Whenever a mismatch is found, slide the window to the right
Suffix to Prefix Rule
For a shifted window to have any chance of matching the pattern, some suffix of the current window must be equal to a prefix of the pattern.
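The rule translates directly into a shift amount: after some characters of the pattern have matched, the window can move right until the longest suffix of the matched part lines up with a prefix of the pattern. The helper below is a hypothetical illustration (the name `shift_after_mismatch` is an assumption) that finds this overlap by direct comparison; KMP itself reads it from the precomputed table instead.

```python
def shift_after_mismatch(w, matched):
    """Shift for pattern w after `matched` leading characters matched and the next failed."""
    if matched == 0:
        return 1                      # nothing matched: move one position
    seen = w[:matched]                # this part of S is known to equal w[0:matched]
    # Look for the longest proper suffix of `seen` that is also a prefix of w.
    for overlap in range(matched - 1, 0, -1):
        if seen[-overlap:] == w[:overlap]:
            return matched - overlap  # slide until that suffix meets the prefix
    return matched                    # no overlap: skip the whole matched part


print(shift_after_mismatch("ABCABA", 5))  # 3
```

With W = A B C A B A and five characters matched, the suffix "AB" equals the prefix "AB", so the window moves by 5 - 2 = 3 positions.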
KMP example
m: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
S: A B C A B C D A A B C D A B C D A D
W: A B C D A D
i: 0 1 2 3 4 5
KMP
Calculating the longest valid suffix during the search itself would be very inefficient
Pre-processing eliminates the problem: the matched portion of S is a copy of a prefix of W, so the suffix can be found in W itself
KMP
The algorithm preprocesses the word W to produce the prefix function, which gives, for every possible location of a mismatch, how much of the match can be kept and therefore how far the pattern can shift
Components of KMP
Compute Prefix Function: For a given W, compute a table T of equal length where T[i] gives the length of the longest prefix of W that is also a proper suffix of W[0..i].
KMP Matcher Function: Actual searching.
Example of a prefix function
W: A C A A C A C B
T: 0 0 1 1 2 3 2 0
Example
S: A C B A C A A C A A C A C A A C A B
W: A C A A C A B
T: 0 0 1 1 2 3 0
Matcher Function
KMP(String S, String W):
    set n to length of S, m to length of W
    set T to prefixFunc(W)                      //Compute the partial match table
    set q to 0                                  //Index of the candidate character of W, initially 0
    for every i in range 0 to n-1
        while q>0 and W[q] is not equal to S[i]
            set q to T[q-1]                     //Mismatch, backtrack if you can
        if W[q] is equal to S[i]
            increment q                         //Match, move to next character
        if q is equal to m
            print i-m+1                         //Entire W has been found
            set q to T[q-1]                     //Find others
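A runnable rendering of the matcher might look like the sketch below. It follows the pseudocode step for step, with two assumptions made for illustration: the table T is passed in as an argument rather than computed inside, and matches are collected in a list instead of printed. The inputs are the S, W and T from the Example slide; W is assumed to be non-empty.

```python
def kmp_search(s, w, t):
    """Return every start index of w in s, given w's partial match table t."""
    n, m = len(s), len(w)
    matches = []
    q = 0                               # index of the candidate character of w
    for i in range(n):
        while q > 0 and w[q] != s[i]:
            q = t[q - 1]                # mismatch: fall back, never move back in s
        if w[q] == s[i]:
            q += 1                      # match: move to the next character of w
        if q == m:
            matches.append(i - m + 1)   # entire w has been found
            q = t[q - 1]                # keep going to find later occurrences
    return matches


S = "ACBACAACAACACAACAB"
W = "ACAACAB"
T = [0, 0, 1, 1, 2, 3, 0]
print(kmp_search(S, W, T))  # [11]
```

Note that the index i into S only ever increases, which is the "no backtracking in S" property.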
Prefix Function
prefixFunc(List W):
    set m to length of W
    set T[0] to 0                               //Set first element of table to 0
    set k to 0                                  //Candidate character initially 0
    for every q in range 1 to m-1
        while k>0 and W[k] is not equal to W[q]
            set k to T[k-1]                     //Mismatch, backtrack if possible
        if W[k] is equal to W[q]
            increment k                         //Match, move to next character
        set T[q] to k                           //Store result
    return T
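The prefix function can be sketched in Python in the same way. The name `prefix_function` is an assumption; the logic mirrors the pseudocode, and the two calls reproduce the tables shown on the example slides.

```python
def prefix_function(w):
    """Build the partial match table T for pattern w."""
    m = len(w)
    t = [0] * m                     # t[0] is always 0
    k = 0                           # length of the prefix currently being extended
    for q in range(1, m):
        while k > 0 and w[k] != w[q]:
            k = t[k - 1]            # mismatch: fall back to a shorter prefix
        if w[k] == w[q]:
            k += 1                  # match: the prefix grows by one character
        t[q] = k                    # store result
    return t


print(prefix_function("ACAACACB"))  # [0, 0, 1, 1, 2, 3, 2, 0]
print(prefix_function("ACAACAB"))   # [0, 0, 1, 1, 2, 3, 0]
```

The function is in effect the matcher run with W searching inside itself, which is why the two pieces of pseudocode look so alike.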
Runtime Analysis
Although the algorithm as implemented here contains a loop within a loop, it runs in linear time. The backtracking statement, which essentially shifts the sliding window to the right, can execute at most n times over the entire run of the for loop: each backtrack decreases q by at least one, q never drops below zero, and q is incremented at most once per iteration. The remaining body of the for loop executes exactly n times, giving a runtime of O(n) for the matching function.
Similar reasoning applies to the prefix function, which runs in O(m) time.
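The bound on backtracking can be checked empirically. The sketch below is an instrumented variant written for illustration only: the function name `count_backtracks` and the counter are assumptions, and W is assumed to be non-empty. It counts how often the inner while loop of the matcher runs on the brute-force worst case and on the first brute-force example.

```python
def count_backtracks(s, w):
    """Run KMP on s and w; return (match positions, inner-loop backtrack steps)."""
    # Build the table T as in the prefix function.
    t, k = [0] * len(w), 0
    for q in range(1, len(w)):
        while k > 0 and w[k] != w[q]:
            k = t[k - 1]
        if w[k] == w[q]:
            k += 1
        t[q] = k

    matches, backtracks, q = [], 0, 0
    for i, ch in enumerate(s):
        while q > 0 and w[q] != ch:
            q = t[q - 1]
            backtracks += 1         # one execution of the backtracking statement
        if w[q] == ch:
            q += 1
        if q == len(w):
            matches.append(i - len(w) + 1)
            q = t[q - 1]
    return matches, backtracks


for s, w in [("A" * 12, "AAAC"), ("ABCABCABABAC", "CABA")]:
    matches, backtracks = count_backtracks(s, w)
    assert backtracks <= len(s)     # never more than n backtracks in total
    print(w, matches, backtracks)
```

On the input that costs brute force 36 comparisons, the matcher backtracks 9 times across 12 iterations, within the bound of n.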