Knuth-Morris-Pratt
String matching algorithm
Ivaylo Kenov
Telerik Corporation
http://telerikacademy.com
Telerik Academy Student
Table of Contents
1. Background and idea
2. The “naive” approach
3. Basic definitions
4. Preprocessing
5. Search algorithm
6. Complexity
7. Additional information
Background and idea
What is the problem?
Background and idea
The problem of string matching: we have a string text and a pattern word.
Check whether word occurs in text. If so, return the position where the pattern occurs.
If not, return -1.
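The contract above can be pinned down with a reference implementation. The following is only a sketch: it delegates to the built-in String.IndexOf of .NET, and the class and method names (ProblemStatement, Find) are illustrative assumptions, not part of the original slides.

```csharp
using System;

static class ProblemStatement
{
    // Reference behaviour only (hypothetical helper): returns the index of
    // the first occurrence of word in text, or -1 if there is none.
    public static int Find(string text, string word)
    {
        return text.IndexOf(word, StringComparison.Ordinal);
    }
}
```

For example, ProblemStatement.Find("abcabd", "abd") returns 3 and ProblemStatement.Find("abcabd", "xyz") returns -1.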
The “naive” approach
New to string searching
The naive approach (1)
A very obvious solution: compare element by element.
O(m*n) complexity, where n is the length of text and m is the length of word. Not good!
Example: a string text and a pattern word.
The naive approach (2)
Step 1: compare word[0] with text[0]
Step 2: compare word[1] with text[1]
The naive approach (3)
Step 3: compare word[2] with text[2]
Mismatch found: shift word one index to the right and repeat!
The naive approach (4)
A match will be found after three shifts of the word to the right!
Problem with the “naive” approach: too many comparisons over the same characters!
The “naive” approach
Live demo
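The steps of the naive approach can be sketched as follows. This is a minimal illustration, not the code from the live demo; the names NaiveSearch and Find are assumptions made for this example.

```csharp
static class NaiveSearch
{
    // Returns the index of the first occurrence of word in text, or -1.
    public static int Find(string text, string word)
    {
        // Try every possible shift of word along text.
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            int matched = 0;

            // Compare element by element until a mismatch or a full match.
            while (matched < word.Length && text[shift + matched] == word[matched])
            {
                matched++;
            }

            if (matched == word.Length)
            {
                return shift;
            }
            // Mismatch: shift word one index to the right and start over.
        }

        return -1;
    }
}
```

NaiveSearch.Find("aaab", "ab") returns 2, but only after re-reading the leading characters of text at every shift, which is the repeated work the next section removes.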
Knuth-Morris-Pratt
Without repeating!
Knuth-Morris-Pratt
A linear-time algorithm for string matching.
O(n) search complexity after O(m) preprocessing of the pattern. The position in the text never moves backwards, so already visited text characters are not re-read!
Useful with binary data and small-alphabet strings.
Basic definitions
Easy theory!
Basic definitions (1)
Prefix: a substring with which our string starts. Example: “abcdef” starts with “abc”.
Suffix: a substring with which our string ends. Example: “abcdef” ends with “def”.
Proper prefix and proper suffix: the length of the substring is less than the length of the string.
Basic definitions (2)
Border: a substring that is a proper prefix and a proper suffix at the same time. Example: “ab” is a border of “abcab”.
Width of a border: the length of the border.
The empty string “” is a proper prefix, a proper suffix and a border of any non-empty string!
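The definitions above can be checked by brute force. The sketch below lists the widths of all borders of a string; it is deliberately simple rather than efficient, and the names Borders and Widths are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

static class Borders
{
    // Returns the widths of all borders of s, in increasing order.
    public static List<int> Widths(string s)
    {
        var widths = new List<int>();

        // A border is proper, so its width is strictly less than s.Length.
        for (int width = 0; width < s.Length; width++)
        {
            string prefix = s.Substring(0, width);
            string suffix = s.Substring(s.Length - width);

            if (string.Equals(prefix, suffix, StringComparison.Ordinal))
            {
                widths.Add(width);
            }
        }

        return widths;
    }
}
```

Borders.Widths("abcab") yields the widths 0 and 2, that is, the empty border and “ab”.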
Basic definitions (3)
How far does the algorithm shift the pattern?
The shift distance is determined by the widest border of the matching prefix of word.
Distance = length of the matching prefix minus length of the widest border.
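As a small worked illustration of the distance formula, the sketch below finds the widest border by brute force and subtracts its width from the length of the matched prefix. The names ShiftDistance, WidestBorder and For are hypothetical. Treating the empty prefix as having a border of width -1 is an assumption made here so that a mismatch on the very first character gives a shift of 1.

```csharp
static class ShiftDistance
{
    // Width of the widest border of s, or -1 for the empty string
    // (assumed convention, see the note above).
    static int WidestBorder(string s)
    {
        for (int width = s.Length - 1; width >= 0; width--)
        {
            if (s.Substring(0, width) == s.Substring(s.Length - width))
            {
                return width;
            }
        }

        return -1;
    }

    // Distance = length of the matching prefix minus width of its widest border.
    public static int For(string matchedPrefix)
    {
        return matchedPrefix.Length - WidestBorder(matchedPrefix);
    }
}
```

For the matched prefix “ababa” the widest border is “aba”, so ShiftDistance.For("ababa") is 5 - 3 = 2. For “abc” only the empty border exists, so the pattern shifts by 3.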
Preprocessing
Building every border!
Preprocessing (1)
If a and b are borders of a string x and a is shorter than b, then a is also a border of b.
A border r of x can be extended by a character c if rc is a border of xc.
Preprocessing (2)
We build an array table, which contains information about border widths.
When computing a value, we already know the previous ones and check whether one of the known borders can be extended.
The border of width table[i] can be extended if word[table[i]] = word[i].
If not, the next border to check has width table[table[i]].
Preprocessing (3)
Algorithm for building the table:
    // failureTable is an int[] field with at least word.Length + 1 elements.
    void FailFunction(string word)
    {
        int index = 0;
        int borderWidth = -1;
        failureTable[index] = borderWidth;

        while (index < word.Length)
        {
            // The current border cannot be extended: try the next widest one.
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = failureTable[borderWidth];
            }

            index++;
            borderWidth++;
            failureTable[index] = borderWidth;
        }
    }
Preprocessing (4)
Example for table: for the pattern “ababaa” the widths of the borders in the array table have the following values.
    index:         0   1   2   3   4   5   6
    word[index]:   a   b   a   b   a   a
    table[index]: -1   0   0   1   2   3   1
For instance, we have table[5] = 3, since the prefix “ababa” of length 5 has a border of width 3.
Note: the zero element is always -1.
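The example can be reproduced with a self-contained variant of the table-building routine. This is a sketch: unlike the slide code it returns a fresh array instead of filling a shared field, and the names FailureTable and Build are assumptions made for this example.

```csharp
static class FailureTable
{
    // Returns an array of length word.Length + 1 in which element i is the
    // width of the widest border of the prefix of word with length i.
    public static int[] Build(string word)
    {
        var table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[index] = borderWidth; // the zero element is always -1

        while (index < word.Length)
        {
            // Fall back to narrower borders until one can be extended.
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            table[index] = borderWidth;
        }

        return table;
    }
}
```

FailureTable.Build("ababaa") produces -1, 0, 0, 1, 2, 3, 1, so element 5 is 3 as stated above.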
Preprocessing
Live demo
Search algorithm
Finding the word!
Search algorithm (1-2)
The search algorithm is similar to the table-building one. The position parameter is the 1-based number of the occurrence to return:
    static int KMPSearch(string text, string word, int position)
    {
        int index = 0;           // current position in text
        int borderWidth = 0;     // number of characters of word matched so far
        int currentPosition = 1; // number of the occurrence found next

        while (index < text.Length)
        {
            // On a mismatch, fall back to the next widest border.
            while (borderWidth >= 0 && text[index] != word[borderWidth])
            {
                borderWidth = failureTable[borderWidth];
            }

            index++;
            borderWidth++;

            if (borderWidth == word.Length)
            {
                // Full match ending just before index.
                if (position == currentPosition)
                {
                    return index - borderWidth;
                }

                currentPosition++;
                borderWidth = failureTable[borderWidth];
            }
        }

        return -1;
    }
Search algorithm (3)
How it works: a worked example (shown as a figure on the original slide).
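Since the figure is not available, here is a small self-contained sketch that can be run to follow the search. It is a variant of the slide code, not the original: it collects every occurrence instead of returning the one selected by position, and the names KmpSearch, BuildTable and FindAll are assumptions. Returning no matches for an empty pattern is also an assumption made to keep the example safe.

```csharp
using System.Collections.Generic;

static class KmpSearch
{
    // Border widths for every prefix of word (same scheme as the slides).
    static int[] BuildTable(string word)
    {
        var table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[0] = -1;

        while (index < word.Length)
        {
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            table[index] = borderWidth;
        }

        return table;
    }

    // Returns the start indices of all occurrences of word in text.
    public static List<int> FindAll(string text, string word)
    {
        var matches = new List<int>();
        if (word.Length == 0)
        {
            return matches; // assumption: an empty pattern yields no matches
        }

        int[] table = BuildTable(word);
        int index = 0;
        int borderWidth = 0;

        while (index < text.Length)
        {
            // index never decreases; only the pattern is shifted.
            while (borderWidth >= 0 && text[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;

            if (borderWidth == word.Length)
            {
                matches.Add(index - borderWidth);
                borderWidth = table[borderWidth];
            }
        }

        return matches;
    }
}
```

With text “abababaa” and word “ababaa”, the first five characters match and the mismatch occurs at text[5]. The matched prefix “ababa” has a widest border of width 3, so the pattern shifts by 2 and comparison resumes at text[5] without going back. The call KmpSearch.FindAll("abababaa", "ababaa") returns a single match at index 2.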
Search algorithm
Live demo
Complexity
Linear time algorithm!
Complexity
The table building algorithm is O(m), where m is the length of the pattern.
The search algorithm is O(n), where n is the length of the text.
Overall complexity is therefore O(n + m), which is O(n) whenever the pattern is no longer than the text.
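The linear bound can be observed by counting character comparisons. The sketch below is an instrumented variant written for this illustration (the names KmpComparisonCounter and Count are hypothetical). It assumes a non-empty pattern and counts comparisons in both the table-building and the search phase.

```csharp
static class KmpComparisonCounter
{
    // Returns the number of character comparisons made while building the
    // table for word and then scanning text. Assumes word is not empty.
    public static long Count(string text, string word)
    {
        long comparisons = 0;
        var table = new int[word.Length + 1];
        int borderWidth = -1;
        table[0] = -1;

        // Preprocessing phase.
        for (int index = 0; index < word.Length; )
        {
            while (borderWidth >= 0)
            {
                comparisons++;
                if (word[index] == word[borderWidth]) break;
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            table[index] = borderWidth;
        }

        // Search phase.
        borderWidth = 0;
        for (int index = 0; index < text.Length; )
        {
            while (borderWidth >= 0)
            {
                comparisons++;
                if (text[index] == word[borderWidth]) break;
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            if (borderWidth == word.Length)
            {
                borderWidth = table[borderWidth];
            }
        }

        return comparisons;
    }
}
```

Each failed comparison shrinks borderWidth and each step of index grows it by one, so the total stays within 2 * (n + m). For a text of 1000 letters 'a' and the pattern “aaaaaaaaab”, that is at most 2020 comparisons, whereas the naive loop makes 10 comparisons at each of its 991 shifts.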
Additional information
Wikipedia: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm#Worked_example_of_the_table-building_algorithm
Knuth-Morris-Pratt explained: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm
Examples and concept: http://wcipeg.com/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
Questions?
http://algoacademy.telerik.com
Free Trainings @ Telerik Academy
C# Programming @ Telerik Academy csharpfundamentals.telerik.com
Telerik Software Academy academy.telerik.com
Telerik Academy @ Facebook facebook.com/TelerikAcademy
Telerik Software Academy Forums forums.academy.telerik.com