Knuth-Morris-Pratt: String matching algorithm

Ivaylo Kenov, Telerik Academy Student

Telerik Corporation, http://telerikacademy.com


Table of Contents

1. Background and idea

2. The “naive” approach

3. Basic definitions

4. Preprocessing

5. Search algorithm

6. Complexity

7. Additional information


Background and idea: What is the problem?

The problem of string matching: we are given a string text and a pattern word.

Check whether word occurs in text. If it does, return the position where the pattern occurs.

If not, return -1.
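As a point of reference, the expected behaviour can be illustrated with the .NET library method `string.IndexOf`, which follows the same contract (index of the first occurrence, or -1). The wrapper class `ProblemStatement` and the method name `Find` are hypothetical names used only for this sketch; they are not part of the original slides.

```csharp
using System;

public static class ProblemStatement
{
    // Reference behaviour for the task, delegated to the .NET library:
    // the index of the first occurrence of word in text, or -1.
    public static int Find(string text, string word)
    {
        // Ordinal comparison: compare characters by code, ignoring culture rules.
        return text.IndexOf(word, StringComparison.Ordinal);
    }
}
```

For example, `ProblemStatement.Find("hello world", "world")` returns 6, and `ProblemStatement.Find("hello world", "xyz")` returns -1.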

The “naive” approach: New to string searching

The naive approach (1)

The obvious solution: align word with the start of text and compare character by character.

O(m*n) complexity in the worst case, where n is the length of text and m is the length of word. Not good!

Example: [Figure: the string text with the pattern word aligned below it]

The naive approach (2)

Step 1: compare word[0] with text[0].

Step 2: compare word[1] with text[1].

[Figure: word aligned under text for each of the two comparisons]

The naive approach (3)

Step 3: compare word[2] with text[2].

Mismatch found: shift word one index to the right and start comparing again from word[0]!

[Figure: word aligned under text before and after the shift]

The naive approach (4)

A match will be found after three shifts of the word to the right!

The problem with the “naive” approach: too many comparisons over the same characters!

[Figure: word aligned under text at the matching position]
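The steps above can be sketched in C# as follows. This is a minimal illustration, not the code from the live demo; the class name `NaiveMatcher` and the method name `NaiveSearch` are assumptions made for this example.

```csharp
using System;

public static class NaiveMatcher
{
    // Returns the index of the first occurrence of word in text, or -1.
    public static int NaiveSearch(string text, string word)
    {
        // Try every alignment (shift) of word against text.
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            int matched = 0;

            // Compare character by character until a mismatch or a full match.
            while (matched < word.Length && text[shift + matched] == word[matched])
            {
                matched++;
            }

            if (matched == word.Length)
            {
                return shift;
            }

            // Mismatch: the next iteration shifts word one index to the right
            // and starts again from word[0], re-reading characters of text.
        }

        return -1;
    }
}
```

For the text "abcabcabd" and the word "abcabd", the first alignment fails at the sixth character and the match is found at index 3, after three shifts.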

The “naive” approach: Live demo

Knuth-Morris-Pratt: Without repeating!

Knuth-Morris-Pratt

A linear-time algorithm for string matching.

O(n + m) complexity for a text of length n and a word of length m. The search never backtracks in the text: characters that have already been matched are not read again!

Useful with binary data and small-alphabet strings.

Basic definitions: Easy theory!

Basic definitions (1)

Prefix: a substring with which our string starts. Example: “abcdef” starts with “abc”.

Suffix: a substring with which our string ends. Example: “abcdef” ends with “def”.

Proper prefix and proper suffix: a prefix or suffix whose length is less than the length of the string.

Basic definitions (2)

Border: a substring that is a proper prefix and a proper suffix of the string at the same time. Example: “ab” is a border of “abcab”.

Width of a border: the length of the border.

The empty string “” is a proper prefix, a proper suffix and a border of any non-empty string!

Basic definitions (3)

How far does the algorithm shift the pattern?

The shift distance is determined by the widest border of the matching prefix of word.

Distance = length of the matching prefix - width of the widest border.
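The definitions above can be checked directly with a brute-force sketch. It is deliberately inefficient and only illustrates the definitions; the names `Borders`, `WidestBorderWidth` and `ShiftDistance` are hypothetical, and treating an empty matching prefix as a shift of 1 is an assumption made here so that the pattern always moves forward.

```csharp
using System;

public static class Borders
{
    // Width of the widest border of s: the longest proper prefix of s
    // that is also a proper suffix of s (0 if only the empty border exists).
    public static int WidestBorderWidth(string s)
    {
        for (int width = s.Length - 1; width > 0; width--)
        {
            // Compare the first `width` characters with the last `width` characters.
            if (string.CompareOrdinal(s, 0, s, s.Length - width, width) == 0)
            {
                return width;
            }
        }

        return 0;
    }

    // Shift distance after the first `matched` characters of word matched
    // and the next character did not.
    public static int ShiftDistance(string word, int matched)
    {
        if (matched == 0)
        {
            return 1; // assumption: nothing matched, so move on by one position
        }

        string matchingPrefix = word.Substring(0, matched);
        return matched - WidestBorderWidth(matchingPrefix);
    }
}
```

For example, `Borders.WidestBorderWidth("abcab")` is 2 (the border “ab”), and `Borders.ShiftDistance("ababaa", 5)` is 5 - 3 = 2, because the matching prefix “ababa” has the widest border “aba”.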

Preprocessing: Building every border!

Preprocessing (1)

If r and s are borders of a string x and r is shorter than s, then r is also a border of s.

A border r of x can be extended by a character a if ra is a border of xa.

Preprocessing (2)

We build an array table that stores border widths: table[i] is the width of the widest border of the prefix of word with length i.

When computing a value, we already know the previous ones and check whether one of the known borders can be extended.

The border of width table[i] can be extended by the character word[i] if word[table[i]] = word[i].

If not, the next border to check is the one of width table[table[i]].

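Preprocessing (3)

Algorithm for building the table:

    // Shared with the search; one entry for every prefix length 0..word.Length.
    static int[] failureTable;

    static void FailFunction(string word)
    {
        failureTable = new int[word.Length + 1];

        int index = 0;
        int borderWidth = -1;
        failureTable[index] = borderWidth;

        while (index < word.Length)
        {
            // Fall back to narrower borders until one can be extended by word[index].
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = failureTable[borderWidth];
            }

            index++;
            borderWidth++;
            failureTable[index] = borderWidth;
        }
    }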

Preprocessing (4)

Example for the table: for the pattern “ababaa” the widths of the borders in the array table have the following values.

    index:   0  1  2  3  4  5  6
    table:  -1  0  0  1  2  3  1

For instance, table[5] = 3, since the prefix “ababa” of length 5 has a border of width 3 (“aba”).

Note: the element at index 0 is always -1.
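To reproduce the example values, the table-building loop can be wrapped in a self-contained method that returns the array instead of writing to a shared field. This is a sketch under that assumption; the names `KmpTable` and `BuildFailureTable` are hypothetical.

```csharp
using System;

public static class KmpTable
{
    // Returns table[0..word.Length], where table[i] is the width of the widest
    // border of the prefix of word with length i, and table[0] is -1.
    public static int[] BuildFailureTable(string word)
    {
        int[] table = new int[word.Length + 1];
        int index = 0;
        int borderWidth = -1;
        table[0] = borderWidth;

        while (index < word.Length)
        {
            // Try the known borders from widest to narrowest until one extends.
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            table[index] = borderWidth;
        }

        return table;
    }
}
```

Calling `Console.WriteLine(string.Join(" ", KmpTable.BuildFailureTable("ababaa")));` prints `-1 0 0 1 2 3 1`.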

Preprocessing: Live demo

Search algorithm: Finding the word!

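Search algorithm (1, 2)

The search algorithm is similar to the preprocessing. It returns the start index of occurrence number position (counting from 1), or -1 if there is no such occurrence:

    static int KMPSearch(string text, string word, int position)
    {
        int index = 0;            // current index in text; it never moves backwards
        int borderWidth = 0;      // number of characters of word matched so far
        int currentPosition = 1;  // number of the next occurrence to be found

        while (index < text.Length)
        {
            // On a mismatch, fall back to the next narrower border.
            while (borderWidth >= 0 && text[index] != word[borderWidth])
            {
                borderWidth = failureTable[borderWidth];
            }

            index++;
            borderWidth++;

            if (borderWidth == word.Length)
            {
                // The whole word matched and ends just before index.
                if (position == currentPosition)
                {
                    return (index - borderWidth);
                }
                else
                {
                    currentPosition++;
                }

                // Continue with the widest border to find later occurrences.
                borderWidth = failureTable[borderWidth];
            }
        }

        return -1;
    }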

Search algorithm (3)

How it works:

[Figure: worked search example]
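A small, self-contained sketch can stand in for a worked example. Unlike the method on the slides, it collects the start index of every occurrence and builds its own table; both choices, the names `KmpSketch`, `BuildTable` and `FindAll`, and the decision to report nothing for an empty pattern are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class KmpSketch
{
    // table[i] = width of the widest border of the prefix of word with length i.
    static int[] BuildTable(string word)
    {
        int[] table = new int[word.Length + 1];
        int borderWidth = -1;
        table[0] = borderWidth;

        for (int index = 0; index < word.Length; )
        {
            while (borderWidth >= 0 && word[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;
            table[index] = borderWidth;
        }

        return table;
    }

    // Returns the start indices of all occurrences of word in text.
    public static List<int> FindAll(string text, string word)
    {
        var occurrences = new List<int>();
        if (word.Length == 0)
        {
            return occurrences; // assumption: an empty pattern reports nothing
        }

        int[] table = BuildTable(word);
        int index = 0;
        int borderWidth = 0;

        while (index < text.Length)
        {
            // Mismatch: keep the text index, fall back to a narrower border.
            while (borderWidth >= 0 && text[index] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            index++;
            borderWidth++;

            if (borderWidth == word.Length)
            {
                occurrences.Add(index - borderWidth);
                borderWidth = table[borderWidth];
            }
        }

        return occurrences;
    }
}
```

With the text "abababaab" and the word "ababaa", the first five characters match, text[5] mismatches word[5], the matched width falls back from 5 to table[5] = 3 without moving back in the text, and the occurrence at index 2 is found. Overlapping occurrences are reported as well: `KmpSketch.FindAll("aaaa", "aa")` yields 0, 1 and 2.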

Search algorithm: Live demo

Complexity: Linear time algorithm!

Complexity

The table-building algorithm is O(m), where m is the length of the pattern.

The search algorithm is O(n), where n is the length of the text.

The overall complexity is therefore O(n + m), which is O(n) whenever the pattern is no longer than the text.
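The difference between the two bounds can be made visible by counting character comparisons on a bad case for the naive approach: a text of n equal characters and a pattern of m - 1 such characters followed by a different one. The sketch below is only an illustration; the names `ComparisonCount`, `Naive` and `Kmp` are hypothetical, both methods scan the whole text, and the KMP count covers the search phase only.

```csharp
using System;

public static class ComparisonCount
{
    // Character comparisons made by the naive approach over all alignments.
    public static long Naive(string text, string word)
    {
        long comparisons = 0;
        for (int shift = 0; shift + word.Length <= text.Length; shift++)
        {
            for (int matched = 0; matched < word.Length; matched++)
            {
                comparisons++;
                if (text[shift + matched] != word[matched])
                {
                    break;
                }
            }
        }

        return comparisons;
    }

    // Character comparisons made by the KMP search loop (table build excluded).
    public static long Kmp(string text, string word)
    {
        int[] table = new int[word.Length + 1];
        int borderWidth = -1;
        table[0] = borderWidth;
        for (int i = 0; i < word.Length; )
        {
            while (borderWidth >= 0 && word[i] != word[borderWidth])
            {
                borderWidth = table[borderWidth];
            }

            i++;
            borderWidth++;
            table[i] = borderWidth;
        }

        long comparisons = 0;
        borderWidth = 0;
        for (int index = 0; index < text.Length; index++)
        {
            // Same fallback loop as the search, with each comparison counted.
            while (borderWidth >= 0)
            {
                comparisons++;
                if (text[index] == word[borderWidth])
                {
                    break;
                }

                borderWidth = table[borderWidth];
            }

            borderWidth++;
            if (borderWidth == word.Length)
            {
                borderWidth = table[borderWidth];
            }
        }

        return comparisons;
    }
}
```

For a text of 1000 'a' characters and a pattern of 99 'a' characters followed by 'b', the naive count is 901 * 100 = 90100 comparisons, while the KMP search stays below 2n = 2000.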


Questions?

http://algoacademy.telerik.com

Free Trainings @ Telerik Academy

C# Programming @ Telerik Academy csharpfundamentals.telerik.com

Telerik Software Academy academy.telerik.com

Telerik Academy @ Facebook facebook.com/TelerikAcademy

Telerik Software Academy Forums forums.academy.telerik.com