27
1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C. Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V . Lakshman and Randy H. Katz Publisher: ANCS'06, December 3–5, 2006 Present: Yu-Tso Chen Date: November, 6, 2007

1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

1

Fast and Memory-Efficient Regular Expression

Matching for Deep Packet Inspection

Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz

Publisher: ANCS'06, December 3–5, 2006

Present: Yu-Tso Chen

Date: November, 6, 2007

Page 2: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

2

Outline

1. Introduction 2. Definitions and problem description 3. Matching of Individual Patterns 4. Selective Grouping of Multiple

Patterns 5. Evaluation Result 6. Conclusion

Page 3: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

3

Introduction

Three unique complex features• 1) Large numbers of wildcards can cause DFA to

grow exponentially

• 2) Wildcard are used with length restriction(‘?’, ‘+’) will increase the resource

• 3) Groups of characters are also commonly used such interaction can result in highly complex state machine(ex.”^220[\x09-]*ftp”)

Page 4: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

4

Introduction (cont.)

Make following contributions• 1) Analyze the computational and storage cost of

building individual DFAs

• 2) Two rewrite rules for specific regular expressions

• 3) Combine multiple DFAs into a small number of group

Page 5: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

5

Outline

1. Introduction 2. Definitions and problem

description 3. Matching of Individual Patterns 4. Selective Grouping of Multiple

Patterns 5. Evaluation Result 6. Conclusion

Page 6: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

6

Regular Expression Patterns

Compares the regular expressions used in two networking applications (Snort, Linux L-7 filter & XML filtering)• 1)Both types of app. Use wildcards

(‘.’,’?‘,’+’,’*’) contain larger numbers of them

• 2) Classes of characters (“[ ]”) are used only in packet scanning applications

• 3) High percentage of scanning app. Have length restrictions on some of the classes or wildcards

Page 7: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

7

Regular Expression Patterns

Page 8: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

8

Solution Space for Regular Expression Matching

A single regular expression of length n can be expressed as an NFA with O(n)

When the NFA is converted into a DFA, it may generate states

The processing complexity for each character in the input is O(1) in DFA, but is O(n2) for an NFA when all n states are active at the same time

nO

Page 9: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

9

Solution Space for Regular Expression Matching (cont.)

To handle m regular expressions, two choices are possible :• Processing them individually in m automata

• Compiling them into a single automaton

Page 10: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

10

Problem Statement

DFA-based approaches in this paper• Our goal is to achieve O(1) computation cost

• The focus of the study is to reduce memory overhead of DFA

There are two sources of memory usage in DFAs : states and transitions• We consider the number of states as the prim

ary factor

Page 11: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

11

Outline

1. Introduction 2. Definitions and problem description 3. Matching of Individual Patterns 4. Selective Grouping of Multiple

Patterns 5. Evaluation Result 6. Conclusion

Page 12: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

12

Design Considerations

Define Completeness of Matching Results :• Exhaustive Matching:M(P,S)={substring S’ of S | S

’ is accepted by the DFA of P}

• It is expensive and often unnecessary to report all matching substrings

• We propose a new concept, Non-overlapping Matching, that relaxes the requirements of exhaustive matching

• Non-overlapping Matching:

• Ex : ab* if input abbb non-overlapping matching will report one match instead of three

• Exhaustive Matching will report, ab, abb, abbb

}P, ofDFA by the accepted ,|S of Si substring{),( SjSiSjSiSPM

Page 13: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

13

Design Considerations (cont.)

Define DFA Execution Model for Substring Matching : We focus on patterns without ‘^’ attached at the beginning• Repeater searches

• One-pass search – this approach can truly achieve O(1) computation cost per character

Page 14: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

14

DFA Analysis for Individual Regular Expressions

The study is based on the use of exhaustive matching & one-pass search

Page 15: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

15

Case 4 : DFA of Quadratic Size The DFA needs to remember the

number of Bs it has seen and their locations

Page 16: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

16

Case 5 : DFA of Exponential Size An exponential number of states

(22+1)are needed to represent these two wildcard characters

AAB(AABBCD) is different from ABA(ABABCD) because a subsequence input BCD

Page 17: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

17

Regular Expression Rewrites

Rewrite Rule(1)• “^SEARCH\s+[^\n]{1024}” to

“^SEARCH\s [^\n]{1024}”

• “^A+[A-Z]{j}” to “^A [A-Z]{j}” • We can prove match “^A+[A-Z]{j}” also match “^A

[A-Z]{j}”

Page 18: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

18

Regular Expression Rewrites (cont.)

Rewrite Rule(2)• We don’t need to keep track of the second AUTH\s

• If there is a ‘\n’ within the next 100 bytes, the return character must also be within 100 bytes to the second AUTH\s

• If there is no ‘\n’ within the next 100 bytes, the first already matched the pattern

• “([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}”

Page 19: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

19

Outline

1. Introduction 2. Definitions and problem description 3. Matching of Individual Patterns 4. Selective Grouping of Multiple

Patterns 5. Evaluation Result 6. Conclusion

Page 20: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

20

Selective Grouping of Multiple Patterns

The composite DFA may experience exponential growth in size, although none of the individual DFA has an exponential component

Page 21: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

21

Regular Expressions Grouping Algorithm

Definition of interaction : two patterns interact with each other if their composite DFA contains more states than the sum of two individual ones

Page 22: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

22

Grouping Algorithm

Page 23: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

23

Outline

1. Introduction 2. Definitions and problem description 3. Matching of Individual Patterns 4. Selective Grouping of Multiple

Patterns 5. Evaluation Result 6. Conclusion

Page 24: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

24

Evaluation Result

Effect of Rule Rewriting

Page 25: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

25

Evaluation Result (cont.)

Page 26: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

26

Outline

1. Introduction 2. Definitions and problem description 3. Matching of Individual Patterns 4. Selective Grouping of Multiple

Patterns 5. Evaluation Result 6. Conclusion

Page 27: 1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National

27

Conclusion

Rewriting techniques –

memory-efficient DFA-based approaches are possible

Selectively groups patterns together –

speed up the matching process