Algorithms to Accelerate Multiple Regular Expressions
Matching for Deep Packet Inspection
Sailesh Kumar, Sarang Dharmapurikar,
Fang Yu, Patrick Crowley, Jonathan Turner
Presented by: Sailesh Kumar
2 - Sailesh Kumar - 04/21/23
Overview
Why regular expression acceleration is important
Introduction to our approach
» Delayed Input DFA (D2FA)
D2FA construction
Simulation results
Memory mapping algorithm
Conclusion
Why Regular Expressions Acceleration?
RegEx are now widely used
» Network intrusion detection systems (NIDS)
» Layer 7 switches, load balancing
» Firewalls, filtering, authentication and monitoring
» Content-based traffic management and routing
RegEx matching is expensive
» Space: large amount of memory
» Bandwidth: requires 1+ state traversal per byte
RegEx is a performance bottleneck
» In enterprise switches from Cisco, etc.
» Cisco security appliances
  – Use DFA, 1+ GB memory, still sub-gigabit throughput
» Need to accelerate RegEx!
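The bandwidth cost above comes from the fact that a table-based DFA performs one transition lookup per input byte. A minimal illustrative sketch (a hypothetical 4-state DFA for payloads containing the substring "abc", not any of the rule sets discussed in this talk):

```python
# Minimal sketch (not the paper's implementation): matching with a
# table-based DFA costs one transition lookup per input byte, so the
# required memory bandwidth scales directly with the line rate.

def build_dfa_for_abc():
    """Hypothetical 4-state DFA accepting strings containing 'abc'."""
    # next_state[state][byte] -> state; state 3 is accepting.
    next_state = [[0] * 256 for _ in range(4)]
    for s in range(3):
        next_state[s][ord('a')] = 1          # restart partial match on 'a'
    next_state[1][ord('b')] = 2
    next_state[2][ord('c')] = 3
    next_state[3] = [3] * 256                # accepting state absorbs
    return next_state

def scan(next_state, payload: bytes) -> bool:
    state = 0
    for byte in payload:                     # one table lookup per byte
        state = next_state[state][byte]
    return state == 3

dfa = build_dfa_for_abc()
print(scan(dfa, b"xxabcxx"))   # True
print(scan(dfa, b"xxabxcx"))   # False
```

With a naive 256-entry row per state, each of those lookups touches a table of `states x 256` words, which is why both the space and the bandwidth of the DFA representation matter.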
Can we do better?
Well studied in compiler literature
» What's different in networking?
» Can we do better?
Construction time versus execution time (grep)
» Traditionally, (construction + execution) time is the metric
» In the networking context, execution time is critical
» Also, there may be thousands of patterns
DFAs are fast
» But can have an exponentially large number of states
» Algorithms exist to minimize the number of states
» Still 1) low performance and 2) gigabytes of memory
How to achieve high performance?
» Use ASIC/FPGA
  – On-chip memories provide ample bandwidth
  – Volume and the need for speed justify a custom solution
» Limited memory, need a space-efficient representation!
Introduction to Our Approach
How to represent DFAs more compactly?
» Can't reduce the number of states
» How about reducing the number of transitions?
  – 256 transitions per state
  – 50+ distinct transitions per state (real-world datasets)
  – Need at least 50+ words per state
Three rules: a+, b+c, c*d+
[Figure: the DFA for these rules (states 1-5); 4 distinct transitions per state on a, b, c, d.]
Look at state pairs: there are many common transitions. How can we remove them?
Introduction to Our Approach
[Figure: the DFA for the three rules, alongside an alternative representation in which common transitions are replaced by default transitions.]
Fewer transitions, less memory
D2FA Operation
[Figure: the DFA (left) and the D2FA (right) for the three rules; in the D2FA, each state keeps a few labeled transitions plus one heavy, unlabeled default transition.]
Input stream: a b d — the DFA and the D2FA visit the same accepting state after consuming each character.
Heavy edges are called default transitions. Take a default transition whenever a labeled transition is missing.
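The lookup rule above can be sketched as follows. This is a hedged, minimal sketch: the 3-state machine is hypothetical (it is not the deck's 5-state example), and a real implementation would index packed memory words rather than Python dicts.

```python
# D2FA traversal sketch: each state stores only a partial transition
# table plus one unlabeled default transition; on a miss we follow
# default transitions (possibly several) until a labeled match exists.

# Hypothetical 3-state machine over {a, b, c}.
labeled = [
    {'a': 1, 'b': 0, 'c': 0},   # state 0: the tree root, fully specified
    {'a': 1, 'b': 2},           # state 1: shares its 'c' target with state 0
    {},                          # state 2: shares everything with state 0
]
default = [None, 0, 0]           # default transitions form a tree rooted at 0

def step(state: int, sym: str) -> int:
    hops = 0
    while sym not in labeled[state]:
        state = default[state]   # no input is consumed on a default edge
        hops += 1
        assert hops <= len(labeled), "default-transition cycle"
    return labeled[state][sym]

def run(s: str) -> int:
    state = 0
    for ch in s:                 # exactly one character consumed per step()
        state = step(state, ch)
    return state

print(run("ab"))   # 2
print(run("abc"))  # 0  (state 2 falls back to the root to resolve 'c')
```

The inner `while` loop is exactly the extra work a D2FA trades for space: bounding the default path length bounds the number of memory accesses per character.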
D2FA Operation
[Figure: the DFA and the D2FA from the previous slide, repeated.]
Any set of default transitions will suffice if there are no cycles of default transitions.
Thus, we need to construct trees of default transitions.
So, how do we construct space-efficient D2FAs while keeping default paths bounded?
[Figure: two alternative default-transition trees over states 1-5.]
The above two sets of default-transition trees are also correct; however, we may traverse 2 default transitions to consume a character.
Thus, we need to do more work => lower performance.
D2FA Construction
We present a systematic approach to construct D2FAs:
» Begin with a state-minimized DFA
» Construct a space reduction graph
  – Undirected graph; vertices are the states of the DFA
  – Edges exist between vertices with common transitions
  – Weight of an edge = # of common transitions - 1
[Figure: the example DFA (left) and its space reduction graph (right); the edges between states 1-5 carry weights of 2 or 3.]
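The space reduction graph described above can be sketched directly from its definition. A hedged sketch, using a made-up 3-state DFA (dicts of symbol to next state) rather than the slide's example; edges whose weight would be zero are dropped, since a default transition there saves nothing:

```python
# Build the space reduction graph: an undirected graph over DFA states
# where an edge's weight is (# of common transitions - 1), i.e. the net
# saving from replacing those transitions with one default edge.
from itertools import combinations

def space_reduction_graph(dfa):
    """dfa: list of dicts mapping symbol -> next state."""
    edges = {}
    for u, v in combinations(range(len(dfa)), 2):
        common = sum(1 for sym, t in dfa[u].items() if dfa[v].get(sym) == t)
        if common >= 2:                      # keep only weight-1+ edges
            edges[(u, v)] = common - 1
    return edges

# Hypothetical 3-state DFA over {a, b, c}.
dfa = [
    {'a': 1, 'b': 0, 'c': 2},
    {'a': 1, 'b': 0, 'c': 2},
    {'a': 1, 'b': 2, 'c': 2},
]
print(space_reduction_graph(dfa))  # {(0, 1): 2, (0, 2): 1, (1, 2): 1}
```

States 0 and 1 agree on all three symbols, so a default edge between them saves two stored transitions, matching the weight-2 edge.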
D2FA Construction
Convert certain edges into default transitions
» A default transition removes w transitions (w = weight of the edge)
» Picking high-weight edges => more space reduction
» Find a maximum-weight spanning forest
» Tree edges become the default transitions
Problem: the spanning tree may have a very large diameter
» Longer default paths => lower performance
[Figure: a maximum-weight spanning tree over the space reduction graph, rooted at one state; # of transitions removed = 2+3+3+3 = 11.]
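The maximum-weight spanning forest step is standard Kruskal with the comparison reversed. A sketch with union-find; the edge weights below are illustrative, not the slide's example:

```python
# Maximum-weight spanning forest via Kruskal's algorithm; each chosen
# edge becomes a default transition and saves `weight` stored transitions.

def max_weight_spanning_forest(n, edges):
    """edges: dict {(u, v): weight}; returns (chosen edges, total savings)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    chosen, saved = [], 0
    # Kruskal: consider edges in decreasing weight order.
    for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
        ru, rv = find(u), find(v)
        if ru != rv:                         # no cycle among default edges
            parent[ru] = rv
            chosen.append((u, v))
            saved += w
    return chosen, saved

edges = {(0, 1): 3, (1, 2): 3, (2, 3): 2, (0, 3): 1}
forest, saved = max_weight_spanning_forest(4, edges)
print(saved)  # 8: edges (0,1), (1,2), (2,3) are kept, (0,3) would cycle
```

The cycle check is what guarantees the default transitions form trees, which the earlier slide showed is required for correctness.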
D2FA Construction
We need to construct bounded-diameter trees
» NP-hard
» A small diameter bound leads to low tree weight
  – Less space-efficient D2FA
» Time-space trade-off
We propose a heuristic algorithm based upon Kruskal's algorithm to create compact bounded-diameter D2FAs
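The refined heuristic itself is only summarized in this deck (details are in the paper). As a hedged stand-in, the sketch below runs Kruskal's algorithm but rejects any edge whose addition would push the merged tree's diameter past the bound; it checks this with a BFS per candidate edge, which is simpler and slower than what the paper does:

```python
# Stand-in for a bounded-diameter Kruskal variant (NOT the paper's exact
# heuristic): greedily add max-weight edges, skipping any edge that would
# create a cycle or exceed the diameter bound on default paths.
from collections import deque

def distances(adj, src):
    """BFS distances from src within its current tree."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def bounded_forest(n, edges, bound):
    adj = {i: set() for i in range(n)}
    chosen = []
    for (u, v), w in sorted(edges.items(), key=lambda e: -e[1]):
        du = distances(adj, u)               # u's tree
        if v in du:                          # same tree: would form a cycle
            continue
        dv = distances(adj, v)               # v's tree
        # Longest default path through the new edge = ecc(u) + 1 + ecc(v);
        # both trees already respect the bound, so this is the only check.
        if max(du.values()) + 1 + max(dv.values()) > bound:
            continue
        adj[u].add(v)
        adj[v].add(u)
        chosen.append((u, v))
    return chosen

edges = {(0, 1): 3, (1, 2): 3, (2, 3): 3, (3, 4): 3}
print(bounded_forest(5, edges, 2))  # [(0, 1), (1, 2), (3, 4)]
```

With a bound of 2 the 5-state chain is split into two trees, trading away the weight of edge (2, 3) for shorter default paths, which is exactly the time-space trade-off named above.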
D2FA Construction
Our heuristic incrementally builds the spanning tree
» Whenever there is an opportunity, keep the diameter small
» Based upon Kruskal's algorithm
» Details in the paper
Results
We ran experiments on
» Cisco RegEx rules
» Linux application protocol classifier rules
» Bro rules
» Snort rules (subset of rules)
Size of DFA versus D2FA (no default path length bound applied)

          Original DFA             D2FA (normal spanning tree)       D2FA (refined spanning tree)
DFA       # states  # transitions  # trans.  % red.  Max. def. len.  # trans.  % red.  Max. def. len.
Cisco590  17713     4.5M           36k       99.2    57              36k       99.2    17
Cisco103  21050     5.3M           53k       99.0    54              53k       99.0    19
Cisco7    4260      1.0M           28k       97.4    61              28k       97.4    23
Linux56   13953     3.5M           58k       98.3    30              58k       98.3    21
Linux10   13003     3.3M           285k      91.3    20              285k      91.3    17
Snort11   41949     10.7M          168k      98.4    9               168k      98.4    6
Bro648    6216      1.5M           7k        99.5    17              7k        99.5    8
Space-Time Tradeoff
[Figure: normalized D2FA size (log scale, 0.001 to 1) versus the bound on the default path length (1 to 7), for Cisco103, Linux56, Snort11 and Bro648.]
Longer default paths => more work but less space.
In the space-efficient region, default paths have length 4+, which requires 4+ memory accesses per character.
We propose a memory architecture which enables us to consume one character per clock cycle.
Summary of Memory Architecture
We propose an on-chip ASIC architecture
» Use multiple embedded memories to store the D2FA
  – Flexibility
  – Frequent changes to rules
D2FA requires multiple memory accesses
» How to execute the D2FA at memory clock rates?
We have proposed a deterministic, contention-free memory mapping algorithm
» Uniform access to memories
» Enables the D2FA to consume a character per memory access
» Nearly zero memory fragmentation
  – All memories are uniformly used
Details and results in the paper
At 300 MHz we achieve 5 Gbps worst-case throughput
Conclusion
Deep packet inspection has become challenging
» RegEx are used to specify rules
» Wire-speed inspection
We presented an ASIC-based architecture to perform RegEx matching at 10's of gigabit rates
As suggested in the public review, this paper is not the final answer to RegEx matching
» But it is a good start
We are presently developing techniques to perform fast RegEx matching using commodity memories
» Collaborators are welcome!
Thank you and Questions?
Backup Slides
D2FA Construction
Our heuristic incrementally builds the spanning tree
» Whenever there is an opportunity, keep the diameter small
» Details in the paper
Graph with 31 states, maximum-weight default transition tree
» Our heuristic creates shorter default paths
[Figure: two default-transition trees over a 31-state graph. Left: Kruskal's algorithm, max. default path = 8 edges. Right: our refined Kruskal's algorithm, avg. default path = 5 edges.]
Multiple Memories
To achieve high performance, use multiple memories and D2FA engines
Multiple memories provide high aggregate bandwidth
Multiple engines use the bandwidth effectively
» However, worst-case performance may be low
  – No better than a single memory
» May need complex circuitry to handle contention
We propose a deterministic, contention-free memory mapping and compare it to a random mapping
[Figure: several D2FA scanners connected to several memories.]
Memory Mapping
The memory mapping algorithm can be modeled as graph coloring
» The graph is the set of default transition trees
» Colors represent the memory modules
» Color the nodes of the trees such that
  – Nodes along a default path are colored with different colors
  – All colors are uniformly used
We propose two methods, naïve and adaptive
[Figure: the same default-transition trees colored with 4 colors, under naïve coloring (left) and adaptive coloring (right).]
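The two coloring styles can be sketched as follows, assuming k memories and default paths of at most k-1 edges. The tree shape and the exact tie-breaking below are assumptions for illustration, not the paper's algorithm: the naïve rule colors by depth mod k, while the adaptive rule additionally picks the globally least-used legal color to balance memory occupancy.

```python
# Sketch of contention-free tree coloring: a node's color (= memory
# module) must differ from its k-1 nearest ancestors, so every default
# path of length < k touches k distinct memories.
from collections import Counter

def color_tree(children, root, k, adaptive=True):
    usage = Counter(range(k))    # seed all colors equally for tie-breaking
    color = {}

    def visit(node, ancestors):
        banned = set(ancestors[-(k - 1):]) if k > 1 else set()
        legal = [c for c in range(k) if c not in banned]
        if adaptive:
            pick = min(legal, key=lambda c: usage[c])   # balance usage
        else:
            pick = len(ancestors) % k                   # naïve: depth mod k
        color[node] = pick
        usage[pick] += 1
        for ch in children.get(node, ()):
            visit(ch, ancestors + [pick])

    visit(root, [])
    return color

# Hypothetical default-transition tree: root 0 with two branches below it.
children = {0: [1, 2], 1: [3, 4], 2: [5]}
c = color_tree(children, 0, k=4)
# Adjacent nodes on every default path get distinct colors:
print(c[0] != c[1] != c[3])  # True
```

Balancing the per-color usage is what keeps fragmentation low: every memory module ends up holding roughly the same number of states.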
Results
Adaptive mapping leads to much more uniform color usage
» Memories are uniformly used, little fragmentation
» Up to 20% space saving with adaptive coloring
Throughput results (300 MHz dual-port eSRAM)
[Figure: throughput (Gbps, 0 to 10) versus the number of concurrently scanned packets (1 to 64), for adaptive and randomized mapping, measured on average-performance and on synthetically generated worst-case input data.]