© All rights reserved. No part of this publication and file may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of
Professor Nen-Fu Huang (E-mail: [email protected]).
Deep Packet Inspection Algorithms
DPI.2
Agenda
Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues
DPI.3
Introduction -- Goals
What?
  Multi-pattern matching algorithms: search a set of patterns simultaneously
Where?
  Networks: in-depth packet inspection engines
How?
  Search the whole packet payload to identify packets of interest that contain certain pre-defined patterns
Who?
  Content-aware network devices with application-oriented management, e.g. an intrusion detection system (IDS), anti-virus appliance, application firewall, or layer-7 switch.
DPI.4
Packet Inspection Engines
[Figure: Network Traffic passes through Packet Capture, Decoder, Pattern Matching Engine (fed by PatternDB) and Output Module]
Packet Capture: capture all packets in the subnet
Decoder: fit the captured packets into data structures and identify their protocol type
Pattern Matching Engine: match packets against each pattern
Output Module: generate notifications or put packets into assigned queues
DPI.5
Intrusion Detection Systems (IDSs)
Active Response
Pattern Set
IDS
DPI.6
Challenges
String matching must check every payload against all patterns.
Pattern length is variable: from 1 byte to 122 bytes in the case of the Snort rule set.
  Snort: a well-known open-source IDS
  An example of a Snort pattern: GET /scripts/root.exe?/c+dir
Patterns may appear anywhere in the payload.
The total number of patterns is usually a few thousand.
DPI.7
Challenges
The top four routines in Snort
The pattern matching routine is the most resource-intensive task in the IDS.
Efficient algorithms for multi-pattern matching
DPI.8
A Generic Layer-7 Engine
Packet Normalizer
  Ensures the integrity of incoming packets
  Eliminates ambiguity
  Decodes URI strings if necessary
Pattern-Matching Engine
Policy Engine
  Gathers information from the pattern-matching engine and issues the verdict to allow or drop the packets
[Figure: Packet Normalizer, Pattern-Matching Engine and Policy Engine in sequence. Labels: Network, Packet Stream, Normalized Traffic, Signatures, Matched Events, Policies, Verdicts, Filtered Traffic, Logs, Reports]
DPI.9
Packet Normalizer
Integrity checking
  IP fragment reassembly
  TCP segment reassembly
    TCP segments may arrive out of order
    SEQ outside the window size
    Segment overlapping
URI decoding
  URI hex-code obfuscation ('a' = %61)
  URI Unicode/UTF-8 obfuscation
  Self-referential directory obfuscation (/././././ = /)
  Directory obfuscation (/abc/a/../a/../a/ = /abc/a)
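The two simplest URI obfuscations listed above (hex-code escapes and redundant directory segments) can be undone with the Python standard library. This is only a minimal sketch: the function name `normalize_uri_path` is hypothetical, it performs a single decoding pass, and it does not attempt the Unicode/UTF-8 or double-encoding cases that a real normalizer would have to handle.

```python
import posixpath
from urllib.parse import unquote


def normalize_uri_path(raw_path: str) -> str:
    """Decode %XX escapes once, then collapse '.' and '..' segments."""
    decoded = unquote(raw_path)          # e.g. '%61' becomes 'a'
    return posixpath.normpath(decoded)   # e.g. '/a/../a/' becomes '/a'


if __name__ == "__main__":
    for path in ("/scripts/%72oot.exe", "/././././", "/abc/a/../a/../a/"):
        print(path, "->", normalize_uri_path(path))
```

Normalizing before pattern matching means a single signature such as /scripts/root.exe covers all of these spellings.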
DPI.10
Pattern-Matching Engine
The most computation-intensive task in packet processing. Normally the PM engine needs to process every single byte in packet payload.
In Snort, the PM routine accounts for 31% of the total execution time
DPI.11
Applications using the pattern-matching algorithm
SpamAssassin: 75% pattern matching, 20% program overheads, 5% decode
Snort: 62% pattern matching, 17% header classification, 13% IP classification, 8% other matching
ClamAV: 90% pattern matching, 4% inflating, 2% De-Mimeing64-ing, 2% other MIME, 2% program overheads
DPI.12
Pattern Matching is Expensive!
~30 Instructions/ Byte. 45K Instructions/1500 Byte packet
~50 Instructions/ 1500 Byte packet
Source: Intel Corp.
DPI.13
Policy Engine
Collect the matching events from Pattern-Matching Engine.
Clarify the relationship between matched patterns:
  Ordered: a policy (rule) may consist of more than one pattern, and the patterns should be matched in order.
  Offset, Depth: the matched position should be within a certain range or location.
  Distance, Within: the distance between two matched patterns should also be taken into consideration.
DPI.14
Policy Engine -- Match processing
A rule may consist of multiple patterns.
Inter-pattern matching conditions
  Order: all patterns of a rule must appear in order.
  Location: offset/depth constraints
  Location: distance/within constraints
[Figure: a packet from beginning to end, with pattern 1 positioned by offset and depth, and pattern 2 positioned by distance and within]
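The inter-pattern conditions can be sketched as window checks on the payload. The semantics below follow the common Snort-style reading and are an assumption, not taken from the slides: offset/depth bound where the first pattern may lie, and distance/within bound where the second pattern may lie relative to the end of the first. The helper names are hypothetical, and only the first occurrence of pattern 1 is considered.

```python
def find_in_window(payload: bytes, pattern: bytes, start: int, length=None) -> int:
    """Index of pattern lying wholly inside payload[start:start+length], or -1."""
    end = len(payload) if length is None else min(len(payload), start + length)
    return payload.find(pattern, start, end)


def match_rule(payload, first, second, offset=0, depth=None, distance=0, within=None):
    """True if `first` is found under offset/depth and `second` follows it
    under distance/within (both measured from the end of `first`)."""
    p1 = find_in_window(payload, first, offset, depth)
    if p1 < 0:
        return False
    after_first = p1 + len(first)
    p2 = find_in_window(payload, second, after_first + distance, within)
    return p2 >= 0


if __name__ == "__main__":
    pkt = b"GET /scripts/root.exe?/c+dir"
    print(match_rule(pkt, b"GET", b"root.exe", depth=3, distance=1, within=20))
```

A full policy engine would also retry later occurrences of the first pattern; that is omitted here to keep the sketch short.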
DPI.15
Policy Engine (cont.)
Trace application states
  Some applications are difficult to identify using only one signature (e.g. P2P).
  The Policy Engine needs to track the connection state, as in the following diagram:
[Figure: state diagram with states S0, S1, S2, S3 and transitions Msg Exchange, Request File, Data Exchange]
DPI.16
Agenda
Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues
DPI.17
Pattern Matching Technologies
Pattern-Matching Algorithms
  Software based
    Boyer-Moore
    Aho-Corasick (AC)
    Wu-Manber
    HMA/EHMA/ACM
    Pre-filtering
  Hardware based
    Reconfigurable hardware (FPGA)
    Bloom filter
    TCAM-based
    GPU-based
DPI.18
Pattern Matching Problem Definition
Pattern matching
  A string is a finite sequence of symbols.
  Let P = {P1, P2, ..., Pk} be a finite set of strings, which we shall call patterns.
  Let T be an arbitrary string, which we shall call the text string.
  The problem is to locate and identify all substrings of T that are patterns in P.
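As a baseline for the algorithms that follow, the problem statement can be implemented directly by trying every pattern at every text position. This is a minimal brute-force sketch (the function name `find_all` is hypothetical); its cost grows with both the text length and the number of patterns, which is what the later algorithms try to avoid.

```python
def find_all(text: str, patterns: list) -> list:
    """Return (position, pattern) for every occurrence of any pattern in text."""
    hits = []
    for i in range(len(text)):
        for p in patterns:
            if text.startswith(p, i):   # compare p against text[i:i+len(p)]
                hits.append((i, p))
    return hits


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```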
DPI.19
The Simplest Matching
[Figure: the simplest matching. Labels: Patterns, Packet, M, N, T, U (the smallest unit of a symbol), (T - mi)/U + 1]
DPI.20
Software Based Pattern-Matching Algorithms
Boyer-Moore Algorithm (BM)
  Best average-case time complexity
  Good for general network situations
  Single-pattern matching
Aho-Corasick Algorithm (AC)
  Best worst-case time complexity
  Good for networks that are usually under heavy attack
BM-like Algorithms
  Modify BM for multi-pattern matching
  Wu-Manber Algorithm
  HMA/EHMA/ACM Algorithms
DPI.21
Boyer-Moore Algorithm (BM)
Proposed by R.S. Boyer and J.S. Moore in 1977
The most efficient single-pattern matching algorithm
  O(n + rm) comparisons (n: text length, m: pattern length, r: number of matches)
  O(n/m) in the best case
Heuristics
  Pre-processing the pattern
DPI.22
Boyer-Moore Algorithm (BM)
Bad-Character Heuristic
Skip table: scan the pattern from right to left. Each time you move left, if the character you are on is not in the table already, add it; its shift value is its distance from the rightmost character.
If a character never appears in pattern p, you can safely skip |p| characters when you read it.
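The skip-table construction described above can be written in a few lines. This is a sketch of the table only (the names `skip_table` and `skip` are hypothetical), following the slide's convention that the rightmost pattern character has shift 0 and that absent characters shift by |p|.

```python
def skip_table(pattern: str) -> dict:
    """Scan right to left; the first time a character is seen, record its
    distance from the rightmost character of the pattern."""
    table = {}
    last = len(pattern) - 1
    for i in range(last, -1, -1):
        table.setdefault(pattern[i], last - i)
    return table


def skip(table: dict, pattern: str, ch: str) -> int:
    """Shift for text character ch; characters not in the pattern skip |p|."""
    return table.get(ch, len(pattern))


if __name__ == "__main__":
    t = skip_table("ABCD")
    print(t, skip(t, "ABCD", "X"))
```

For the pattern ABCD this gives A = 3, B = 2, C = 1, D = 0 and 4 for any other character.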
DPI.23
Bad Character Heuristic example
Boyer-Moore Algorithm (BM)
a c d a c d aa c d a c d a
text: a c d e c d a c d a c d a
b
DPI.24
Boyer-Moore Algorithm (BM)
Bad Character Heuristic example
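Skip table for pattern ABCD: A = 3, B = 2, C = 1, D = 0, X (any other character) = 4
  XBXXABCXXABCDXX
  ABCD                (shift 4)
      ABCD            (shift 4)
          ABCD        (shift 1)
           ABCD       (match)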
DPI.25
Boyer-Moore Algorithm (BM)
Good-Suffix Heuristic
Shift table: if a mismatch occurs and a repeated substring exists in the pattern, the pattern can be shifted to the next occurrence of the substring that has already been matched.
When a mismatch occurs, choose the larger skip value from the two tables.
DPI.26
Boyer-Moore Algorithm (BM)
Good Suffix Heuristic
text: a c d e c d a c d a c d aa c d a c d a
a c d a c d a
u
v
DPI.27
Boyer-Moore Algorithm (BM)
Good Suffix Heuristic example
Pattern: ANPANMAN
Good-suffix shifts: N = 1, AN = 8, MAN = 3, NMAN = 6
DPI.28
BM Algorithm Example 1
X X X X N M A N M A N P A N M A N
A N P A N M A N
A N P A N M A N
8
A N P A N M A N
1
Match !!
DPI.29
BM Algorithm Example 2
BadChar table: A = 1, C = 6, G = 2, T = 8
Good-suffix table for pattern G C A G A G A G: 7 7 7 2 7 4 7 1
Text: G C A T C G C A G A G A G T A T A C A G T A A G
Shifts per attempt (bad character / good suffix): 1 / 1, 4 / 4, 1 / 7, 4 / 4
If NOT AG then move 7 characters
17 text character comparisons in total.
DPI.30
Pattern Matching Algorithms
Single-Pattern Matching vs. Multi-Pattern Matching
  A single-pattern matching algorithm searches a string (or text) T for the first occurrence or all occurrences of one given pattern.
  A multi-pattern matching algorithm searches the input T for all occurrences of any pattern in P.
DPI.31
The Aho-Corasick (AC) Algorithm
Proposed by A.V. Aho and M.J. Corasick in 1975.
AC is a classic solution to exact set matching.
It works in time O(n + m + z), where n is the text length, m is the total length of the patterns, and z is the number of pattern occurrences in T.
AC is based on a refinement of a keyword tree.
AC is a deterministic algorithm: once the automaton is built, the scanning performance is independent of the number of patterns.
DPI.32
The Aho-Corasick (AC) Algorithm
Pros
  Provides the best worst-case computation complexity.
  The number of state transitions for each input symbol is at most two.
  No reverse scan of the input string.
  No constraint on the minimum pattern size.
  Performance is independent of the number of patterns (|P|).
DPI.33
An Example of AC Algorithm
Example: P = {ab, ba, babb, bb}
DPI.34
The Aho-Corasick (AC) Algorithm
Automaton-based algorithm
Three functions: Goto, Fail, and Output
Goto(current_st, input_code)
  State transition function
  Every prefix of the patterns is represented by exactly one state.
  P = {she, he, his, hers}
DPI.35
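The Aho-Corasick (AC) Algorithm
[Figure: goto graph for P = {she, he, his, hers}. Edges: 0 -s-> 1 -h-> 2 -e-> 3; 0 -h-> 4 -e-> 5 -r-> 8 -s-> 9; 4 -i-> 6 -s-> 7; state 0 loops on any symbol not in {h, s}]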
DPI.36
The Aho-Corasick (AC) Algorithm
Fail(current_st): points to the state of the longest suffix of the current state's string
  st     0  1  2  3  4  5  6  7  8  9
  Fail   0  0  4  5  0  0  0  1  0  1
Output(current_st)
  st     Output
  3      she, he
  5      he
  7      his
  9      hers
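The Goto, Fail and Output functions can be built with a trie plus a breadth-first pass. The following is a minimal sketch using Python dictionaries rather than the fixed-size arrays discussed in these slides; the function names are hypothetical. When the patterns are inserted in the order she, he, his, hers, the state numbers happen to coincide with the figure's numbering.

```python
from collections import deque


def build_automaton(patterns):
    goto = [{}]        # goto[state][symbol] -> next state
    output = [set()]   # patterns recognised on entering each state
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(p)
    fail = [0] * len(goto)             # depth-1 states fail to the root
    queue = deque(goto[0].values())
    while queue:                       # breadth-first over the trie
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]   # inherit suffix matches
    return goto, fail, output


def search(text, automaton):
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in sorted(output[state]):
            hits.append((i - len(p) + 1, p))
    return hits


if __name__ == "__main__":
    ac = build_automaton(["she", "he", "his", "hers"])
    print(ac[1])                 # failure links per state
    print(search("ushers", ac))
```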
DPI.37
The Aho-Corasick (AC) Algorithm
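#define ALPHABET_SIZE 256   /* |A| */

struct ACO {
    struct ACO    *next_state[ALPHABET_SIZE];  /* one pointer per input symbol */
    struct Output *pattern_list;               /* patterns recognised at this state */
};

[Figure: AC automaton for {she, he, his, hers}, states 0 to 9; state 0 loops on any symbol not in {h, s}]
Next(current_st, current_code) = Goto + Fail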
DPI.38
An example of AC Algorithm
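Dashed: fail transitions; those not shown lead to the root.
Patterns: hers, his, she
[Figure: AC automaton whose states are labelled with their prefixes ({h}, {he}, {s}, {sh}, ...) and output sets {hers}, {he, she}, {his}; the root loops on any symbol not in {h, s}]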
DPI.39
An example of AC Algorithm
[Figure: the AC automaton scanning the text "h e i s h i s"]
DPI.40
The Aho-Corasick (AC) Algorithm
Cons
  Large memory requirement
  Poor performance in the real system
  A lot of external memory accesses
1028 bytes per state (256 x 4 = 1024 bytes when |A| = 256 with 32-bit pointers), about 10 MB for the state machine
[Figure: each state holds a 256-entry next-state array]
DPI.41
AC with Bitmap
Proposed by Nathan Tuck et al. at INFOCOM 2004
[Figure: AC state graph (states 0 to 9) and the ACB state layout with fields next_flag, fail_ptr, next_start and pattern_list; column marks 0, 16, 32]
44 bytes per state (256 bits = 32 bytes; 32 bytes + 12 bytes = 44 bytes)
DPI.42
AC with Bitmap (ACB)
Procedure ACB_Matching
  ...
  While j < code do
    popcount ← popcount + State->next_flag[j];
  End
  State ← State->next_start + popcount * Sizeof(ACB);
  PM ← PM ∪ Out(State->pattern_list);
  End
  Return;
For example, if the input symbol is 's', whose ASCII code is 0x73, ACB matching has to read 115 bits (7 x 16 + 3 = 115) and accumulate them to obtain the offset for 's' (popcount).
Bitmap with bits set for d, g and s: 0001001000000...0001000... = 2 (offset = 2)
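The bitmap lookup can be sketched with a Python integer standing in for the 256-bit next_flag field: the child index is the number of set bits below the input symbol's code. The function name `child_offset` and the use of a single integer are assumptions made for illustration; the real structure stores the bitmap in fixed-width words.

```python
def child_offset(next_flag: int, code: int):
    """Return the index of the child for symbol `code`, or None when the
    state has no child for it (the caller would then follow fail_ptr)."""
    if not (next_flag >> code) & 1:
        return None
    below = next_flag & ((1 << code) - 1)   # keep only bits 0 .. code-1
    return bin(below).count("1")            # popcount


if __name__ == "__main__":
    flags = sum(1 << ord(c) for c in "dgs")  # children exist for d, g, s
    print(child_offset(flags, ord("s")))
```

With children for d, g and s, the lookup for 's' counts the two bits for d and g, giving offset 2 as in the slide's bitmap.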
DPI.43
Wu-Manber Algorithm
Proposed by Sun Wu and Udi Manber in 1994
A very popular multiple-pattern matching algorithm
Better average-case performance than AC
Applies the "skip" idea of Boyer-Moore (bad-character shift) to multiple patterns
Examines the text in blocks instead of one character at a time
Uses hash functions and tables
DPI.44
The preprocessing stage
m: length of the shortest pattern (LSP)
Consider only the first m characters of each pattern.
For k patterns, the total number of characters is M = k x m.
Three tables are built: a SHIFT table, a HASH table, and a PREFIX table.
Example: the patterns abcdefgh, 1234567 and a2c4 give m = 4
DPI.45
SHIFT table
Let B be the size of the block. Each string X of size B over the alphabet is mapped to an index i of the SHIFT table by a hash function: Hash(X) = i.
  X does not appear in any pattern: Shift[i] = m - B + 1
  X appears: Shift[i] = m - q, where q is the position at which X ends in some pattern; if X appears more than once, Shift[i] is set to the minimum value.
The maximum shift distance is m - B + 1, where m = length of the shortest pattern and B = window size.
DPI.46
SHIFT table example
m = 4, B = 2
Patterns: abcd, 1234, a2c4
Shift table
  Block   Shift
  ab      2     (m - q = 4 - 2 = 2)
  bc      1     (m - q = 4 - 3 = 1)
  cd      0     (m - q = 4 - 4 = 0)
  12      2
  23      1
  34      0
  a2      2
  2c      1
  c4      0
  *       3     (any other block)
Maximum shift distance = m - B + 1 = 4 - 2 + 1 = 3
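The SHIFT table rule can be checked against the example with a short sketch. A Python dictionary keyed by the block itself stands in for the hashed table (an assumption made for readability; the original indexes the table by a hash of the block), and the name `build_shift` is hypothetical.

```python
def build_shift(patterns, B=2):
    """Return (shift, default): shift[block] = m - q minimised over all
    patterns, where q is the 1-based position at which the block ends;
    blocks absent from every pattern use default = m - B + 1."""
    m = min(len(p) for p in patterns)
    default = m - B + 1
    shift = {}
    for p in patterns:
        for q in range(B, m + 1):           # only the first m characters
            block = p[q - B:q]
            shift[block] = min(shift.get(block, default), m - q)
    return shift, default


if __name__ == "__main__":
    shift, default = build_shift(["abcd", "1234", "a2c4"], B=2)
    print(shift, default)
```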
DPI.47
Shift Table example
m=5, B = 2
ve: m – q = 5 – 4 = 1 and m – q = 5 – 2 = 3, so Shift(ve) = min {1, 3} = 1
sh: m – q = 5 – 2 = 3 and m – q = 5 – 5 = 0, so Shift(sh) = min {0, 3} = 0
er: m – q = 5 – 3 = 2 and m – q = 5 – 5 = 0, so Shift(er) = min {0, 2} = 0
DPI.48
HASH table
The same hash function as SHIFT table.
Map the last B characters of all patterns.
Hash[i] contains a pointer that:
  points to a list of pointers to the patterns whose last B characters hash into i;
  is also an index to the PREFIX table.
DPI.49
HASH table example
DPI.50
PREFIX table
Map the first B’ chars of all patterns into the PREFIX table.
Contains the hash value of each prefix of size B’.
Used to filter patterns whose suffix is the same but with different prefix.
DPI.51
Data Structures used in Wu-Manber Algorithm
[Figure: SHIFT table entries SHIFT[i] and SHIFT[i+1] equal to 0; the corresponding HASH table entries (Hash = i, Hash = i+1) point into the pattern pointer list P1, P2, P3, P4, alongside the PREFIX table]
DPI.52
Scanning steps
1. Compute a hash value h based on the current B characters of the text, $t_{m-B+1} \ldots t_m$.
2. If Shift[h] > 0, shift the text by Shift[h] and go back to step 1. Otherwise,
3. Compute the hash value of the prefix of the text; call it text_prefix.
4. For each p with Hash[h] <= p < Hash[h+1], check whether Prefix[p] = text_prefix. When they are equal, check the actual pattern against the text directly.
DPI.53
An Example of MWM Algorithm
Patterns: abcd, 1234, a2c4
Window size (B) = 2
Shift table: ab = 2, bc = 1, cd = 0, 12 = 2, 23 = 1, 34 = 0, a2 = 2, 2c = 1, c4 = 0, * = 3
Hash table: entries ab, 12, a2 pointing to abcd, 1234, a2c4
Text: 1 2 2 c a b c 1 2 3 a 2 a 2 c 4
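The scanning loop can be sketched on this example. The version below is simplified and makes several assumptions: blocks are used directly as dictionary keys instead of hash values, candidate patterns are bucketed by the block that ends their first m characters, and the PREFIX table check is omitted, so every candidate in a bucket is compared against the text directly. The name `wm_search` is hypothetical.

```python
def wm_search(text, patterns, B=2):
    """Return (position, pattern) for each occurrence found."""
    m = min(len(p) for p in patterns)
    default = m - B + 1
    shift, bucket = {}, {}
    for p in patterns:
        for q in range(B, m + 1):
            block = p[q - B:q]
            shift[block] = min(shift.get(block, default), m - q)
        bucket.setdefault(p[m - B:m], []).append(p)
    hits = []
    pos = m                                  # index just past the m-char window
    while pos <= len(text):
        block = text[pos - B:pos]            # last B characters of the window
        s = shift.get(block, default)
        if s > 0:
            pos += s                         # safe skip, no pattern can end here
            continue
        start = pos - m
        for p in bucket.get(block, []):      # shift 0: verify the candidates
            if text.startswith(p, start):
                hits.append((start, p))
        pos += 1
    return hits


if __name__ == "__main__":
    print(wm_search("122cabc123a2a2c4", ["abcd", "1234", "a2c4"]))
```

On the example text the window is shifted by 1, 3, 3, 3 and 2 characters before the block c4 triggers a verification that finds a2c4.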
DPI.54
Multi-Pattern Matching Algorithms
Some algorithms apply BM-based algorithms iteratively, once per pattern, to solve the multi-pattern matching problem.
These algorithms were originally designed for single-pattern matching.
They were designed for text-file searching on computers, where the input string is typically longer than the pattern string.
However, the input string for multi-pattern matching across a network is a packet, whose length is much smaller than the sum of the lengths of all patterns.
Moreover, the pattern set (|P|) is generally very large in a network system.
DPI.55
Multi-Pattern Matching Algorithms
BM-based approaches are not applicable to packet inspection because of the varying pattern lengths, the scale of the pattern database, and memory capacity.
Cons
  Space complexity: O(|Λ|) -> O(|Λ| x |P|)
  Time complexity: O(|T|+|p|) ->
BM-like Algorithms
The Boyer-Moore-Horspool algorithm (BMH), a variant of BM, slightly modifies the bad-character heuristic to build a single skip table (shift table).
BMH is the best average-case algorithm for general pattern lengths in single-pattern matching.
The brute-force method outperforms the BM-based approaches in the extreme cases where the pattern is shorter than three characters or close to the length of the input string.
13.7% of the patterns in the Snort pattern set are shorter than three characters.
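For reference, the textbook form of Boyer-Moore-Horspool fits in a few lines: one shift table is built from all pattern characters except the last, and the shift is always chosen by the text character under the last pattern position. This is a generic sketch of BMH (the function name `horspool` is hypothetical), not code from the slides, and it assumes a non-empty pattern.

```python
def horspool(text: str, pattern: str) -> list:
    """Return the start positions of all occurrences of pattern in text."""
    m = len(pattern)
    # Distance from each character's last occurrence (excluding the final
    # position) to the end of the pattern; other characters shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos + m <= len(text):
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


if __name__ == "__main__":
    print(horspool("XBXXABCXXABCDXX", "ABCD"))
```

With a one- or two-character pattern the shifts are at most 1 or 2, which is consistent with the remark that brute force does as well for very short patterns.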
DPI.57
BM-like Algorithms
Fisk and Varghese’s method (FV) groups all patterns to precompute the safety shifts;
Wu and Manber’s algorithm (WM) groups D-grams of the prefixes of all patterns to build a shift table based on the bad gram heuristic, where each entry contains the safety shift of each D-gram.
DPI.58
BM-like Algorithms
Liu et al. presented an algorithm (WM-PH) that groups the prefixes of all patterns to build a large hash table, where the length of the prefix is D.
  Bad-character heuristic
  J(.) function
Cons
  Large table
  Minimum pattern length
  Complicated pattern update
Usually D = 3. Memory = 16M x entry size, since the table has $|\Lambda|^D = 256^3 = (2^8)^3 = 2^{24}$ = 16M entries.
DPI.59
Ways to Improve an Algorithm
Reduce the number of instructions
Reduce the number of memory accesses
  Embedded system
    1 cycle for XOR
    10~150 cycles SRAM latency
    50~250 cycles DRAM latency
  General CPU
    0.5 cycles for ADD
    80~200 cycles for one off-chip access
  The rate of improvement in processor speed exceeds the improvement in memory speed.
Reduce the required memory size
DPI.60
Other Pattern Matching Algorithms
Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems
Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems
AC Algorithm with a Magic Number (ACM)
Pre-filtering Algorithms for Pattern Matching
DPI.61
Hierarchical Matching Algorithm (HMA)
A multiple-pattern matching algorithm
  Used to search the input string for all occurrences of any pattern in the database.
  Matches a packet against a set of patterns simultaneously, instead of one by one.
  No constraint on the minimum pattern length.
Contains two tables
  The first-tier table is very small and stored in the on-chip cache; it acts as a pre-filter.
  The second-tier table is stored in external memory and is accessed only when the first-tier table is matched.
DPI.62
HMA – Hierarchical Architecture
[Figure: HMA hierarchical architecture. The processing engine of the processor holds the first-tier table (H1) in on-chip memory; the second-tier table (H2), made of tables indexed 0x00 to 0xFF, resides in off-chip memory. With memory access probability f, a cluster of patterns is fetched and string comparison is performed.]
DPI.63
HMA - Challenges
L1 memory
  Very small: 2~4 KB in network processors, 1~2 MB in some high-end hardware designs
    Linking many memory blocks degrades chip performance
    Power issues
How to obtain a small Tier-1 table (H1)?
DPI.64
Frequent Common-Code Searching Algorithm (FCS)
To lower the memory access probability
  Find a small set of codes to represent a set of patterns.
  In other words, obtain a set of signatures of the patterns: F (each signature is one byte in this case).
  F is used to build the first-tier table (H1).
The main idea of FCS
  If one code occurs more frequently than others in the patterns, we can choose it as one of the frequent common codes.
  We then obtain a smaller set of frequent common codes.
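One way to realise this idea is a greedy selection: repeatedly pick the code that occurs in the most not-yet-covered patterns until every pattern contains at least one chosen code. This is only a sketch of the idea and not the authors' FCS procedure; the function name, the greedy strategy and the alphabetical tie-break are all assumptions, and patterns are assumed to be non-empty.

```python
def frequent_common_codes(patterns):
    """Greedily choose codes until every pattern contains a chosen code."""
    remaining = set(patterns)
    chosen = []
    while remaining:
        counts = {}
        for p in remaining:
            for ch in set(p):                # count each pattern once per code
                counts[ch] = counts.get(ch, 0) + 1
        # Most frequent code first; ties are broken alphabetically (assumption).
        best = min(counts, key=lambda ch: (-counts[ch], ch))
        chosen.append(best)
        remaining = {p for p in remaining if best not in p}
    return chosen


if __name__ == "__main__":
    print(frequent_common_codes(["red", "orange", "green", "yellow", "black"]))
```

For the patterns red, orange, green, yellow and black this selects e first (four patterns) and then a for black, i.e. the same set F = {a, e} used in the slides' example.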
DPI.65
Pattern Spectrum
1-gram
2-gram
The pattern spectrum when |P| = 1200 from Snort’s rule set.
DPI.66
Frequent Common-Code Searching Algorithm (FCS)
[Figure: FCS reduces the patterns red, orange, green, yellow, black to the frequent common codes F = {a, e}. H1 (first-tier table) is a mapping table indexed by code, with as many entries as the size of the alphabet set, that points to the corresponding cluster: a -> orange, black; e -> red, green, yellow.]
DPI.67
Clustering Patterns
[Figure: patterns clustered under the codes a and e and keyed by code pairs: orange (a,n), black (a,c), red (e,d), green (e,e), yellow (e,l)]
Cluster Balancing Strategy (CBS): reduce the collision probability in H2
DPI.68
Clustering Patterns
[Figure: CBS applied to the clusters of a and e, with seed codes marked; code pairs shown: (a,n), (a,c), (e,d), (e,e), (e,l) for the patterns orange, black, green, yellow]
DPI.69
HMA – on-line stage
[Figure: HMA on-line stage. The on-chip cache memory (H1), indexed H1(a) ... H1(z), holds (pid, fid). External memory (H2) is indexed by code pairs (a,a) ... (e,z) and holds the patterns red, yellow, green, black and orange. For the input "it is black", cluster-wise matching finds "black" with 1 memory access and 1 string comparison (5 character comparisons).]
DPI.70
HMA – Preliminary Results
Memory Size
|F|
DPI.71
HMA – Number of L2 Memory Accesses
Filters out about 63%-90% of payloads in Tier-1
Random Input
DPI.72
HMA – Memory size and Memory latency
          HMA         BM-PH       BMH        AC-C
Memory    326.75 KB   16.013 MB   313.2 KB   439 KB
DPI.73
HMA – Time (Cycles)
DPI.74
Other Pattern Matching Algorithms
Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems
Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems
AC Algorithm with a Magic Number (ACM)
Pre-filtering Algorithms for Pattern Matching
DPI.75
Enhanced HMA (EHMA) – gram?
Definition: gram
  A gram: a group of characters
  D-gram: D is the number of grouped characters
  A string = a set of grams
    "green" = {'gr', 're', 'ee', 'en'} when D = 2.
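The definition corresponds to a sliding window of width D over the string. A one-line sketch (the function name `grams` is hypothetical):

```python
def grams(s: str, D: int = 2) -> list:
    """Return the D-grams of s in order of appearance."""
    return [s[i:i + D] for i in range(len(s) - D + 1)]


if __name__ == "__main__":
    print(grams("green", 2))
```

A string shorter than D has no D-grams, which is one reason gram-based methods need care with very short patterns.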
DPI.76
EHMA - Strategies
Enhance HMA
  Frequent-common Code Search (FCS) -> General Frequent-common Gram Search (GFGS)
    Obtain the frequent-common gram set F
  Cluster Balancing Strategy (CBS)
Add
  Sampling window
    Search for F in the sampling window
  Safety Shift Strategy
    Frequency-based bad-gram heuristic
      Modify the bad grouped-character heuristic
      Frequent-common gram
    Shift values in the H1 and H2 tables
DPI.77
EHMA –State Diagram
Tier-1 Matching
Tier-2 Matching
Have next pattern in the cluster
Hit
No next pattern in the cluster
HMA EHMA
DPI.78
EHMA – Off-line Stage
H1
H2
DPI.79
EHMA - GFGS
B1 = 1, B2 = 1, m = M = 6, W = 3
F={e, h}
DPI.80
EHMA
[Figure: EHMA tables for the patterns actress, teacher, architect, firefighter and farmer. The on-chip cache (H1) has 256 entries, H1(a) ... H1(z), each holding (shift, fid), built with W = 3, B1 = 1, m = 6. External memory (H2) holds the clusters Pe and Ph, indexed by gram pairs such as (e,f) firefighter, (e,r) farmer and (e,s) actress, each entry holding (shift, data), built with W = 3, B1 + B2 = 2, m = 6.]
DPI.81
EHMA – on-line stage
[Figure: EHMA on-line flowchart. Tier-1 matching in the processing unit: Read Next Gram from the input string, Skip?, Skip on Input, pid = NULL?, Output Matched Single-gram. Tier-2 matching against external memory: Read Entry from External Memory, String Comparison and Output, Next Pattern?]
DPI.82
EHMA - Safety Shift Strategy
If no gram of F is matched in the input string, then no pattern exists in it.
Therefore, as long as no occurrence of F is missed, no pattern will be missed.
The basic concept of the safety shift strategy is:
  if x is not a gram of any pattern, and no suffix of x is a prefix of any pattern in P, then it is safe to shift by m when x is scanned;
  otherwise, the safety shift is the offset between the rightmost occurrence of x in any pattern p and the position of the frequent-common gram (f) nearest to x in that pattern.
DPI.83
EHMA - Examples
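M = m = 6, W = 3, B1 = B2 = 1
[Figure: two worked examples, one labelled "Best Case", with a marker at the (M-W+1)-th position]
1 L1 table lookup, 0 L2 accesses, 0 string comparisons for a text of 8 characters
4 L1 table lookups, 1 L2 access, 1 string comparison for a text of 12 characters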
DPI.84
EHMA – Number of L2 Memory Accesses
[Figure: average number of external accesses (log scale, 0.01 to 1000) for EHMA, HMA, WM-PH, AC-C, BMH and BMH-O with 200 and 1200 patterns; two panels, λ = 0 and λ = 4]
Filters out 81%-94% of payloads in Tier-1
DPI.85
Other Pattern Matching Algorithms
Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems
Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems
AC Algorithm with a Magic Number (ACM)
Pre-filtering Algorithms for Pattern Matching
DPI.86
AC with a Magic Number (ACM)
AC with Bitmap (Nathan Tuck et al., INFOCOM 2004)
[Figure: AC state graph (states 0 to 9) and the ACB state layout with fields next_flag, fail_ptr, next_start and pattern_list; column marks 0, 16, 32]
44 bytes per state (256 bits = 32 bytes; 32 bytes + 12 bytes = 44 bytes)
DPI.87
AC with a Magic Number (ACM)
For example, if the input symbol is 's', whose ASCII code is 0x73, ACB matching has to read 115 bits (7 x 16 + 3 = 115) and accumulate them to obtain the offset for 's' (popcount).
Bitmap with bits set for h and s: 00000001000000...0001000... = 2 (offset = 1)
However, the computation load of the popcount function is heavy.
DPI.88
AC with a Magic Number (ACM): Path Decoder R?
{h, s, i} -> {0, 1, 2}
DPI.89
ACM: A Magic Number?
Assume there is a magic number X and define the function f as
$f:\; X \bmod f(a_i) = i - 1, \quad i = 1, 2, \ldots, k$
so that $\{a_1, a_2, \ldots, a_k\} \to \{0, 1, \ldots, k-1\}$, e.g. {h, s, i} -> {0, 1, 2}.
Does the magic number exist?
DPI.90
Chinese Remainder Theorem (CRT)
Let $M = \prod_{i=1}^{k} m_i$, where the $m_i$ are integers and relatively prime; that is, $\gcd(m_i, m_j) = 1$ for $1 \le i, j \le k$ and $i \ne j$. Let $x_1, x_2, \ldots, x_k$ be integers. Consider the system of congruences:
$X \equiv x_1 \pmod{m_1}$
$X \equiv x_2 \pmod{m_2}$
$\vdots$
$X \equiv x_k \pmod{m_k}$
where X and $x_i$ are said to be congruent modulo $m_i$, for $1 \le i \le k$. Then there exists exactly one $X \in \{0, 1, \ldots, M-1\}$ that satisfies the system.
$\gcd(a, b)$ means the greatest common divisor of a and b.
DPI.91
Chinese Remainder Theorem (CRT)
Assume M = 3x5x7 = 105
X ≡2 (mod 3)
X ≡3 (mod 5)
X ≡2 (mod 7)
Then X = 23
DPI.92
ACM: CRT
X % m1 = x1 % m1
X % m2 = x2 % m2
X % m3 = x3 % m3
The mi are integers and relatively prime; the xi are integers.
X exists and X < m1 * m2 * m3.
X -> {x1, x2, x3}
DPI.93
ACM: CRT
X % m1 = x1 % m1, X % m2 = x2 % m2, X % m3 = x3 % m3
With xi < mi these become X % m1 = x1, X % m2 = x2, X % m3 = x3.
$f:\; X \bmod f(a_i) = i - 1$
The mi are integers and relatively prime.
DPI.94
ACM: find R
Therefore, if the function f numbers the symbols with prime numbers,
$f:\; \{a_1, a_2, \ldots, a_k\} \to \{m_1, m_2, \ldots, m_k\}$,
then by the CRT we know that the magic number X with $X \bmod f(a_i) = i - 1$ exists.
DPI.95
ACM: find X
Chinese Remainder Theorem Algorithm. Let $z_i = M / m_i$ and $y_i = z_i^{-1} \pmod{m_i}$ for each i = 1, 2, ..., k, where $z_i^{-1}$ means the multiplicative inverse of $z_i$ modulo $m_i$. (Note that $z_i^{-1}$ exists if $\gcd(z_i, m_i) = 1$.) Then the solution to the congruence system of the Chinese Remainder Theorem is
$X = \left( \sum_{i=1}^{k} x_i \, y_i \, z_i \right) \bmod M$
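The CRT algorithm translates directly into code. This sketch relies on Python 3.8 or later, where `pow(z, -1, m)` returns the modular inverse and `math.prod` is available; the function name `crt` is hypothetical, and the moduli are assumed to be pairwise relatively prime.

```python
from math import prod


def crt(residues, moduli):
    """Solve X = x_i (mod m_i) for all i; returns the unique X in [0, M)."""
    M = prod(moduli)
    X = 0
    for x_i, m_i in zip(residues, moduli):
        z_i = M // m_i
        y_i = pow(z_i, -1, m_i)      # multiplicative inverse of z_i mod m_i
        X += x_i * y_i * z_i
    return X % M


if __name__ == "__main__":
    print(crt([2, 3, 2], [3, 5, 7]))   # the M = 105 example
    print(crt([0, 1, 2], [2, 3, 5]))   # magic number for {h, s, i}
```

The same function gives the magic number for any state once its child symbols are assigned distinct primes and the desired child indices are used as residues.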
DPI.96
ACM: an example
f: {h, s, i} -> {2, 3, 5}
CRT residues: {0, 1, 2}, from $X \bmod f(a_i) = i - 1$
X % 2 = 0
X % 3 = 1
X % 5 = 2
By the CRT algorithm, the magic number is 22.
ACM: an example
f: {h, s, i} -> {2, 3, 5}; CRT residues: {0, 1, 2}, from $X \bmod f(a_i) = i - 1$
The magic number is 22.
22 % 2 = 0
DPI.98
ACM: an example
f: {h, s, i} -> {2, 3, 5}; CRT residues: {0, 1, 2}, from $X \bmod f(a_i) = i - 1$
The magic number is 22.
22 % 2 = 0
22 % 3 = 1
22 % 5 = 2
DPI.99
ACM: Magic Structure
[Figure: ACM state layout with fields next_flag, fail_ptr, next_start, pattern_list and MagicNumber; column marks 0, 16, 32]
52 bytes per state (256 bits = 32 bytes; 32 bytes + 12 bytes + 8 bytes = 52 bytes)
DPI.100
ACM: on-line stage
[Figure: ACM state layout with fields next_flag, fail_ptr, next_start, pattern_list and MagicNumber]
Input 'a' is not flagged in next_flag (bit = 0): nextState = fail_ptr
DPI.101
ACM: on-line stage
[Figure: ACM state layout with fields next_flag, fail_ptr, next_start, pattern_list and MagicNumber]
Input 's' is flagged in next_flag (bit = 1)
DPI.102
ACM: memory architecture
f: {s, h} -> {11, 3}, CRT residues {0, 1}: the magic number is 22.
f: {e, i} -> {2, 5}, CRT residues {0, 1}: the magic number is 6.
DPI.103
ACM: The Magic Number Heuristic
Heuristic: if there is only one child, the magic number is zero.
Observing the ACM state machine, we find that, approaching the leaves, more and more states have only one child state.
  The forwarding path can be obtained directly without any computation: nextState = next_start.
The magic structure is very efficient for sparse graphs.
  Only 0.078% of ACM nodes have more than 15 branches when 1200 patterns from Snort are included.
[Figure: AC automaton for {she, he, his, hers}, states 0 to 9; state 0 loops on any symbol not in {h, s}]
DPI.104
ACM: Memory Size
[Figure: memory (KB, log scale from 1 to 100000) for ACM, ACB and ACO with 1200 patterns and 200 patterns. Values shown: 519.1, 438.8, 10252.9, 1912.4, 81.9, 97.2. Annotations: 1.183%, 1.187%, 5%, 5%]
Rule database: distinct patterns from Snort
DPI.105
ACM: Time
[Figure: time (cycles, 0 to 100) for ACM, ACB, ACO and ACO-100 with 1200 patterns and 200 patterns. Values shown: 5.34, 5.67]
Software: the number of instructions per clock for add, mov, mul, cmp, bt, and mod is 3, 3, 1, 3, 3, and 1/71 respectively.
DPI.106
ACM: Total Cost
An evaluation function C covers both the memory and the execution-time requirements: C = CM × CT.
  CM: memory cost
  CT: time cost
DPI.108
Other Pattern Matching Algorithms
Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems
Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems
AC Algorithm with a Magic Number (ACM)
Pre-filtering Algorithms for Pattern Matching
DPI.109
Pre-filter: Search Filter Model
All substrings filtered out by the filter are clean: they cannot contain any of the defined patterns.
Substrings passed on to the pattern-matching algorithm may or may not contain pre-defined patterns.
Thus, the search filter may generate false positives but not false negatives.
  A false positive here is a substring that contains no pre-defined pattern but is wrongly accepted as containing one.
An exact string-matching mechanism is therefore essential for finding out which patterns are included in an accepted substring.
DPI.110
Pre-filter: Search Filter Model
[Figure: the input string enters the pre-filtering algorithm. Bypassed substrings contain no pattern. Accepted substrings, which may or may not contain a pattern, go to the string matching algorithm, which reports the patterns found in the substring.]
DPI.111
Super-Symbol Filter
The basic idea of the proposed Super-Symbol Filter (SSF) algorithm is to treat two bytes of data as a super-symbol and to use a bitmap to indicate the occurrence of each super-symbol in the pre-defined patterns.
For example, for 8-bit ASCII code there are 65536 combinations of two bytes, so a bitmap vector of 65536 entries is used.
Match vector construction for P1 = AAB, P2 = CODE, P3 = FOOD:
  Super-symbol   Bytes        Bit
  AA             0x41 0x41    1
  AB             0x41 0x42    1
  CO             0x43 0x4F    1
  DE             0x44 0x45    1
  FO             0x46 0x4F    1
  OD             0x4F 0x44    1
  OO             0x4F 0x4F    1
  all others, from 0x00 0x00 to 0xFF 0xFF (e.g. ZZ = 0x5A 0x5A)    0
DPI.112
Filtering phase in SSF-1 Algorithm
Input string Text = ABOD CODING IS FOOD (△ marks a space)
Bitmap bits set: AA, AB, CO, DE, FO, OD, OO
  Super-symbol: AB BO OD D△ △C CO OD DI IN NG G△ △I IS S△ △F FO OO OD
  Bit:          1  0  1  0  0  1  1  0  0  0  0  0  0  0  0  1  1  1
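The SSF-1 construction and lookup can be reproduced with a 65536-entry table indexed by the two byte values. This is a sketch under stated assumptions: a `bytearray` with one byte per entry stands in for the bitmap, and the function names are hypothetical.

```python
def build_match_vector(patterns):
    """Mark every two-byte super-symbol that occurs in any pattern."""
    mv = bytearray(65536)                    # one entry per super-symbol
    for p in patterns:
        for a, b in zip(p, p[1:]):           # consecutive byte pairs
            mv[(a << 8) | b] = 1
    return mv


def filter_bits(text, mv):
    """Look up each super-symbol of the text in the match vector."""
    return [mv[(a << 8) | b] for a, b in zip(text, text[1:])]


if __name__ == "__main__":
    mv = build_match_vector([b"AAB", b"CODE", b"FOOD"])
    print(filter_bits(b"ABOD CODING IS FOOD", mv))
```

Runs of consecutive 1 bits mark the substrings that would be handed to the exact matcher; everything else is bypassed.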
DPI.113
SSF-2 Algorithm
To improve accuracy and reduce the number of false positives, the extended SSF-2 algorithm employs two match vectors.
The First Match Vector (FMV) is used for the super-symbols formed by the first two symbols of each pattern.
The Rest Match Vector (RMV) is used for the remaining super-symbols in the patterns, except those in the FMV.
For P1 = AAB, P2 = CODE, P3 = FOOD:
  FMV bitmap bits set: AA, CO, FO
  RMV bitmap bits set: AB, DE, OD, OO
DPI.114
SSF-2 Algorithm
The algorithm looks up the FMV and RMV and detects whether the corresponding bit of each super-symbol is 1.
Since "AB" and "OD" are not the beginning super-symbol of any pattern (by checking the FMV), the filter algorithm outputs only two substrings, "COD" and "FOOD". Only one substring, "COD", is a false positive in this case.
Input string Text = ABOD CODING IS FOOD
[Figure: FMV bitmap and RMV bitmap lookups for each super-symbol of the input string]
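The two-vector filter can be sketched as follows. The exact rule for emitting substrings is an assumption made here for illustration: a candidate starts at a super-symbol found in the FMV and extends while the following super-symbols are found in the RMV. Python sets stand in for the two bitmaps, the function names are hypothetical, and patterns are assumed to be at least two symbols long.

```python
def pairs(s):
    """Consecutive two-symbol super-symbols of s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def build_vectors(patterns):
    fmv = {p[:2] for p in patterns}                      # first super-symbols
    rmv = {ss for p in patterns for ss in pairs(p)[1:]}  # all the others
    return fmv, rmv


def candidates(text, fmv, rmv):
    """Substrings that start with an FMV hit and continue with RMV hits."""
    ss = pairs(text)
    out, i = [], 0
    while i < len(ss):
        if ss[i] in fmv:
            j = i + 1
            while j < len(ss) and ss[j] in rmv:
                j += 1
            out.append(text[i:j + 1])
            i = j
        else:
            i += 1
    return out


if __name__ == "__main__":
    fmv, rmv = build_vectors(["AAB", "CODE", "FOOD"])
    print(candidates("ABOD CODING IS FOOD", fmv, rmv))
```

Each emitted substring still has to be checked by an exact matcher such as AC, since the filter can only rule substrings out.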
DPI.115
Evaluation
To evaluate scalability and flexibility, the popular Snort IDS signatures are employed.
If most bits of the bitmap were set to '1', the SSF filtering performance would degrade dramatically, because the "hit rate" would be very high.
Fortunately, tracking the growth of the Snort rule patterns shows that the percentage of set bits in the MV, FMV, and RMV is still very small (at most about 5.5% of the 65536 entries). Thus, the proposed approaches have a good chance of keeping up with the fast growth of Snort releases.
  Release     Number of released patterns   SSF-1 MV bitmap   SSF-2 FMV bitmap   SSF-2 RMV bitmap
  Snort-2.0   2066                          3213              695                3027
  Snort-2.1   2617                          3478              813                3296
  Snort-2.2   2664                          3575              835                3382
  Snort-2.3   2679                          3611              845                3413
  Snort-2.4   2680                          3611              845                3413
DPI.116
Defcon9 Trace
Columns: Filter Algorithm | Passed by Filter <bytes> | Filter-out percentage | Filter cost time <μs> | AC search cost time <μs> | Total cost time <μs> | Throughput <Mbps>

Defcon-1, # of matched patterns: 377,508 times (9,846,572 bytes)
PBF     1,173,918   88%    >10^7     -         >10^7     <10
IDP     9,782,654   0.7%   126,439   550,468   676,907   116
AC      9,846,572   0%     0         558,079   558,079   141
SSF-1   2,916,802   70%    122,841   212,307   335,148   250
SSF-2   1,917,544   81%    130,809   160,872   291,681   270

Defcon-2, # of matched patterns: 147,843 times (9,849,836 bytes)
PBF     492,491     95%    >10^7     -         >10^7     <10
IDP     9,777,406   0.8%   125,901   529,602   655,503   120
AC      9,849,836   0%     0         537,297   537,297   146
SSF-1   1,868,185   81%    118,264   119,343   237,607   332
SSF-2   879,353     91%    127,651   68,628    196,279   401

Defcon-3, # of matched patterns: 57,458 times (9,852,342 bytes)
PBF     197,046     98%    >10^7     -         >10^7     <10
IDP     9,775,924   0.8%   125,810   512,169   637,970   123
AC      9,852,342   0%     0         513,081   513,081   153
SSF-1   1,350,541   86%    117,000   80,374    197,374   400
SSF-2   391,024     96%    126,523   29,739    156,262   504
Performance
Test platform: Pentium-4 3.0 GHz personal computer with 1 MB level-2 cache, with Intel’s VTune tool installed.
Compared schemes: Parallel Bloom Filter (PBF), Database Processor (IDP)
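The throughput figures in the Defcon9 table appear to follow from dividing the trace size in bits by the total cost time in microseconds. The sketch below checks that reading against the Defcon-1 rows; the function names are illustrative and not part of the original evaluation.

```python
def throughput_mbps(trace_bytes: int, total_time_us: float) -> float:
    """Bits per microsecond, which is numerically equal to Mbit/s."""
    return trace_bytes * 8 / total_time_us


def filter_out_percent(trace_bytes: int, passed_bytes: int) -> float:
    """Share of the trace that the pre-filter removed before the AC search."""
    return 100.0 * (1 - passed_bytes / trace_bytes)


DEFCON1_BYTES = 9_846_572

ac = throughput_mbps(DEFCON1_BYTES, 558_079)      # plain AC, Defcon-1
ssf2 = throughput_mbps(DEFCON1_BYTES, 291_681)    # SSF-2 + AC, Defcon-1
ssf1_filtered = filter_out_percent(DEFCON1_BYTES, 2_916_802)

print(round(ac), round(ssf2), round(ssf2 / ac, 2), round(ssf1_filtered))
```

Under this reading, the speed-up of a pre-filter comes entirely from the reduction in total cost time: the filter adds its own cost but shrinks the number of bytes handed to the AC search.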
DPI.117
Filter Percentage & Throughput
The filtering effectiveness of the IDP scheme is poor, and it cannot handle Snort’s patterns. This is because the bitmap used in the IDP scheme has only 256 entries, one per one-byte symbol, and most of these entries are set to “1” for Snort’s patterns.
Both the PBF and SSF schemes are less sensitive to the growth of the pattern set and have a filtering percentage of around 80-98%.
Figure: Filter percentage (%) for Defcon-1, Defcon-2, and Defcon-3 (case of Defcon9 packets). IDP: 0.7, 0.8, 0.8; SSF-1: 70, 81, 86; SSF-2: 81, 91, 96; PBF: 88, 95, 98.
DPI.118
Filter Percentage & Throughput
PBF is suitable only for hardware-based implementation; in this software evaluation its throughput is lower than that of AC.
For Defcon-1, the system throughput is roughly double that of the original AC algorithm (270 Mbps vs. 141 Mbps), and for Defcon-3 it is more than three times higher (504 Mbps vs. 153 Mbps).
The proposed SSF schemes consume far less memory (cache-resident).
Figure: Throughput (Mb/s) for Defcon-1, Defcon-2, and Defcon-3 (case of Defcon9 trace), comparing PBF + AC, IDP + AC, AC, SSF-1 + AC, and SSF-2 + AC.
DPI.119
Pattern Matching Technologies
Pattern-Matching Algorithms
Software Based
  Boyer-Moore
  Aho-Corasick (AC)
  Wu-Manber
  HMA/EHMA/ACM
  Pre-Filtering
Hardware Based
  Reconfigurable Hardware (FPGA)
  Bloom Filter
  TCAM-based
  GPU-based
DPI.120
Reconfigurable Hardware (FSM)
Implement the AC FSM in the configurable Logic Elements (LEs) of an FPGA.
Achieves multi-gigabit performance (depending on the FPGA model).
A powerful FPGA is necessary to accommodate thousands of patterns, so this approach is not practical and is rarely seen in the commercial market.
DPI.121
FPGA-based pattern matching
FPGA-based
DPI.122
FPGA-based pattern matching
Shortcomings
  Non-scalable
  Limited rules
DPI.123
Bloom Filter
Given a string X, the Bloom filter computes k hash functions on it, producing k hash values ranging from 1 to m, and sets the corresponding k bits in an m-bit vector. The same procedure is repeated for all the members of the pattern set.
The input text is checked by generating k hash values in the same way. If at least one of these k bits is not set, the string cannot be a member of the pattern set; if all k bits are set, the string is a possible match (false positives can occur).
Patterns of length n are grouped into Bloom filter Bn.
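The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design from the slides: the hash functions (salted SHA-256), the 0-based bit positions, and the exact-match confirmation step in `scan` are assumptions chosen to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m                    # number of bits in the vector
        self.k = k                    # number of hash functions
        self.bits = bytearray(m)      # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Assumption: k hash functions derived by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def build_filters(patterns, m: int, k: int):
    """Group patterns by length n into one Bloom filter B_n per length."""
    filters = {}
    for p in patterns:
        filters.setdefault(len(p), BloomFilter(m, k)).add(p)
    return filters


def scan(text: bytes, patterns, m: int = 64, k: int = 3):
    """Slide over the text one byte at a time and query every B_n."""
    filters = build_filters(patterns, m, k)
    exact = set(patterns)             # confirmation step removes false positives
    hits = []
    for i in range(len(text)):
        for n, bf in filters.items():
            window = text[i:i + n]
            if len(window) == n and bf.might_contain(window) and window in exact:
                hits.append((i, window))
    return hits


print(scan(b"accdefg", [b"ab", b"aa", b"xy"]))
print(scan(b"xxabxy", [b"ab", b"aa", b"xy"]))
```

In a hardware design the per-length filters would be queried in parallel; the nested loop here only emulates that behaviour sequentially.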
DPI.124
Bloom Filter (Cont.)
Figure: the payload stream (bytes A B C D E F G H I J at positions 1, 2, 3, …) is checked in parallel by Bloom filters B2, B3, B4, …, Bw. Signatures are grouped by length into G2(X), G3(X), G4(X), and each filter is an m-bit vector set by K hash functions H1, H2, …, Hk.
False positive: the minimum is $f = (1/2)^k$, attained when $m = (k \times n) / \ln 2$
So, total space: $\sum_i B_i = m \times (w - 1)$
If k = 1, n = 2048, m = 3072 bits
   k = 1, n = 3072, m = 4608 bits
If k = 4, f = 0.0625
   k = 5, f = 0.0313
   k = 6, f = 0.0156
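The sizing rules on this slide can be checked numerically. The sketch below evaluates the minimum false-positive rate $(1/2)^k$ and the matching vector size $k n / \ln 2$; it also shows that the slide's values of 3072 and 4608 bits equal 1.5 n, slightly above n / ln 2 ≈ 1.44 n, which suggests (as an assumption) that the slide rounds 1/ln 2 up to 1.5. The function names are illustrative.

```python
import math


def min_false_positive(k: int) -> float:
    """Minimum false-positive rate for k hash functions."""
    return 0.5 ** k


def optimal_bits(k: int, n: int) -> float:
    """Vector size m that attains the minimum for n stored patterns."""
    return k * n / math.log(2)


for k in (4, 5, 6):
    print(k, min_false_positive(k))

for n in (2048, 3072):
    print(n, round(optimal_bits(1, n)), int(1.5 * n))
```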
DPI.125
Bloom Filter (Cont.)
Figure: Bloom filters B2, B3, and B4, each an m-bit vector set by K hash functions H1, H2, …, Hk; signatures are grouped by length into G2(X), G3(X), G4(X).
Example patterns: P1: ab, P2: aa, P3: xy
Text: accdefg
DPI.126
TCAM(Ternary-CAM)-Based Pattern Matching
Unlike RAM, a CAM takes data as input and returns the address of the matching entry in one clock cycle.
The brute-force method needs n clock cycles to find a pattern in a text of length n.
Here we propose a method that speeds up the lookup by up to d times; however, the required space also grows d times.
Even so, a 2-Mbit TCAM (around $40) can accommodate over 3,000 patterns at multi-gigabit performance.
DPI.127
TCAM
TCAM stores data with three logic values: ‘0’, ‘1’, ‘X’ (don’t care)
Multiple match modes are needed.
DPI.128
Ternary-CAM (TCAM)
Each cell takes one of three logic states: ‘0’, ‘1’, and ‘?’ (don’t care)
Fully associative memory: compares the input string with all the entries in parallel
  If there are multiple matches, report the index of the first match
Current TCAM technology
  Fast match time: 4-8 ns
  Size: 1M
    1K entries * 1K bytes per entry
    2K entries * 512 bytes per entry
Figure: a TCAM k bytes wide with more than 1K entries (A B C D, C D E F, A B ? ?, A B C ?) matched against an input.
DPI.129
Pattern Matching with TCAM
Put all the patterns into the TCAM
  Assume patterns are no longer than the TCAM width
  If a pattern is shorter than the TCAM width, pad it with ‘?’
Order the patterns by decreasing length
  When entry ABC matches, report a match for both pattern ABC and pattern AB
Shift one byte each time
Figure: a TCAM k bytes wide with more than 1K entries (C D E F, A B ? ?, A B C ?) matched against the input A B C D E F.
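The scheme on this slide can be emulated in software as follows. This is a sketch under stated assumptions: the TCAM is modelled as an ordered Python list with first-match priority, `?` is reserved as the don't-care symbol (so patterns must not contain it), and the helper names are invented for the example.

```python
WILDCARD = "?"


def build_tcam(patterns, width):
    """Pad each pattern to the TCAM width; longest patterns get the lowest index."""
    ordered = sorted(patterns, key=len, reverse=True)
    return [p + WILDCARD * (width - len(p)) for p in ordered]


def entry_matches(entry, window):
    return all(e == WILDCARD or e == c for e, c in zip(entry, window))


def tcam_lookup(tcam, window):
    """Return the index of the first matching entry, or None."""
    for idx, entry in enumerate(tcam):
        if entry_matches(entry, window):
            return idx
    return None


def scan(text, patterns, width):
    tcam = build_tcam(patterns, width)
    hits = []
    for i in range(len(text)):                       # shift one byte each time
        window = text[i:i + width].ljust(width, "\0")
        idx = tcam_lookup(tcam, window)
        if idx is None:
            continue
        matched = tcam[idx].rstrip(WILDCARD)
        # The longest entry matched first; every pattern that is a prefix
        # of it (e.g. AB for ABC) occurs at the same offset as well.
        hits.extend((i, p) for p in patterns if matched.startswith(p))
    return hits


print(scan("ABCDEF", ["ABC", "AB", "CDEF"], width=4))
```

A real TCAM compares all entries in a single lookup; the linear loop in `tcam_lookup` only reproduces the priority order, not the timing.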
DPI.130
TCAM Search Analysis
Scan speed: 4-8 ns per TCAM lookup, shifting one byte at a time
  8 bits per lookup gives a worst-case scan rate of 1-2 Gbps
Able to report occurrences of all the patterns in the input string
Limitation: requires all the patterns to be no longer than the TCAM width
DPI.131
TCAM application
Packet switch
DPI.132
TCAM application
Address resolution
DPI.133
BM-Based TCAM Solution (cont.)
Shift Table (each entry is 72 bits wide)
Group   Patterns     Shift
G0      abcdefghi    0
        123456789    0
        highspeed    0
G1      *abcdefgh    1
        *12345678    1
        *highspee    1
G2      **abcdefg    2
        **1234567    2
        **highspe    2
...
G8      ********a    8
        ********1    8
        ********h    8
        *********    9
On every lookup, a 72-bit (9-byte) text string (Ti…Ti+8) is searched in this table, and the window slides k bytes if the text string matches an entry in Gk.
This design needs only n/9 comparisons in the best case, i.e., 8Gbps throughput at 125MHz.
Example:
Patterns: Abcdefghi, 123456789, Highspeed
Input: High12345678123456789
“high12345” matches ****12345 (G4)
“123456781” matches ********1 (G8)
“123456789” matches 123456789 (G0)
…..
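The shift-table lookup can be emulated in software to trace the example. The sketch below assumes that all patterns are exactly as long as the window (9 bytes), that matching is case-insensitive (the input is lower-cased first), and that the window advances by one byte after a full match; the slides do not state the last point, so treat it as an assumption.

```python
def build_shift_table(patterns, width=9):
    """Group Gk holds each pattern's prefix of length width-k behind k wildcards."""
    table = []
    for k in range(width):
        for p in patterns:
            table.append(("*" * k + p[:width - k], k))
    table.append(("*" * width, width))     # default entry: skip the whole window
    return table


def lookup_shift(table, window):
    """First matching entry wins, as in a priority-ordered TCAM."""
    for entry, shift in table:
        if all(e == "*" or e == c for e, c in zip(entry, window)):
            return entry, shift
    raise AssertionError("the all-wildcard entry always matches")


def search(text, patterns, width=9):
    table = build_shift_table(patterns, width)
    i, lookups, matches = 0, 0, []
    while i + width <= len(text):
        entry, shift = lookup_shift(table, text[i:i + width])
        lookups += 1
        if shift == 0:
            matches.append((i, entry))
            shift = 1                      # assumption: advance one byte after a match
        i += shift
    return matches, lookups


patterns = ["abcdefghi", "123456789", "highspeed"]
print(search("High12345678123456789".lower(), patterns))
```

For the 21-byte example input the search finishes in three lookups (shifts of 4 and 8, then the match), compared with 13 window positions for a byte-by-byte scan.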
DPI.134
GPU-based Content Inspection Technology
Graphics processing unit (GPU)
GPUs offer increasingly powerful computation
The proposed scheme exploits the resources of the GPU, which is idle most of the time, to accelerate string pattern matching
Parallel stream processors
DPI.135
GPU-based Content Inspection Technology
Figure: NVIDIA GPU fragment shader GFLOPS (0 to 300) over time, Jan-03 to Mar-06, for NVIDIA GPUs (NV30, NV35, NV40, G70, G70-512, G71) versus x86 CPU. (Courtesy Ian Buck)
DPI.136
AC String Matching Algorithm
Deterministic finite state automaton: next move function
State   E   H   I   R   S
0       0   1   0   0   3
1       2   1   6   0   3
2       0   1   0   8   3
3       0   4   0   0   3
4       5   1   6   0   3
5       0   1   0   8   3
6       0   1   0   0   7
7       0   4   0   0   3
8       0   1   0   0   9
9       0   4   0   0   3
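The next move function above can be driven directly from a table. The transition rows below are copied from the slide; the output function is not shown there, so the `OUTPUT` mapping is an assumption based on the classic Aho-Corasick example for the keywords HE, SHE, HIS, and HERS, which this table matches. Symbols outside E, H, I, R, S are assumed to return the automaton to state 0.

```python
SYMBOLS = "EHIRS"

# NEXT[state][column], columns ordered as in SYMBOLS (values from the slide).
NEXT = [
    [0, 1, 0, 0, 3],
    [2, 1, 6, 0, 3],
    [0, 1, 0, 8, 3],
    [0, 4, 0, 0, 3],
    [5, 1, 6, 0, 3],
    [0, 1, 0, 8, 3],
    [0, 1, 0, 0, 7],
    [0, 4, 0, 0, 3],
    [0, 1, 0, 0, 9],
    [0, 4, 0, 0, 3],
]

# Assumption: accepting states of the classic {HE, SHE, HIS, HERS} automaton.
OUTPUT = {2: ["HE"], 5: ["SHE", "HE"], 7: ["HIS"], 9: ["HERS"]}


def ac_search(text):
    """Return (end_index, keyword) pairs, one table lookup per input symbol."""
    state, found = 0, []
    for i, ch in enumerate(text.upper()):
        col = SYMBOLS.find(ch)
        state = NEXT[state][col] if col >= 0 else 0
        for keyword in OUTPUT.get(state, []):
            found.append((i, keyword))
    return found


print(ac_search("ushers"))
```

Because the automaton is deterministic, the cost per input byte is a single array access regardless of how many patterns are loaded, which is what makes the table suitable for storage in texture memory.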
DPI.137
GPU Architecture
NVIDIA GeForce 7900 GTX
  8 vertex shaders
  24 fragment shaders
Figure: GeForce 7900 GTX block diagram (Host/FW/VTF, Cull/Clip/Setup, Shader Instruction Dispatch, Z-Cull, Fragment Crossbar, L2 Tex, four Memory Partitions, each with its DRAM(s)).
DPI.138
GPU Architecture (cont.)
The GPU writes results to the rendering target and draws them on screen at the end of the rendering pipeline
Texture memory
Figure: rendering pipeline. Vertex information passes through the vertex shader (coordinate transformation, lighting), the rasterizer, and the fragment shader (texture mapping from texture memory, pixel testing) to the rendering target, with render-to-texture feeding results back into texture memory.
DPI.139
Model framework
Three data structures are maintained in the GPU texture memory
  Automata texture
  Text texture
  State texture
    Current State texture
    Next State texture
Figure: texture memory holds the finite state automata, the text, the current state table, and the next state table; geometry information passes through the vertex shaders to the fragment shader (fp32 shader units, branch processor, fog ALU, texture cache), with a selector between the current and next state tables.
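The interplay of the three textures can be emulated on the CPU. In the sketch below each list element stands in for one pixel: `automata` plays the role of the automata texture, `text_rows` the text texture, and the `current`/`nxt` lists the two state textures that are swapped after each pass. This is a plain-Python emulation under those assumptions, not shader code, and the tiny two-symbol automaton (which recognises the pattern 1,1) is invented for the example.

```python
def render_pass(automata, text_rows, current, position):
    """One pass: every 'fragment' reads its state and symbol, writes its next state."""
    return [automata[state][row[position]]
            for state, row in zip(current, text_rows)]


def run(automata, text_rows):
    current = [0] * len(text_rows)        # current state texture, all start states
    history = []
    for position in range(len(text_rows[0])):
        nxt = render_pass(automata, text_rows, current, position)
        history.append(nxt)
        current = nxt                     # swap current and next state textures
    return current, history


# Hypothetical automaton over symbols {0, 1}; state 2 means "1,1 just seen".
automata = [
    [0, 1],
    [0, 2],
    [0, 2],
]
streams = [
    [1, 1, 0],
    [0, 1, 1],
]
final, history = run(automata, streams)
print(final, history)
```

On a GPU the list comprehension in `render_pass` would be executed by the fragment shaders in parallel, one stream per fragment.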
DPI.140
Data structure
Construction phase
  The size of the symbol space is k
  The finite state machine contains n states
Figure: the DFA table is laid out in the automata texture, 4096 pixels wide, with corner coordinates (0,0), (4095,0), and (0,k).
DPI.141
Data structure (cont.)
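RGBA pixel format
  The state information of an FSM is stored as R values in pixels
Figure: RGBA pixel layout of the automata texture at coordinates (x, y); rows are indexed 0 … k, columns by input byte (00, 01, …, 5a, …, fd, fe, ff), and the texture measures ((k+1)*256)^(1/2) by ((k+1)*256)^(1/2) pixels.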
DPI.142
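Data structure (cont.)
RGBA pixel format
  The state information of 4 FSMs is stored as RGBA values in pixels
Figure: RGBA pixel layout of the automata texture at coordinates (x, y); rows are indexed 0 … k, columns by input byte (00, 01, …, 5a, …, fd, fe, ff), and the texture measures ((k+1)*256)^(1/2) by ((k+1)*256)^(1/2) pixels.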
DPI.143
Control flow
Figure: N byte streams from the system buffer are packed in row-major order into a sqrt(N) by sqrt(N) text texture in GPU texture memory, alongside the finite state automata, the current state table, and the next state table; the vertex shaders and fragment shader (fp32 shader units, branch processor, fog ALU, texture cache) process them, with a selector between the two state tables.
DPI.144
The fingerprint of finite state automata
DPI.145
Performance analysis
Columns: Strategy | Memory size for 64K states (Mbytes) | Memory size for Snort 2.4 (Apr. 06) (Mbytes) | GeForce 6200 (NV44), 4 fragment shaders | GeForce 6600 GT (NV43-GT), 8 fragment shaders | GeForce 6800 Ultra (NV40), 16 fragment shaders | GeForce 7800 GT (G70), 24 fragment shaders
Strategy 1   256   84.36   933.36   2666.64   4266.64   6400.00
Strategy 2   64    21.09   622.24   1777.76   2844.48   4266.64
Strategy 3   32    10.54   700.00   2000.00   3200.00   5485.68
DPI.146
Agenda
Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues
DPI.147
Considerations when choosing a content coprocessor
Performance
Size of supported ruleset
  ClamAV 0.7.2 has 22,925 signatures (1.2M rule bytes)
  Snort 2.2.0 has 2,085 signatures (24K rule bytes)
  SpamAssassin 3.0.1 has 298 signatures (17K rule bytes)
Regex (Regular Expression)
  ABC * BCD
  123????456
  (I|You) Love NY
Offset/Depth
Case sensitive/non-case sensitive
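To make the rule features above concrete, the sketch below expresses the three example rules with Python's `re` module and adds a helper that restricts a search to an offset/depth window. The translations into Python syntax (`.*` for `*`, `.{4}` for `????`) are an assumed reading of the slide's notation, and `match_with_offset_depth` is a hypothetical helper, not an API of any coprocessor.

```python
import re

# Assumed Python equivalents of the slide's example rules.
RULES = {
    "ABC * BCD": re.compile(r"ABC.*BCD"),
    "123????456": re.compile(r"123.{4}456"),
    # Non-case-sensitive variant of the third rule.
    "(I|You) Love NY": re.compile(r"(I|You) Love NY", re.IGNORECASE),
}


def match_with_offset_depth(regex, payload, offset=0, depth=None):
    """Search only payload[offset : offset + depth]; depth=None means to the end."""
    end = len(payload) if depth is None else min(len(payload), offset + depth)
    return regex.search(payload, offset, end)


print(bool(RULES["ABC * BCD"].search("xxABC---BCDxx")))
print(bool(RULES["123????456"].search("id=123abcd456")))
print(bool(RULES["(I|You) Love NY"].search("you love ny")))
print(match_with_offset_depth(re.compile("PASS"), "USER bob PASS x", depth=8))
```

Offset and depth bound the region a rule inspects, which is one way a ruleset limits both false positives and the work done per packet.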
DPI.148
Considerations when choosing a content coprocessor (Cont.)
Works in conjunction with header fields, or a partitionable rules database
Max length of patterns and max length of search text
Algorithmic attacks
Connectivity
DPI.149
Considerations when choosing a content coprocessor (Cont.)
Look-Aside or Flow Through
Size and Cost of External Memory
Multi-packet match support
Auxiliary function
Figure: Look-Aside Mode versus Flow Through Mode (MII#1, MII#2).
DPI.150
Work of a Content Coprocessor
As its name suggests, it deals with the content (payload) part of network packets.
Target applications:
  IDS/IPS
  AV Gateway
  Mail Spam
  Layer-7 Switch (L7 Load Balance)
  UTM
DPI.151
Computing Power Analysis
ClamAV
  90% Pattern Matching
  4% Inflating
  2% De-Mimeing64-ing
  2% Other Mime
  2% Program Overheads
SpamAssassin
  75% Pattern Matching
  20% Program Overheads
  5% Decode
DPI.152
Computing Power Analysis
Snort
  62% Pattern Matching
  17% Header Classification
  13% IP Classification
  8% Other Matching
DPI.153
PMC-Sierra PM2329 ClassiPI™
Debut in 2001.
Look-Aside Coprocessor (100MHz, 32/64-bit synchronous bus)
Supports the implementation of several protocols in the same equipment to achieve wire-speed performance at Gigabit/OC-48 rates.
DPI.154
ClassiPI™ (Cont.)
Source: PMC Sierra
DPI.155
ClassiPI™ (Cont.)
A single PM2329 can store up to 16K policy rules.
  Each rule is 136 bits long (24-bit control fields and 112-bit data fields).
  Supports composite rules (up to 4).
Source: PMC Sierra
DPI.156
ClassiPI™ (Cont.)
Input buffer is 256 bytes * 32 segments (8K in total).
Maximal pattern length is 192 bytes.
Only supports case-sensitive/non-case-sensitive matching. No Regex.
Up to eight devices can be cascaded.
No external memory needed.
Vulnerable to algorithmic attacks.
DPI.157
ClassiPI™ (Cont.)
Pros
  Flexibility. Handles L2-L7 traffic.
  Suitable for IPS development.
  Very good performance.
  No extra memory needed.
Cons
  Algorithmic attacks.
  Poor matching function.
  Signature size is quite constrained (or the cascaded solution is quite expensive).
DPI.158
Safenet 4850
Debut in 2003
Industry’s first full regular expressions content inspection co-processor
  Supports *, +, ?, <range>, <group>, and so on.
Supports up to 32k signatures.
  Each signature character occupies approximately 0.5K rule memory.
2.5Gbps/800Mbps Performance
DPI.159
Safenet 4850 (Cont.)
Figure: two SafeXcel-4850 devices, each attached to a HOST or NPU over a 32/64-bit ZBT or SyncBurst interface and to ZBT SSRAM banks. Traffic is sent to the processor, packets/buffers are sent to the classifier, and results are sent back to the Host/NP.
Source: Safenet
DPI.160
Safenet 4850 (Cont.)
Look-Aside mode (32/64-bit ZBT SRAM I/F).
Only deals with content.
Supports multi-packet match.
  Using SR (State Record).
Rule memory can be divided into 32 partitions.
DPI.161
Safenet 4850 (Cont.)
The upper bound on the number of reported matches is user configurable (15 max)
Maximal size of each transfer is 16K bytes.
3 types of matching:
  Regular
  Start of Match
  Sub-Expression (/^CookieID=<\d+>/)
DPI.162
Safenet 4850 (Cont.)
Pros
  Supports full regex
  High performance (conditionally)
  Supports multi-packet match
Cons
  Database size (8 * 4K * 0.5K = 16MB)
    Wildcard doubles database size
  Compiler issues (cost, no incremental compiling)
DPI.163
IDT PAX.port 2500™
Up to 2.5Gbps performance (833Mbps * 3)
Supports both Flow-Through mode and Look-Aside mode
64-bit tag and 192-bit digest
Programmable Classification
  Almost any format can be described by PDL
16MB Rule Memory for 1K URL rules
Supports Regex
  Non-Greedy (abc.*def and abcxxxdefyyydef)
  Lacks sets (positive set, negative set, etc.)
DPI.164
IDT PAX.port 2500™ (Cont.)
Source: IDT
DPI.165
IDT PAX.port 2500™ (Cont.)
Pros
  Flexible operation mode
  Programmable classification
  Reference design (works with Intel IXP 2400)
Cons
  Constrained rule size
    3 pattern memories store the same things.
  Non-deterministic performance
  Per-packet based (small text size and no multi-packet match)
DPI.166
Sensory Networks C-Series
Engine speed is up to 1Gbps.
Look-Aside Mode (PCI 64/32 or PCI-X)
CorePAKT technology compresses the ClamAV database into 18MB.
Stream-based matching
DPI.167
Sensory Networks C-Series (Cont.)
Supports Regex
  ., \, […], [^…], |, (…), {n, m}, *, +, ?
  offset, depth
Compression/Decompression engine (zip, gzip, lzw, cab)
Decode engine (MIME, Base64 and QP)
MD5 hashing engine
DPI.168
Sensory Networks C-Series (Cont.)
Source: Sensory Networks
DPI.169
Sensory Networks C-Series (Cont.)
NadalCore SPU
Rule Memory
Source: Sensory Networks
DPI.170
Sensory Networks C-Series (Cont.)
Pros
  Supports sufficient rule memory and regex function at the same time.
  With MIME decode and decompression: good for AntiSpam and AV
  Stream mode (supports up to 1M streams)
Cons
  Cannot achieve gigabit wire speed
  Long compiling time
DPI.171
Netlogic’s Knowledge based Processors
DPI.172
Agenda
Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues
DPI.173
Flow Classification Using Stateful Method
Figure: FTP flow state machine with states 0 to 7 and transitions Login, PASS_Per, PASV_Req, PASV_Ok, ACTIVE_Req, ACTIVE_Ok, LIST_Req, LIST_Ok, File_Req, File_Ok, and Flow_Close.
Transitions   Trans. ports     Patterns
Login         TCP, dport:21    “PASS”
PASS_Per      TCP, sport:21    “230”, “User”, “logged in”
ACTIVE_Req    TCP, dport:21    “PORT”
ACTIVE_Ok     TCP, sport:21    “200 PORT command successful”
PASV_Req      TCP, dport:21    “PASV”
PASV_Ok       TCP, sport:21    “227 Entering Passive Mode”
LIST_Req      TCP, dport:21    “LIST”
LIST_Ok       TCP, sport:21    “226 Transfer complete.”
File_Req      TCP, dport:21    “RETR”
File_Ok       TCP, sport:21    “226 Transfer complete”
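A stateful classifier of this kind can be sketched as a transition table plus a per-flow state. The transition names, ports, and patterns below come from the slide; the state numbering in `FSM` and the packet representation (a dict with `sport`, `dport`, and `payload`) are hypothetical, since the original state diagram is not recoverable from the text.

```python
# (transition, port field, port, patterns that must all occur in the payload)
TRANSITIONS = [
    ("Login", "dport", 21, [b"PASS"]),
    ("PASS_Per", "sport", 21, [b"230", b"User", b"logged in"]),
    ("ACTIVE_Req", "dport", 21, [b"PORT"]),
    ("ACTIVE_Ok", "sport", 21, [b"200 PORT command successful"]),
    ("PASV_Req", "dport", 21, [b"PASV"]),
    ("PASV_Ok", "sport", 21, [b"227 Entering Passive Mode"]),
    ("LIST_Req", "dport", 21, [b"LIST"]),
    ("LIST_Ok", "sport", 21, [b"226 Transfer complete."]),
    ("File_Req", "dport", 21, [b"RETR"]),
    ("File_Ok", "sport", 21, [b"226 Transfer complete"]),
]

# Hypothetical state numbering: (current state, transition) -> next state.
FSM = {
    (0, "Login"): 1, (1, "PASS_Per"): 2,
    (2, "ACTIVE_Req"): 3, (2, "PASV_Req"): 3,
    (3, "ACTIVE_Ok"): 4, (3, "PASV_Ok"): 4,
    (4, "LIST_Req"): 5, (4, "File_Req"): 6,
    (5, "LIST_Ok"): 2, (6, "File_Ok"): 2,
}


def advance(state, packet):
    """Apply the first transition that is legal in this state and matches the packet."""
    for name, port_field, port, patterns in TRANSITIONS:
        nxt = FSM.get((state, name))
        if nxt is None:
            continue
        if packet[port_field] == port and all(p in packet["payload"] for p in patterns):
            return nxt
    return state


def to_server(payload):
    return {"sport": 40000, "dport": 21, "payload": payload}


def to_client(payload):
    return {"sport": 21, "dport": 40000, "payload": payload}


flow = [
    to_server(b"PASS secret\r\n"),
    to_client(b"230 User anonymous logged in.\r\n"),
    to_server(b"PASV\r\n"),
    to_client(b"227 Entering Passive Mode (10,0,0,1,19,137)\r\n"),
    to_server(b"RETR file.txt\r\n"),
    to_client(b"226 Transfer complete\r\n"),
]
states = [0]
for pkt in flow:
    states.append(advance(states[-1], pkt))
print(states)
```

Restricting the candidate transitions by the current state is what lets a short pattern such as “PASS” identify a flow reliably: it is only trusted when it appears at the expected point of the dialogue and on the expected port.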
DPI.174
The FAs of eDonkey protocols
Figure: finite automata for the connect phase (transitions Hello_Server, Server_Info_Data, Get_Sources, Found_Sources, Search_File, Search_File_Results, Offer_Files, Hello_Client, eMule_Hello, Hello_Answer) and for the download phase (transitions Slot_Request, Slot_Taken, Request_Parts, Compressed_Data_0, Compressed_Data_1, Slot_Release).
Connect Phase
Transitions: Hello_Server, Search_File, Search_File_Results, Get_Sources, Found_Sources, Offer_Files, Server_Info_Data, Hello_Client, Hello_Answer, eMule_Hello
Patterns: 0xe3, 0x01; 0xe3, 0x1601; 0xe3, 0x15; 0xe3, 0x19; 0xe3, 0x42; 0xe3, 0x33; 0xe3, 0x41; 0xe3, 0x0101; 0xe3, 0x4c; 0xc5, 0x01
Download Phase
Transitions          Patterns
Slot_Request         0xe3, 0x54
Slot_Taken           0xe3, 0x57
Request_Parts        0xe3, 0x47
Compressed_Data_0    0xe3, 0x46
Slot_Release         0xe3, 0x56
Compressed_Data_1    0xe3, 0x46
DPI.175
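The FAs of BitTorrent protocols.
Figure: finite automata for the connect phase (states 0 to 2, transitions Announce and Get_Peers) and the download phase (states 0 and 1, transition Connect_to_Peer).
Connect Phase
Transitions        Patterns
Announce           “GET /announce”
Get_Peers          “HTTP/1.0 200 OK”, “e5:peers”
Download Phase
Transitions        Patterns
Connect_to_Peer    0x13, “BitTorrent protocol”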
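The BitTorrent patterns can be tested against a payload with simple byte comparisons. The sketch below assumes that each pattern is anchored at the start of the payload, which the slide does not state; `classify` is an illustrative helper name.

```python
# 0x13 is the length prefix (19) of the protocol string that follows it.
HANDSHAKE = bytes([0x13]) + b"BitTorrent protocol"


def classify(payload: bytes):
    """Map a payload to one of the BitTorrent transitions, or None."""
    if payload.startswith(HANDSHAKE):
        return "Connect_to_Peer"
    if payload.startswith(b"GET /announce"):
        return "Announce"
    if payload.startswith(b"HTTP/1.0 200 OK") and b"e5:peers" in payload:
        return "Get_Peers"
    return None


print(classify(HANDSHAKE + bytes(8)))
print(classify(b"GET /announce?info_hash=abc HTTP/1.0\r\n"))
print(classify(b"HTTP/1.0 200 OK\r\n\r\nd8:intervali1800e5:peers0:e"))
print(classify(b"GET /index.html HTTP/1.0\r\n"))
```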
DPI.176
The FAs of Yahoo Messenger protocol.
Figure: finite automaton with states 0 to 3 and transitions Auth_Resp, Msg_Service, P2P_File_Tx, File_Tx_Status_BRB, and Flow_Close.
Login Phase
Transitions           Trans. ports       Patterns
Auth_Resp             TCP, sport:5050    “YMSG”, 0x54
Chat Phase (Trans. ports: TCP, sport:5050)
Transitions           Patterns
Msg_Service           “YMSG”, 0x06
P2P_File_Tx           “YMSG”, 0x4d
File_Tx_Status_BRB    “YMSG”, 0x4d, 0x1
DPI.177
Agenda
Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues
DPI.178
Layer 7 Security Switches
Firewalls and IDS/IPS are installed near the routers to prevent attacks from the Internet.
More than 80% of attacks are launched from affected hosts inside the intranet.
Defense-in-depth has emerged to prevent attacks not only from the Internet but also from internal hosts.
This leads to the need for security switches that provide first mile protection.
Current security switch solutions are still relatively expensive.
DPI.179
Cisco Security Switch Solution
The Cisco Catalyst 6500 Series IDS (IDSM-2) Services Module (the 2nd generation of the IDS/IPS module) is installed in the widely deployed Cisco Catalyst chassis.
It supports both the in-line IPS mode and passive operation IDS mode
DPI.180
3Com Security Switch Solution
Each centrally controlled security platform supports several concurrent operations of inspection such as firewalls, VPNs, IDS, IPS, antivirus security, and URL filtering
3Com Security Switches 7280 and 7245 are suited to carrier-class, service-provider businesses
DPI.181
3Com 6200 Architecture
DPI.182
Alcatel’s OmniSwitch Solution
Alcatel released an Automated Quarantine Engine, which works with IDS to identify network threats and to shut off or restrain attacks through switch hardware.
Alcatel's stackable OmniSwitch 6600 and its modular 7700- and 8800-series products use a combination of 802.1X authentication technology and APIs from Sygate to block a virus-infected PC from accessing a corporate LAN.
DPI.183
The authentication operation of Alcatel OmniSwitch 6600
DPI.184
Alternative Cost-Effective Security Switch Architecture
Most of the currently available security switch solutions are chassis-based designs, which are suitable for either core networks or big enterprises.
Nevertheless, L2 switches are most likely the most widely deployed network equipment in the world.
Is there an easy mechanism to “upgrade” L2 switches to L7 security switches?
A cost-effective Security Switch Architecture is proposed here
DPI.185
Layer 7 Security Switch Concept
Each “security switch” is composed of a traditional layer-2 switch and a “security switch engine (SSE)”, which provides the layer-7 packet inspection service.
The two are coupled by a GE link.
DPI.186
Layer 7 Security Switch Architecture
Figure: a security switch made of a switch (SW) with FE ports (PVID 1 to PVID 4) coupled to the SSE over a GE link; IPv4 and IPv6 attacks aimed at the victim are inspected by the SSE.
Security Service Engine
  Anti-virus/worm
  Anti-Intrusion
  Anti-P2P/IM
  Anti-Spyware, etc.
DPI.187
Network Security Switch Architecture
Secure ports
Un-trusted ports
Traffic to/from un-trusted ports is forwarded to the SSE for inspection
Malicious packets are dropped
Attacking ports can be isolated or blocked
DPI.188
A Scalable HA/LB Architecture for Security Switch
High availability (HA) and Load Balancing (LB) are critical features for security switch.
The key component is “hardware bypass ports” (HBP) -- a pair of two Ethernet ports that will be “bypassed” when the power or the system is down.
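Figure: two security switches; SW1 is coupled to SSE1 and SW2 to SSE2 over GE links, with FE ports on the switches and a GE link between the two units.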
DPI.189
Scalable HA/LB Architecture
A scalable load balance and high availability (LB/HA) architecture for network security switches is proposed
A mechanism is designed to interconnect the SSEs so that the “security switches” group furnishes a novel HA feature.
An intelligent load balancing scheme is also designed for the SSE so that the security service can be balanced among the SSEs
DPI.190
HA Architecture for four “Security Switches”
Figure: four security switches, SW1 to SW4, each coupled to its SSE (SSE1 to SSE4) over a GE link.
DPI.191
HA Architecture when SSE 2 fails
Figure: four security switches, SW1 to SW4, each coupled to its SSE (SSE1 to SSE4) over a GE link.
DPI.192
HA Architecture when SSE 2 and SSE 3 fail
Figure: four security switches, SW1 to SW4, each coupled to its SSE (SSE1 to SSE4) over a GE link.
DPI.193
HA Architecture when only SSE 1 survives
Figure: four security switches, SW1 to SW4, each coupled to its SSE (SSE1 to SSE4) over a GE link.
DPI.194
Scalable HA/LB Security Switch Prototyping
Figure: prototype with switches SW1 to SW4 and engines SSE1 to SSE4 interconnected by GE links.
DPI.195
High Availability test Scenario
Initially, stable traffic is processed by each SSE. Then the SSEs are turned off one by one: A, B, and C.
The traffic of a turned-off SSE is transferred to the backup SSE for further processing (HA).
The system still works even when ONLY ONE SSE survives.
Figure: four switches, SW A to SW D, each coupled to its SSE (SSE A to SSE D) over a GE link.
DPI.196
High Availability Test Result
Figure: high availability test result, with phases (1), (2), and (3) and traffic curves labeled A+C, B+D, and A+B+C+D.
DPI.197
Load Balance Evaluation Scenario
Figure: throughput (Mbit/s, up to 100) versus time (0 to 15 min) for SSE A, SSE B, SSE C, and SSE D, with phases (1) to (4).
A traffic generator (IXIA) is used to evaluate the load balance function. For each SSEi, an increasing traffic load (up to 100Mbps) is sent into the switch SWi at a different time.
DPI.198
Load Balance Test Result
SSE A SSE B
DPI.199
Load Balance Test Result
SSE C SSE D
DPI.200
ASIC-based Security Switch
Figure: a switch (SW) connected over MII to the security SoC, which holds the signature DB.
DPI.201
ASIC-based Security Switch
BroadWeb Security SoC
  ARM922 RISC CPU (200 MHz)
  Hardware NAT
  Hardware IPS Engine
  Two 10/100/1000 RJ-45 Ports
Embedded Linux
NSS approved IPS signature database
  1950+ Signatures
  Automatic Internet upgrade
Drop malicious packets
Isolate/Block attacking ports
DPI.202
Agenda
Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusions and Open Issues
DPI.203
Conclusions
Pattern matching is the key content inspection technology for layer-7 security devices.
More and more attacks are launched from affected hosts inside the intranet, which calls for defense-in-depth.
Security switches provide first mile protection
  Wireless Security Switch as well
But some challenges (opportunities) remain
  Features (IPS switch, UTM switch, or others?)
  Cost
  Performance: wire-speed or near wire-speed
  Management: signature upgrade
DPI.204
Conclusions
New switch architectures are needed
  Integrate Content Inspection into the L2 switching fabric
  DoS/DDoS Chip: zero-day defense
  L2 Switch + Content Inspection Coprocessor
DPI.205
Open Issues
How to identify and manage encrypted protocols, such as Skype 2.0 and Winny?
  Not by signatures (no signatures?)
  Maybe by state machines
How to design fast content inspection or pattern matching algorithms?
  Modified AC algorithm or others
  Using the cache efficiently
  Pre-filtering is good
  Post-filtering is also necessary (rules are more complex)
DPI.206
Open Issues
How to design a fast content inspection co-processor?
  Regular expression support is necessary
  Many commercial products exist already, such as SafeNet 4850, Sensory Networks C-2000, IDT, Cavium, Netlogic, etc.
Network Access Control (NAC) is a newly emerging trend.