© All rights reserved. No part of this publication and file may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of Professor Nen-Fu Huang (E-mail: [email protected]).

Deep Packet Inspection Algorithms

Agenda

• Introduction
• Pattern Matching Technologies
• Content Inspection Co-processors
• Flow-based Application Identification
• Layer 7 Security Switches
• Conclusion and Open Issues

Introduction -- Goals

• What?
– Multi-pattern matching algorithms
– Search a set of patterns simultaneously
• Where?
– Networks: in-depth packet inspection engines
• How?
– Search the whole packet payload to identify packets of interest that contain certain pre-defined patterns
• Who?
– Content-aware network devices
– Application-oriented management
– e.g. intrusion detection system (IDS), anti-virus appliance, application firewall, or layer-7 switch

Packet Inspection Engines

Network Traffic -> Packet Capture -> Decoder -> Pattern Matching Engine (with Pattern DB) -> Output Module

• Packet Capture: capture all packets in the subnet
• Decoder: fit the captured packets into data structures and identify their protocol type
• Pattern Matching Engine: match packets against each pattern
• Output Module: generate notifications or put packets into assigned queues

Intrusion Detection Systems (IDSs)

Active Response

Pattern Set

IDS

Challenges

• String matching must check every payload against all patterns
• The pattern length is variable
– From 1 byte to 122 bytes in the case of the Snort rule set
– Snort: a well-known open-source IDS
– An example of a Snort pattern: GET /scripts/root.exe?/c+dir
• Patterns may appear anywhere in the payload
• The total number of patterns is usually a few thousand

Challenges

The top four routines in Snort

The pattern matching routine is the most resource-intensive task in the IDS.

Efficient algorithms for multi-pattern matching

A Generic Layer-7 Engine

• Packet Normalizer
– Ensures the integrity of incoming packets
– Eliminates ambiguity
– Decodes URI strings if necessary
• Pattern-Matching Engine
• Policy Engine
– Gathers information from the pattern-matching engine and issues the verdict to allow or drop the packets

[Figure: data flow through the Packet Normalizer, the Pattern-Matching Engine, and the Policy Engine. Labels: Network, Packet Stream, Normalized Traffic, Signatures, Matched Events, Policies, Verdicts, Logs/Reports, Filtered Traffic]

Packet Normalizer

• Integrity checking
– IP fragment reassembly
– TCP segment reassembly: TCP segments may arrive out of order, SEQ may fall outside the window, and segments may overlap
• URI decode
– URI hex-code obfuscation (‘a’ = %61)
– URI Unicode/UTF-8 obfuscation
– Self-referential directory obfuscation (/././././ = /)
– Directory obfuscation (/abc/a/../a/../a/ = /abc/a)
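The URI obfuscations listed above can be undone with a small normalization step. The following is a minimal sketch using only the Python standard library; the function name `normalize_uri_path` is hypothetical, and a real normalizer would also have to handle cases not covered here (double encoding, overlong UTF-8, backslashes).

```python
import posixpath
from urllib.parse import unquote


def normalize_uri_path(raw_path: str) -> str:
    """Undo hex-code and directory obfuscation in a URI path (illustrative only)."""
    decoded = unquote(raw_path)          # hex-code obfuscation: %61 -> 'a'
    return posixpath.normpath(decoded)   # collapse '/./' and 'dir/../' sequences


print(normalize_uri_path("/scripts/%72oot.exe"))
print(normalize_uri_path("/abc/a/../a/../a/"))
```

Normalizing before pattern matching means a signature only needs to be written once, against the canonical form.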

Pattern-Matching Engine

The most computation-intensive task in packet processing. Normally the PM engine needs to process every single byte in packet payload.

In Snort, the PM routine accounts for 31% of the total execution time

Applications using the pattern-matching algorithm

• SpamAssassin
– 75% Pattern Matching
– 20% Program Overheads
– 5% Decode
• Snort
– 62% Pattern Matching
– 17% Header Classification
– 13% IP Classification
– 8% Other Matching
• ClamAV
– 90% Pattern Matching
– 4% Inflating
– 2% De-Mimeing64-ing
– 2% Other Mime
– 2% Program Overheads

Pattern Matching is Expensive!

~30 Instructions/ Byte. 45K Instructions/1500 Byte packet

~50 Instructions/ 1500 Byte packet

Source: Intel Corp.

Policy Engine

• Collect the matching events from the Pattern-Matching Engine.
• Clarify the relationship between matched patterns:
– Ordered: a policy (rule) may consist of more than one pattern, and the patterns should be matched in order.
– Offset, Depth: the matched position should be within a certain range or location.
– Distance, Within: the distance between two matched patterns should also be taken into consideration.

Policy Engine -- Match processing

• A rule may consist of multiple patterns.
• Inter-pattern matching conditions
– Order: all patterns of a rule must appear in order.
– Location: offset/depth constraints
– Location: distance/within constraints

[Figure: pattern 1 and pattern 2 placed between the beginning and the end of a packet, with the offset, depth, distance, and within ranges marked]
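The ordering and location constraints can be illustrated with a short sketch. It assumes a simplified reading of the four keywords: `offset`/`depth` bound the search window of the first pattern from the start of the payload, and `distance`/`within` bound each later pattern relative to the end of the previous match. The function name `match_rule` and the rule layout are hypothetical, and the search is greedy (first match only, no backtracking), which a production policy engine would not rely on.

```python
def match_rule(payload: bytes, contents) -> bool:
    """Return True if every pattern of the rule matches in order.

    contents: list of dicts with key 'pattern' and optional keys
    'offset'/'depth' (first pattern) or 'distance'/'within' (later patterns).
    """
    cursor = 0  # end position of the previous match
    for i, c in enumerate(contents):
        if i == 0:
            start = c.get("offset", 0)
            end = start + c["depth"] if "depth" in c else len(payload)
        else:
            start = cursor + c.get("distance", 0)
            end = start + c["within"] if "within" in c else len(payload)
        # bytes.find only reports matches lying entirely inside [start, end)
        pos = payload.find(c["pattern"], start, min(end, len(payload)))
        if pos < 0:
            return False
        cursor = pos + len(c["pattern"])
    return True


payload = b"GET /scripts/root.exe?/c+dir HTTP/1.0"
rule = [
    {"pattern": b"GET", "offset": 0, "depth": 3},
    {"pattern": b"root.exe", "distance": 1, "within": 20},
]
print(match_rule(payload, rule))
```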

Policy Engine (cont.)

• Trace application states
– Some applications are difficult to identify by using only one signature (e.g. P2P).
– The Policy Engine needs to track the connection state, as in the following diagram:

S0 -> S1 -> S2 -> S3
Transition labels: Msg Exchange, Request File, Data Exchange
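Connection-state tracking of this kind can be sketched as a small transition table. The assignment of the three labels to the three transitions, and the event names themselves, are assumptions made for illustration; the diagram does not spell them out.

```python
# Hypothetical mapping of the diagram's labels onto the S0..S3 transitions.
TRANSITIONS = {
    ("S0", "msg_exchange"): "S1",
    ("S1", "request_file"): "S2",
    ("S2", "data_exchange"): "S3",
}


def track(events, state="S0"):
    """Advance a per-connection state; unexpected events leave it unchanged."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


print(track(["msg_exchange", "request_file", "data_exchange"]))
```

A flow would be classified as the application only once it reaches the final state, so a single matching signature seen out of context does not trigger a verdict.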

Agenda

• Introduction
• Pattern Matching Technologies
• Content Inspection Co-processors
• Flow-based Application Identification
• Layer 7 Security Switches
• Conclusion and Open Issues

Pattern Matching Technologies

• Pattern-Matching Algorithms
• Software based
– Boyer-Moore
– Aho-Corasick (AC)
– Wu-Manber
– HMA/EHMA/ACM
– Pre-Filtering
• Hardware based
– Reconfigurable hardware (FPGA)
– Bloom filter
– TCAM-based
– GPU-based

Pattern Matching Problem Definition

• Pattern matching
– A string is a finite sequence of symbols.
– Let P = {P1, P2, ..., Pk} be a finite set of strings, which we shall call patterns.
– Let T be an arbitrary string, which we shall call the text string.
– The problem is to locate and identify all substrings of T which are patterns in P.

The Simplest Matching

[Figure: brute-force matching of the patterns against a packet. Labels: Patterns (M), Packet (N), T, and U, the smallest unit of a symbol. Number of alignments tried for pattern i: (T - m_i)/U + 1]
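As a baseline, the brute-force approach tries every pattern at every alignment. The sketch below assumes a symbol unit of one byte (U = 1), so a pattern of length m_i is tried at T - m_i + 1 positions of a text of length T; the function name `brute_force` is hypothetical.

```python
def brute_force(text: bytes, patterns):
    """Try every pattern at every alignment; return (hits, alignments tried)."""
    hits, alignments = [], 0
    for p in patterns:
        for start in range(len(text) - len(p) + 1):
            alignments += 1
            if text[start:start + len(p)] == p:
                hits.append((start, p))
    return hits, alignments


hits, alignments = brute_force(b"XBXXABCXXABCDXX", [b"ABCD", b"XX"])
print(hits, alignments)
```

The work grows with both the text length and the number of patterns, which is what the algorithms in the rest of this section try to avoid.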

Software Based Pattern-Matching Algorithms

• Boyer-Moore Algorithm (BM)
– Best average-case time complexity
– Good for general network situations
– Single pattern matching
• Aho-Corasick Algorithm (AC)
– Best worst-case time complexity
– Good for networks usually under heavy attacks
• BM-like Algorithms
– Modify BM for multi-pattern matching
– Wu-Manber Algorithm
– HMA/EHMA/ACM Algorithms

Boyer-Moore Algorithm (BM)

• Proposed by R.S. Boyer and J.S. Moore in 1977
• The most efficient single pattern-matching algorithm
– O(n + rm) comparisons (n: text length, m: pattern length, r: number of matches)
– O(n/m) in the best case
• Heuristics
– Pre-processing the pattern

Boyer-Moore Algorithm (BM)

• Bad-Character Heuristic
– Skip table: scan the pattern from right to left. Each time you move left, if the character you are on is not in the table already, add it; its shift value is its distance from the rightmost character.
– If a character never appears in pattern p, you can safely skip |p| characters when you read it.

Boyer-Moore Algorithm (BM)

Bad Character Heuristic example

text:    a c d e c d a c d a c d a
pattern: a c d a c d a

[Figure: the pattern shown at two alignments under the text; one text character is labelled b]

Boyer-Moore Algorithm (BM)

Bad Character Heuristic example

Skip table for pattern ABCD:
A  B  C  D  X
3  2  1  0  4

Text: XBXXABCXXABCDXX
      ABCD                shift 4
          ABCD            shift 4
              ABCD        shift 1
               ABCD       match
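The skip table and the shifts in this example can be reproduced with a short sketch. It uses only the bad-character rule, keyed on the text character aligned with the last pattern position (a Horspool-style simplification, not the full Boyer-Moore algorithm with the good-suffix table), and assumes a non-empty pattern. The function names are hypothetical.

```python
def build_skip_table(pattern: str) -> dict:
    """Distance of each character's rightmost occurrence from the pattern's right end."""
    table = {}
    last = len(pattern) - 1
    for i in range(last, -1, -1):             # scan right to left
        table.setdefault(pattern[i], last - i)
    return table


def bad_char_search(text: str, pattern: str):
    """Return (match positions, list of shifts taken)."""
    table, m = build_skip_table(pattern), len(pattern)
    hits, shifts, pos = [], [], 0
    while pos + m <= len(text):
        skip = table.get(text[pos + m - 1], m)  # unseen character: skip |p|
        if skip == 0:                           # last characters agree: verify
            if text[pos:pos + m] == pattern:
                hits.append(pos)
            skip = 1                            # then advance by one
        shifts.append(skip)
        pos += skip
    return hits, shifts


print(build_skip_table("ABCD"))
print(bad_char_search("XBXXABCXXABCDXX", "ABCD"))
```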

Boyer-Moore Algorithm (BM)

• Good-Suffix Heuristic
– Shift table: if a mismatch occurs and a repeated substring exists in the pattern, the pattern can shift to the next occurrence of the substring that matches what has already been matched.
• When a mismatch occurs, choose the larger of the skip values given by the two tables.

Boyer-Moore Algorithm (BM)

Good Suffix Heuristic

text:    a c d e c d a c d a c d a
pattern: a c d a c d a

[Figure: the pattern shown at two alignments under the text, with segments labelled u and v]

Boyer-Moore Algorithm (BM)

Good Suffix Heuristic example

Pattern: ANPANMAN

Matched suffix   Shift
N                1
AN               8
MAN              3
NMAN             6

[Figure: the pattern ANPANMAN shown at successive alignments]

BM Algorithm Example 1

Text: X X X X N M A N M A N P A N M A N
      A N P A N M A N                       shift 8
                      A N P A N M A N       shift 1
                        A N P A N M A N     Match !!

BM Algorithm Example 2

Bad-character table:
A  C  G  T
1  6  2  8

Good-suffix table:
G  C  A  G  A  G  A  G
7  7  7  2  7  4  7  1

Text:    G C A T C G C A G A G A G T A T A C A G T A A G
Pattern: G C A G A G A G

[Figure: the pattern at successive alignments, annotated with the shift pairs 1 / 1, 4 / 4, 1 / 7, 4 / 4 and the note "If NOT AG then move 7 characters"]

17 text character comparisons in total.

Pattern Matching Algorithms

• Single-Pattern Matching vs. Multi-Pattern Matching
– A single-pattern matching algorithm is used to search a string (or text) T for the first occurrence or all occurrences of one given pattern.
– A multi-pattern matching algorithm is used to search the input T for all occurrences of any pattern in P.

The Aho-Corasick (AC) Algorithm

• Proposed by A.V. Aho and M.J. Corasick in 1975.
• AC is a classic solution to exact set matching.
• It works in time O(n + m + z), where n is the text length, m is the total length of the patterns, and z is the number of pattern occurrences in T.
• AC is based on a refinement of a keyword tree.
• AC is a deterministic algorithm; that is, its search performance is independent of the number of patterns.

The Aho-Corasick (AC) Algorithm

• Pros
– Provides the best worst-case computational complexity.
– The number of state transitions for each input symbol is at most two (amortized over the input).
– No reverse scan of the input string.
– No constraint on the minimum pattern size.
– Performance is independent of the number of patterns (|P|).

An Example of AC Algorithm

Example: P = {ab, ba, babb, bb}

The Aho-Corasick (AC) Algorithm

• Automaton-based algorithm
• Three functions: Goto, Fail, and Output
• Goto(current_st, input_code)
– State transition function
– Every prefix of the patterns is represented by exactly one state.
– P = {she, he, his, hers}

The Aho-Corasick (AC) Algorithm

P = {she, he, his, hers}

Goto function (state --symbol--> state):
0 --s--> 1 --h--> 2 --e--> 3
0 --h--> 4 --e--> 5 --r--> 8 --s--> 9
4 --i--> 6 --s--> 7
0 --not {h, s}--> 0

The Aho-Corasick (AC) Algorithm

Fail(current_st): points to the state that represents the longest proper suffix of the current state's string that is also a prefix of some pattern.

st     0  1  2  3  4  5  6  7  8  9
Fail   0  0  4  5  0  0  0  1  0  1

Output(current_st)

st   Output
3    she, he
5    he
7    his
9    hers

The Aho-Corasick (AC) Algorithm

struct ACO {
    struct ACO *next_state[ |A| ];
    struct Output *pattern_list;
};

Next(current_st, current_code) = Goto + Fail

[Figure: the automaton with states 0-9 and edge labels h, e, r, s, i, with the fail transitions folded into the next-state table; state 0 loops on symbols not in {h, s}]

An example of AC Algorithm

Patterns: hers, his, she

[Figure: AC automaton with node labels {h}, {he}, {hers}, {his}, {s}, {sh}, and {he, she}; the root loops on symbols other than h and s]

Dashed: fail transitions; those not shown lead to the root.

An example of AC Algorithm

Text: h e i s h i s

[Figure: the automaton traversed on the text above, one input symbol at a time]
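The Goto, Fail, and Output functions can be put together in a compact sketch. This is a textbook-style construction, not necessarily the implementation behind these slides; the function names are hypothetical. Inserting the patterns in the order she, he, his, hers happens to reproduce the state numbering and the Fail table shown earlier.

```python
from collections import deque


def build_automaton(patterns):
    """Build the goto, fail, and output functions of an Aho-Corasick automaton."""
    goto, output = [{}], [set()]
    for pat in patterns:                       # goto: one state per distinct prefix
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pat)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())            # depth-1 states fail to the root
    while queue:                               # breadth-first over the trie
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]   # inherit patterns ending at the suffix
    return goto, fail, output


def ac_search(text, automaton):
    """Return (start position, pattern) for every occurrence in text."""
    goto, fail, output = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in sorted(output[state]):
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["she", "he", "his", "hers"])
print(automaton[1])
print(ac_search("heishis", automaton))
```

The text is scanned once, left to right, regardless of how many patterns are loaded; only the automaton's memory footprint grows with the pattern set.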

The Aho-Corasick (AC) Algorithm

• Cons
– Large memory requirement
– Poor performance in the real system
– A lot of external memory accesses
• 1028 bytes per state (256 x 4 = 1024 bytes of next-state pointers when |A| = 256 with 32-bit pointers); about 10 MB for the state machine

[Figure: each state holds a 256-entry next-state table]

AC with Bitmap

Proposed by Nathan Tuck et al. in Infocom 2004

[Figure: the example automaton (states 0-9) and the per-state record layout with the fields next_flag, fail_ptr, next_start, and pattern_list; the layout is marked at 0, 16, and 32]

44 bytes per state (256 bits = 32 bytes; 32 bytes + 12 bytes = 44 bytes)

AC with Bitmap (ACB)

Procedure ACB_Matching
…..
While j < code do
    popcount ← popcount + State->next_flag[j];
    j ← j + 1;
End
State ← State->next_start + popcount * Sizeof(ACB);
PM ← PM ∪ Out(State->pattern_list);
End
Return;

For example, if the input symbol is ‘s’, whose ASCII code is 0x73, ACB matching has to read 115 bits (7 x 16 + 3 = 115) and accumulate these bits to obtain the offset for ‘s’ (popcount).

next_flag with the bits for d, g, and s set: 0001001000000…..0001000…; two set bits precede ‘s’, so popcount = 2 (offset = 2).
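The popcount lookup can be sketched in a few lines. The 256-bit next_flag bitmap is modelled here as a Python integer in which bit number `code` is set when a child exists for that symbol; this bit ordering and the function name `child_index` are assumptions for illustration only.

```python
def child_index(next_flag: int, code: int):
    """Index of the child for `code` among the packed children, or None if absent."""
    if not (next_flag >> code) & 1:
        return None                               # no goto transition for this symbol
    below = next_flag & ((1 << code) - 1)         # keep only the bits before `code`
    return bin(below).count("1")                  # popcount = offset into the children


# A state with children for 'd', 'g', and 's', as in the example above.
next_flag = (1 << ord("d")) | (1 << ord("g")) | (1 << ord("s"))
print(child_index(next_flag, ord("s")))
```

The bitmap trades the 256 next-state pointers for 32 bytes plus a popcount per transition, which is why hardware popcount support matters for this scheme.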

Wu-Manber Algorithm

• Proposed by Sun Wu and Udi Manber in 1994
• A very popular multiple pattern-matching algorithm.
• Has better average-case performance than AC.
• Applies the “skip” idea of Boyer-Moore to multiple patterns (bad-character shift).
• Examines the text in blocks instead of one character at a time.
• Uses hash functions and tables.

The preprocessing stage

• m: length of the shortest pattern (LSP)
• Consider only the first m characters of each pattern.
• For k patterns, the total number of characters is M = k x m.
• Three tables to build: a SHIFT table, a HASH table, and a PREFIX table.

Example patterns: abcdefgh, 1234567, a2c4 (m = 4)

SHIFT table

• Let B be the size of the block. Each string of size B over the alphabet is mapped to an index into the SHIFT table by a hash function: Hash(X) = i.
– X doesn't appear in any pattern: Shift[i] = m - B + 1
– X appears: Shift[i] = m - q, where q is the position at which X ends in some pattern. If X appears more than once, Shift[i] is set to the minimum value.
• The maximum shift distance is m - B + 1, where m = length of the shortest pattern and B = window size.

SHIFT table example

Patterns: abcd, 1234, a2c4 (m = 4, B = 2)
Maximum shift distance = m - B + 1 = 4 - 2 + 1 = 3

Shift Table
Block   Shift
ab      2      (m - q = 4 - 2 = 2)
bc      1      (m - q = 4 - 3 = 1)
cd      0      (m - q = 4 - 4 = 0)
12      2
23      1
34      0
a2      2
2c      1
c4      0
*       3

Shift Table example

m = 5, B = 2

• ve: m - q = 5 - 4 = 1 and m - q = 5 - 2 = 3, so Shift(ve) = min{1, 3} = 1
• sh: m - q = 5 - 2 = 3 and m - q = 5 - 5 = 0, so Shift(sh) = min{0, 3} = 0
• er: m - q = 5 - 3 = 2 and m - q = 5 - 5 = 0, so Shift(er) = min{0, 2} = 0

HASH table

• Uses the same hash function as the SHIFT table.
• Maps the last B characters of all patterns.
• Hash[i] contains a pointer that
– points to a list of pointers to the patterns whose last B characters hash into i;
– is an index into the PREFIX table.

HASH table example

PREFIX table

Map the first B’ chars of all patterns into the PREFIX table.

Contains the hash value of each prefix of size B’.

Used to filter patterns whose suffix is the same but with different prefix.

Data Structures used in Wu-Manber Algorithm

[Figure: the SHIFT table (entries SHIFT[i], SHIFT[i+1]), the HASH table (entries Hash = i, Hash = i+1), the PREFIX table, and the pattern pointer list (P1 P2 P3 P4)]

Scanning steps

1. Compute a hash value h based on the current B characters from the text (starting with t_{m-B+1} ... t_m).
2. If Shift[h] > 0, shift the text and go back to step 1. Otherwise,
3. Compute the hash value of the prefix of the text; call it text_prefix.
4. For each p with Hash[h] <= p < Hash[h+1], check whether Prefix[p] = text_prefix. When they are equal, check the actual pattern against the text directly.

An Example of MWM Algorithm

Patterns: abcd, 1234, a2c4
Window size (B) = 2
Text: 1 2 2 c a b c 1 2 3 a 2 a 2 c 4

Shift Table
Block   Shift
ab      2
bc      1
cd      0
12      2
23      1
34      0
a2      2
2c      1
c4      0
*       3

Hash Table
ab -> abcd
12 -> 1234
a2 -> a2c4
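The preprocessing and scanning steps can be tried on this example with a simplified sketch. It keys the tables directly on the B-character block instead of a hash value, buckets patterns by the last block of their first m characters, and omits the PREFIX table (candidates are verified by direct comparison). These are simplifications for illustration, and the function names are hypothetical.

```python
def build_tables(patterns, B=2):
    """Return (m, shift table, candidate buckets) for a Wu-Manber style scan."""
    m = min(len(p) for p in patterns)          # length of the shortest pattern
    shift, buckets = {}, {}
    for p in patterns:
        for q in range(B, m + 1):              # q: 1-based position where the block ends
            block = p[q - B:q]
            shift[block] = min(shift.get(block, m - B + 1), m - q)
        buckets.setdefault(p[m - B:m], []).append(p)
    return m, shift, buckets


def wm_search(text, patterns, B=2):
    """Return (start position, pattern) for every occurrence in text."""
    m, shift, buckets = build_tables(patterns, B)
    hits, pos = [], m                          # pos: end (exclusive) of the m-char window
    while pos <= len(text):
        block = text[pos - B:pos]
        s = shift.get(block, m - B + 1)        # unseen block: maximum shift
        if s > 0:
            pos += s
            continue
        start = pos - m
        for p in buckets.get(block, []):       # shift 0: verify the candidates
            if text.startswith(p, start):
                hits.append((start, p))
        pos += 1
    return hits


print(build_tables(["abcd", "1234", "a2c4"])[1])
print(wm_search("122cabc123a2a2c4", ["abcd", "1234", "a2c4"]))
```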

Multi-Pattern Matching Algorithms

• Some algorithms apply the BM-based algorithms iteratively for each pattern to solve the multi-pattern matching problem.
• These algorithms were originally designed for single-pattern matching.
• They were designed for text file searching in computers, where the length of an input string is typically larger than that of a pattern string.
• However, the input string for multi-pattern matching across a network is a packet, whose length is much smaller than the sum of the lengths of all patterns.
• Moreover, the pattern set (|P|) is generally very large in a network system.

Multi-Pattern Matching Algorithms

• BM-based approaches are not applicable to packet inspection because of the varying pattern lengths, the scale of the pattern database, and the memory capacity.
• Cons
– Space complexity: O(|Λ|) -> O(|Λ| x |P|)
– Time complexity: O(|T|+|p|) ->

BM-like Algorithms

• The Boyer-Moore-Horspool algorithm (BMH), a variant of BM, slightly modifies the bad-character heuristic to build a single skip table (the Shift Table).
• BMH is the best average-case algorithm for general pattern lengths in single-pattern matching.
• The brute-force method outperforms the BM-based approaches in the extreme cases where the pattern length is less than three characters or close to the length of the input string.
• 13.7% of the patterns in the Snort pattern set have pattern lengths of less than three characters.

BM-like Algorithms

Fisk and Varghese’s method (FV) groups all patterns to precompute the safety shifts;

Wu and Manber’s algorithm (WM) groups D-grams of the prefixes of all patterns to build a shift table based on the bad gram heuristic, where each entry contains the safety shift of each D-gram.

BM-like Algorithms

• Liu et al. presented an algorithm (WM-PH) that groups the prefixes of all patterns to build a large hash table, where the length of the prefix is D.
– Bad-character heuristic
– J(.) function
• Cons
– Large table
– Minimum pattern length
– Complicated pattern update
• The table has |Λ|^D entries. Usually D = 3, so memory = 16M x entry size, since 256^3 = (2^8)^3 = 2^24 = 16M.

Ways to Improve an Algorithm

• Reduce the number of instructions
• Reduce the number of memory accesses
– Embedded system: 1 cycle for XOR; 10~150 cycles SRAM latency; 50~250 cycles DRAM latency
– General CPU: 0.5 cycles for ADD; 80~200 cycles for one off-chip access
– The rate of improvement in processor speed exceeds the improvement in memory speed.
• Reduce the required memory size

Other Pattern Matching Algorithms

Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems

Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems

AC Algorithm with a Magic Number (ACM)

Pre-filtering Algorithms for Pattern Matching

Hierarchical Matching Algorithm (HMA)

• A multiple-pattern matching algorithm
– Used to search the input string for all occurrences of any pattern in the database.
– Matches a packet against a set of patterns simultaneously, instead of one by one.
– No constraint on the minimum pattern length.
• Contains two tables
– The first-tier table is very small and stored in the on-chip cache, where it acts as a pre-filter.
– The second-tier table is stored in the external memory, which is accessed only when the first-tier table is matched.

HMA – Hierarchical Architecture

[Figure: the Processing Engine (processor and on-chip memory) holds the first-tier table H1, indexed by byte value 0x00 to 0xFF. The second-tier table H2 resides in off-chip memory as further 0x00 to 0xFF tables and is accessed with memory access probability f to fetch a cluster of patterns and do string comparison.]

HMA - Challenges

• L1 memory is very small:
– 2~4 KB in network processors
– 1~2 MB in some high-end hardware designs
– Linking many memory blocks degrades the chip performance
– Power issues
• How to obtain a small Tier-1 table (H1)?

Frequent Common-Code Searching Algorithm (FCS)

• Goal: lower the memory access probability.
• Find a small set of codes to represent a set of patterns.
• In other words, get a set of signatures of the patterns: F (each signature is one byte in this case).
• F is used to build the first-tier table (H1).
• The main idea of FCS
– If one code occurs more frequently than others in the patterns, we can choose it as one of the frequent common codes.
– Then we can get a smaller set of frequent common codes.

Pattern Spectrum

1-gram

2-gram

The pattern spectrum when |P| = 1200 from Snort’s rule set.

Frequent Common-Code Searching Algorithm (FCS)

Patterns: red, orange, green, yellow, black

FCS yields F = {a, e}

H1 (first-tier table): a mapping table that indexes to the corresponding cluster. # of entries = the size of the alphabet set.

[Figure: the five patterns grouped under the codes a and e]
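One plausible way to realise the FCS idea is a greedy selection: repeatedly pick the code contained in the largest number of still-unrepresented patterns. This greedy rule and the alphabetical tie-break are assumptions for illustration; the actual FCS procedure may differ. The sketch assumes non-empty patterns, and the function name is hypothetical.

```python
def frequent_common_codes(patterns):
    """Greedily pick codes until every pattern contains at least one chosen code."""
    uncovered = set(patterns)
    codes = []
    while uncovered:
        candidates = sorted({ch for p in uncovered for ch in p})
        # most frequent code among uncovered patterns; ties go to the smallest code
        best = max(candidates, key=lambda ch: sum(ch in p for p in uncovered))
        codes.append(best)
        uncovered = {p for p in uncovered if best not in p}
    return codes


print(frequent_common_codes(["red", "orange", "green", "yellow", "black"]))
```

On the five colour names this selects e first (it occurs in four patterns) and then a for the remaining pattern, which agrees with F = {a, e}.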

Clustering Patterns

Codes in F: a, e

Cluster   Pattern
(a,n)     orange
(a,c)     black
(e,d)     red
(e,e)     green
(e,l)     yellow

Cluster Balancing Strategy (CBS): reduce the collision probability in H2

Clustering Patterns

[Figure: CBS applied to the patterns orange, black, green, and yellow under the codes a and e, with the cluster keys (a,n), (a,c), (e,d), (e,e), (e,l); two entries are marked as seeds]

HMA – on-line stage

[Figure: the on-chip cache memory (H1) holds the entries H1(a) ... H1(z). The external memory H2 is indexed by (pid, fid) pairs such as (a,a) ... (a,z) and (e,a) ... (e,z), with the clusters (a,c) black, (a,n) orange, (e,d) red, (e,e) green, and (e,l) yellow.]

Cluster-wise matching for the input "it is black": the pattern black is found with 1 memory access and 1 string comparison (5 character comparisons).
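The on-line stage can be sketched as a two-tier lookup. H1 is modelled as the set F = {a, e} and H2 as a dictionary keyed by the frequent code and the character that follows it; each H2 entry stores the pattern together with the position of the code inside it. This table layout, the stored positions, and the function name are assumptions made for illustration.

```python
H1 = {"a", "e"}                                # first tier: frequent common codes
H2 = {                                         # second tier: (code, next char) -> cluster
    ("a", "n"): [("orange", 2)],
    ("a", "c"): [("black", 2)],
    ("e", "d"): [("red", 1)],
    ("e", "e"): [("green", 2)],
    ("e", "l"): [("yellow", 1)],
}


def two_tier_match(text):
    """Return (hits, number of second-tier accesses)."""
    hits, h2_accesses = [], 0
    for i in range(len(text) - 1):
        if text[i] not in H1:                  # filtered by the on-chip table
            continue
        h2_accesses += 1                       # one external-memory access
        for pattern, code_pos in H2.get((text[i], text[i + 1]), []):
            start = i - code_pos
            if start >= 0 and text.startswith(pattern, start):
                hits.append((start, pattern))
    return hits, h2_accesses


print(two_tier_match("it is black"))
```

Most input bytes never leave the first tier, so the cost of the scan is dominated by how often a byte of the payload happens to be in F.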

HMA – Preliminary Results

Memory Size

|F|

HMA – Number of L2 Memory Accesses

Filter out about 90%–63% payloads in the Tier-1

Random Input

HMA – Memory size and Memory latency

          HMA         BM-PH       BMH        AC-C
Memory    326.75 KB   16.013 MB   313.2 KB   439 KB

HMA – Time (Cycles)


DPI.74

Other Pattern Matching Algorithms

Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems

Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems

AC Algorithm with a Magic Number (ACM)

Pre-filtering Algorithms for Pattern Matching


DPI.75

Enhanced HMA (EHMA) – gram?

Definition: gram
  A gram: a group of characters
  D-gram: D is the number of grouped characters
  A string = a set of grams
  “green” = {‘gr’, ‘re’, ‘ee’, ‘en’} when D = 2.
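The gram definition can be illustrated with a minimal sketch. The function name d_grams is hypothetical, and the sketch returns the overlapping D-grams in order (a list rather than a set) so that positions stay visible.

```python
def d_grams(s: str, d: int = 2) -> list[str]:
    """Return the overlapping D-grams of s, from left to right."""
    return [s[i:i + d] for i in range(len(s) - d + 1)]


print(d_grams("green"))  # ['gr', 're', 'ee', 'en']
```

A string shorter than D has no grams, so the function returns an empty list in that case.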


DPI.76

EHMA - Strategies

Enhance HMA
  Frequent-common Code Search (FCS) -> General Frequent-common Gram Search (GFGS)
    Obtain frequent-common gram set: F
  Cluster Balancing Strategy (CBS)
Add
  Sampling Window
    Searching F in the sampling window
  Safety Shift Strategy
    Frequency-based bad gram heuristic
      – Modify bad grouped character heuristic
      – Frequent-common gram
    Shift value in H1 and H2 table

DPI.77

EHMA – State Diagram

[Figure: state diagrams of HMA and EHMA with the states Tier-1 Matching and Tier-2 Matching; transitions are labelled "Hit", "Have next pattern in the cluster", and "No next pattern in the cluster".]

DPI.78

EHMA – Off-line Stage

[Figure: off-line stage that builds the H1 and H2 tables.]

DPI.79

EHMA - GFGS

B1 = 1, B2 = 1, m = M = 6, W = 3

F={e, h}


DPI.80

EHMA

[Figure: EHMA tables for the patterns actress, teacher, architect, firefighter, and farmer (W=3, B1=1, B1+B2=2, m=6). The on-chip cache (H1) has 256 entries, H1(a) ... H1(z) shown, each holding a shift value and an fid. The external memory (H2) holds (shift, data) entries for the clusters Pe, indexed (e,a) ... (e,z), and Ph, indexed (h,a) ... (h,z); for example, (e,f) holds firefighter, (e,r) holds farmer, and (e,s) holds actress, each with shift 2.]

DPI.81

EHMA – on-line stage

[Figure: flowchart of the on-line stage. Tier-1 matching (processing unit): Read Next Gram (single gram of the input string), Skip?, Skip on Input. Tier-2 matching (external memory): Read Entry from External Memory, pid = NULL?, String Comparison and Output (output matched), Next Pattern?.]

DPI.82

EHMA - Safety Shift Strategy

As long as no gram of F is matched in the input string, no pattern exists in it.

Therefore, if no gram of F is missed, then no pattern will be missed.

The basic concept of the safety shift strategy is this: if x is not a gram of any pattern, and no suffix of x is a prefix of any pattern in P, then it is safe to shift by m when x is scanned; otherwise, the safe shift is the offset between the rightmost occurrence of x in any pattern p and the position of the frequent-common gram (f) in p nearest to x.

DPI.83

EHMA - Examples

For a text of 8 characters: 1 L1 table lookup, 0 L2 accesses, 0 string comparisons

For a text of 12 characters: 4 L1 table lookups, 1 L2 access, 1 string comparison

M=m=6, W=3, B1=B2=1

(M-W+1)-th

Best Case

DPI.84

EHMA – Number of L2 Memory Accesses

[Figure: two panels (λ=0 and λ=4) showing the average number of external accesses (log scale, 0.01 to 1000) for EHMA, HMA, WM-PH, AC-C, BMH, and BMH-O with 200 patterns and 1200 patterns.]

Tier-1 filters out 81%-94% of the payload

DPI.85

Other Pattern Matching Algorithms

Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems

Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems

AC Algorithm with a Magic Number (ACM)

Pre-filtering Algorithms for Pattern Matching


DPI.86

AC with a Magic Number (ACM)

AC with Bitmap (Nathan Tuck et al., INFOCOM 2004)

[Figure: an AC state machine with states 0 to 9 and the per-state bitmap structure with the fields next_flag, fail_ptr, next_start, and pattern_list (bit offsets 0, 16, 32).]

44 bytes per state (256 bits = 32 bytes; 32 bytes + 12 bytes = 44 bytes)

DPI.87

AC with a Magic Number (ACM)

For example, if the input symbol is ‘s’, whose ASCII code is 0x73 (decimal 115), ACB matching has to read 115 bits (7x16 + 3 = 115) and accumulate these bits to obtain the offset for ‘s’ (popcount).

[Bitmap: 00000001000000.....0001000..., with the two set bits at the positions of ‘h’ and ‘s’; popcount = 2, so the offset for ‘s’ is 1.]

However, the computation load of the popcount function is heavy.
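The popcount step can be sketched as follows. This is only an illustration under stated assumptions: the bitmap is modelled as a Python integer whose bit i stands for the symbol with code i, and the name child_offset is hypothetical. Counting the set bits strictly below the symbol's position gives the offset directly.

```python
def child_offset(bitmap: int, symbol: str):
    """Return the rank of symbol among the set bits of a 256-bit bitmap, or None."""
    code = ord(symbol)
    if not (bitmap >> code) & 1:
        return None  # the state has no transition on this symbol
    below = bitmap & ((1 << code) - 1)  # keep only the bits for smaller codes
    return bin(below).count("1")  # popcount over `code` bits


# A state with two children, reached on 'h' and 's'.
bitmap = (1 << ord("h")) | (1 << ord("s"))
print(child_offset(bitmap, "h"), child_offset(bitmap, "s"))  # 0 1
```

The cost noted on the slide shows up here as the mask-and-count over up to 255 bits for every input symbol.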


DPI.88

AC with a Magic Number (ACM): Path Decoder R?

{h, s, i} {0, 1, 2}

Page 89: All rights reserved. No part of this publication and file may be reproduced, stored in a retrieval system, or transmitted in any form or by any means,

DPI.89

ACM: A Magic Number?

Assume there is a magic number $R$ and define the function $f$ as

$$f: R \bmod f(a_i) = i - 1, \qquad i = 1, 2, \ldots, k,$$

so that the symbols map to consecutive indices: $\{a_1, a_2, \ldots, a_k\} \rightarrow \{0, 1, \ldots, k - 1\}$, e.g. $\{h, s, i\} \rightarrow \{0, 1, 2\}$.

Does the magic number exist?

DPI.90

Chinese Remainder Theorem (CRT)

Let $M = \prod_{i=1}^{k} m_i$, where the $m_i$ are integers and pairwise relatively prime; that is, $\gcd(m_i, m_j) = 1$ for $1 \le i, j \le k$ and $i \ne j$. Let $x_1, x_2, \ldots, x_k$ be integers. Consider the system of congruences:

$$X \equiv x_1 \pmod{m_1}, \quad X \equiv x_2 \pmod{m_2}, \quad \ldots, \quad X \equiv x_k \pmod{m_k},$$

where $X$ and $x_i$ are said to be congruent modulo $m_i$, for $1 \le i \le k$. Then there exists exactly one solution $X \in \{0, 1, \ldots, M - 1\}$.

Here $\gcd(a, b)$ means the greatest common divisor of $a$ and $b$.

DPI.91

Chinese Remainder Theorem (CRT)

Assume M = 3x5x7 = 105

X ≡2 (mod 3)

X ≡3 (mod 5)

X ≡2 (mod 7)

Then X = 23


DPI.92

ACM: CRT

X%m1 = x1%m1

X%m2 = x2%m2

X%m3 = x3%m3

mi are integers and relatively prime

X exists and X<m1*m2*m3

xi are integers

X {x1, x2, x3}


DPI.93

ACM: CRT

X % m1 = x1 % m1
X % m2 = x2 % m2
X % m3 = x3 % m3

When xi < mi, this reduces to

X % m1 = x1
X % m2 = x2
X % m3 = x3

which has the same form as $f: R \bmod f(a_i) = i - 1$.

mi are integers and relatively prime

DPI.94

ACM: find R

Therefore, if we let the function $f$ number the symbols by prime numbers,

$$f: \{a_1, a_2, \ldots, a_k\} \rightarrow \{m_1, m_2, \ldots, m_k\}, \qquad R \bmod f(a_i) = i - 1,$$

then by the CRT we know the magic number exists.

DPI.95

ACM: find X

Chinese Remainder Theorem Algorithm. Let $z_i = M / m_i$ and $y_i = z_i^{-1} \pmod{m_i}$ for each $i = 1, 2, \ldots, k$, where $z_i^{-1}$ means the multiplicative inverse of $z_i$ modulo $m_i$. (Note that $z_i^{-1}$ exists if $\gcd(z_i, m_i) = 1$.) Then the solution to the congruence system of the Chinese Remainder Theorem is

$$X = \left( \sum_{i=1}^{k} x_i \, y_i \, z_i \right) \bmod M$$
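The CRT algorithm above can be sketched directly in a few lines. The function name crt is hypothetical; the modular inverse uses Python's three-argument pow with exponent -1, which requires Python 3.8 or later.

```python
from math import prod


def crt(residues, moduli):
    """Solve X = x_i (mod m_i) for pairwise relatively prime moduli."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        z_i = M // m_i
        y_i = pow(z_i, -1, m_i)  # multiplicative inverse of z_i modulo m_i
        total += x_i * y_i * z_i
    return total % M


print(crt([2, 3, 2], [3, 5, 7]))  # 23
```

With the residues {0, 1, 2} and the primes {2, 3, 5}, the same function returns the magic number 22.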


DPI.96

ACM: an example

$\{h, s, i\} \xrightarrow{f} \{2, 3, 5\} \xrightarrow{\text{CRT}} \{0, 1, 2\}$, with $f: R \bmod f(a_i) = i - 1$

X % 2 = 0
X % 3 = 1
X % 5 = 2

By the CRT algorithm, the magic number is 22.


DPI.98

ACM: an example

$\{h, s, i\} \xrightarrow{f} \{2, 3, 5\} \xrightarrow{\text{CRT}} \{0, 1, 2\}$, with $f: R \bmod f(a_i) = i - 1$

The magic number is 22:

22 % 2 = 0
22 % 3 = 1
22 % 5 = 2
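The example can be checked with a short sketch. It finds the magic number by brute force instead of the CRT algorithm, and it assumes (this is not stated on the slide) that the child states are stored consecutively starting at next_start, so the remainder is used as an index. PRIME_OF, find_magic, and next_state are hypothetical names, and the value 10 for next_start is arbitrary.

```python
from math import prod

PRIME_OF = {"h": 2, "s": 3, "i": 5}  # f: symbol -> prime, as in the example


def find_magic(primes):
    """Brute-force the smallest R with R % primes[i] == i (0-based index)."""
    for r in range(prod(primes)):
        if all(r % p == i for i, p in enumerate(primes)):
            return r
    raise ValueError("no magic number for these moduli")


def next_state(next_start, magic, symbol):
    """Decode the forwarding path: one modulo instead of a popcount."""
    return next_start + magic % PRIME_OF[symbol]


magic = find_magic([2, 3, 5])
print(magic, [next_state(10, magic, c) for c in "hsi"])  # 22 [10, 11, 12]
```

The single modulo operation replaces the bit accumulation needed by the bitmap scheme, at the price of storing the magic number in each state.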


DPI.99

ACM: Magic Structure

[Figure: per-state structure with the fields next_flag, fail_ptr, next_start, pattern_list, and MagicNumber (bit offsets 0, 16, 32).]

52 bytes per state (256 bits = 32 bytes; 32 bytes + 12 bytes + 8 bytes = 52 bytes)

DPI.100

ACM: on-line stage

[Figure: magic structure (next_flag, fail_ptr, next_start, pattern_list, MagicNumber). The input symbol a is not flagged (bit 0).]

nextState = fail_ptr

DPI.101

ACM: on-line stage

[Figure: magic structure (next_flag, fail_ptr, next_start, pattern_list, MagicNumber). The input symbol s is flagged (bit 1).]

DPI.102

ACM: memory architecture

$\{s, h\} \xrightarrow{f} \{11, 3\} \xrightarrow{\text{CRT}} \{0, 1\}$: the magic number is 22.

$\{e, i\} \xrightarrow{f} \{2, 5\} \xrightarrow{\text{CRT}} \{0, 1\}$: the magic number is 6.

DPI.103

ACM: The Magic Number Heuristic

Heuristic: If there is only one child, then the magic number will be zero.

Observing the ACM state machine, we find that, approaching the leaves, more and more states have only one child state.

The forwarding path can then be obtained directly without any computation: nextState = next_start

The magic structure is very efficient for sparse graphs: only 0.078% of the ACM nodes have more than 15 branches when 1200 patterns from Snort are included.

[Figure: AC state machine with states 0 to 9 and edges labelled h, e, r, s, and i; state 0 loops on symbols not in {h, s}.]

DPI.104

ACM: Memory Size

[Figure: memory (KB, log scale from 1 to 100000) of ACM, ACB, and ACO for 1200 patterns and 200 patterns. Plotted values: 519.1, 438.8, 10252.9, 1912.4, 81.9, 97.2; annotations: 1.183%, 1.187%, 5%, 5%.]

Rule Database: Distinct patterns from Snort

DPI.105

ACM: Time

[Figure: time (cycles, 0 to 100) of ACM, ACB, ACO, and ACO-100 for 1200 patterns and 200 patterns; labelled values 5.34 and 5.67.]

Software: The number of instructions per clock for add, mov, mul, cmp, bt, and mod is 3, 3, 1, 3, 3, 1/71 respectively.

DPI.106

ACM: Total Cost

An evaluation function C including the memory and the execution time requirements: C = CM × CT.

CM: memory cost

CT: time cost

DPI.108

Other Pattern Matching Algorithms

Hierarchical Matching Algorithm (HMA) for Intrusion Detection Systems

Enhanced Hierarchical Matching Algorithm (EHMA) for Intrusion Detection Systems

AC Algorithm with a Magic Number (ACM)

Pre-filtering Algorithms for Pattern Matching


DPI.109

Pre-filter: Search Filter Model

All substrings that are filtered out by the filter are clean: they cannot contain any of the defined patterns.

The substrings passed on to the pattern matching algorithm may or may not contain pre-defined patterns.

Thus, the search filter may generate false positives but not false negatives. A false positive here is a substring without any pre-defined pattern that is wrongly accepted as containing one.

An exact string matching mechanism is therefore essential for finding out which patterns are included in an accepted substring.

DPI.110

Pre-filter: Search Filter Model

[Figure: the entering input string passes through the pre-filtering algorithm. Bypassed substrings contain no pattern; accepted substrings, which may or may not contain a pattern, go to the string matching algorithm, which reports the patterns found in the substring.]

DPI.111

Super-Symbol Filter

The basic idea of the proposed Super-Symbol Filter (SSF) algorithm is to treat two bytes of data as a super-symbol and to use a bitmap to indicate the occurrence of each super-symbol in the pre-defined patterns.

For example, for the 8-bit ASCII code there are 65536 combinations of two bytes of data, so a bitmap vector of 65536 entries is used.

[Figure: the 65536-entry bitmap is indexed by the two-byte super-symbol, from 0x00 0x00 to 0xFF 0xFF. The patterns P1 = AAB, P2 = CODE, and P3 = FOOD set the bits for AA (0x41 0x41), AB (0x41 0x42), CO (0x43 0x4F), DE (0x44 0x45), FO (0x46 0x4F), OD (0x4F 0x44), and OO (0x4F 0x4F); other entries, such as ZZ (0x5A 0x5A), stay 0.]

Match Vector Constructing
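Match vector construction and the SSF-1 lookup can be sketched as follows. For readability the sketch stores one byte per entry in a bytearray instead of one bit; the function names are hypothetical, and the patterns and input text are the ones used on the slides.

```python
def build_match_vector(patterns):
    """Mark every two-byte super-symbol that occurs in any pattern."""
    mv = bytearray(65536)  # one flag per super-symbol (a real bitmap needs 8 KB)
    for p in patterns:
        data = p.encode("ascii")
        for i in range(len(data) - 1):
            mv[(data[i] << 8) | data[i + 1]] = 1
    return mv


def ssf1_hits(mv, text):
    """Return the match-vector bit of each overlapping super-symbol of text."""
    data = text.encode("ascii")
    return [mv[(data[i] << 8) | data[i + 1]] for i in range(len(data) - 1)]


mv = build_match_vector(["AAB", "CODE", "FOOD"])
hits = ssf1_hits(mv, "ABOD CODING IS FOOD")
print(hits)  # [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
```

Each lookup is a single table access, which is what keeps the filter inexpensive compared with the exact matcher behind it.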


DPI.112

Filtering phase in SSF-1 Algorithm

Input string: Text = ABOD CODING IS FOOD

[Figure: each super-symbol of the text is looked up in the bitmap, whose bits are set for AA, AB, CO, DE, FO, OD, and OO. △ denotes a space.]

Super-symbol: AB BO OD D△ △C CO OD DI IN NG G△ △I IS S△ △F FO OO OD
Bitmap bit:    1  0  1  0  0  1  1  0  0  0  0  0  0  0  0  1  1  1

DPI.113

SSF-2 Algorithm

To improve accuracy and reduce the number of false positives, the extended SSF-2 algorithm employs two match vectors.

The First Match Vector (FMV) is used for the super-symbols formed by the first two symbols of each pattern.

The Rest Match Vector (RMV) is used for the remaining super-symbols in the patterns, except those in the FMV.

[Figure: FMV and RMV bitmaps for P1 = AAB, P2 = CODE, and P3 = FOOD, indexed from 0x00 0x00 to 0xFF 0xFF. The FMV bits are set for AA, CO, and FO; the RMV bits are set for AB, DE, OD, and OO.]

DPI.114

SSF-2 Algorithm

The algorithm looks up the FMV and RMV and checks whether the corresponding bit of each super-symbol is 1.

Input string: Text = ABOD CODING IS FOOD

Super-symbol: AB BO OD D△ △C CO OD DI IN NG G△ △I IS S△ △F FO OO OD
Output bit:    0  0  0  0  0  1  1  0  0  0  0  0  0  0  0  1  1  1

Since “AB” and “OD” are not the beginning super-symbol of any pattern (by checking the FMV), the filter algorithm outputs only two substrings, “COD” and “FOOD”. Only one substring, “COD”, is a false positive in this case.

[Figure: FMV bitmap (bits set for AA, CO, FO) and RMV bitmap (bits set for AB, DE, OD, OO) applied to the super-symbols of the text. △ denotes a space.]
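One plausible reading of the SSF-2 filtering step is sketched below: a candidate substring starts at a super-symbol whose FMV bit is set and is extended while the following super-symbols have their RMV bit set. The slides do not spell out the exact control flow, so this is an assumption; the vectors are modelled as Python sets, and all function names are hypothetical.

```python
def super_symbols(s):
    """Overlapping two-character super-symbols of s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def build_vectors(patterns):
    """FMV: first super-symbol of each pattern; RMV: all later ones."""
    fmv, rmv = set(), set()
    for p in patterns:
        grams = super_symbols(p)
        fmv.add(grams[0])
        rmv.update(grams[1:])
    return fmv, rmv


def ssf2_candidates(text, fmv, rmv):
    """Return the substrings that must be passed to the exact matcher."""
    out, i, n = [], 0, len(text)
    while i < n - 1:
        if text[i:i + 2] in fmv:
            j = i + 1
            while j < n - 1 and text[j:j + 2] in rmv:
                j += 1
            out.append(text[i:j + 1])
            i = j  # the super-symbol at j may itself start a new candidate
        else:
            i += 1
    return out


fmv, rmv = build_vectors(["AAB", "CODE", "FOOD"])
print(ssf2_candidates("ABOD CODING IS FOOD", fmv, rmv))  # ['COD', 'FOOD']
```

The exact matcher then rejects "COD" and confirms "FOOD", which agrees with the single false positive reported above.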


DPI.115

Evaluation

To evaluate scalability and flexibility, the popular Snort IDS signatures are employed.

If most bits of the bitmap were set to ‘1’, we could expect the SSF filtering performance to degrade dramatically, because the “hit rate” would be very high.

Fortunately, tracking the growth of the Snort rule patterns shows that the percentage of set bits in the MV, FMV, and RMV is still very small (less than 5%). Thus, the proposed approaches have a good chance of adapting to the fast growth of Snort releases.

Release     Number of released patterns   SSF-1 MV bitmap   SSF-2 FMV bitmap   SSF-2 RMV bitmap
Snort-2.0   2066                          3213              695                3027
Snort-2.1   2617                          3478              813                3296
Snort-2.2   2664                          3575              835                3382
Snort-2.3   2679                          3611              845                3413
Snort-2.4   2680                          3611              845                3413

DPI.116

Performance

Pentium-4 3.0 GHz personal computer with 1 MB level-2 cache, installed with Intel’s VTune tool

Parallel Bloom Filter (PBF), Database Processor (IDP)

Defcon9 Trace: Defcon-1 (# of matched patterns: 377,508 times; 9,846,572 bytes)

Filter algorithm   Passed by filter (bytes)   Filtered out   Filter time (μs)   AC search time (μs)   Total time (μs)   Throughput (Mbps)
PBF                1,173,918                  88%            >10^7              -                     >10^7             <10
IDP                9,782,654                  0.7%           126,439            550,468               676,907           116
AC                 9,846,572                  0%             0                  558,079               558,079           141
SSF-1              2,916,802                  70%            122,841            212,307               335,148           250
SSF-2              1,917,544                  81%            130,809            160,872               291,681           270

Defcon9 Trace: Defcon-2 (# of matched patterns: 147,843 times; 9,849,836 bytes)

Filter algorithm   Passed by filter (bytes)   Filtered out   Filter time (μs)   AC search time (μs)   Total time (μs)   Throughput (Mbps)
PBF                492,491                    95%            >10^7              -                     >10^7             <10
IDP                9,777,406                  0.8%           125,901            529,602               655,503           120
AC                 9,849,836                  0%             0                  537,297               537,297           146
SSF-1              1,868,185                  81%            118,264            119,343               237,607           332
SSF-2              879,353                    91%            127,651            68,628                196,279           401

Defcon9 Trace: Defcon-3 (# of matched patterns: 57,458 times; 9,852,342 bytes)

Filter algorithm   Passed by filter (bytes)   Filtered out   Filter time (μs)   AC search time (μs)   Total time (μs)   Throughput (Mbps)
PBF                197,046                    98%            >10^7              -                     >10^7             <10
IDP                9,775,924                  0.8%           125,810            512,169               637,970           123
AC                 9,852,342                  0%             0                  513,081               513,081           153
SSF-1              1,350,541                  86%            117,000            80,374                197,374           400
SSF-2              391,024                    96%            126,523            29,739                156,262           504

DPI.117

Filter Percentage & Throughput

The filtering effectiveness of the IDP scheme is poor, and the scheme cannot handle Snort’s patterns. This is because the bitmap used in the IDP scheme has only 256 entries, one per one-byte symbol, and most of these entries are set to “1” for Snort’s patterns.

Both the PBF and SSF schemes are less sensitive to the growth of patterns and have a filtering percentage of around 80-98%.

[Figure: filter percentage (%) for the Defcon9 packets. Defcon-1: IDP 0.7, SSF-1 70, SSF-2 81, PBF 88. Defcon-2: IDP 0.8, SSF-1 81, SSF-2 91, PBF 95. Defcon-3: IDP 0.8, SSF-1 86, SSF-2 96, PBF 98.]

DPI.118

Filter Percentage & Throughput

The PBF is suitable only for hardware-based implementation; in this software evaluation its throughput is lower than that of AC.

For Defcon-1, the system throughput is roughly doubled (270 Mbps vs 141 Mbps) compared with the original AC algorithm, and for Defcon-3 the speed-up is more than three times (504 Mbps vs 153 Mbps).

The proposed SSF schemes consume far less memory (cache-resident).

[Figure: throughput (Mb/s, 0 to 600) on the Defcon9 traces Defcon-1, Defcon-2, and Defcon-3 for PBF + AC, IDP + AC, AC, SSF-1 + AC, and SSF-2 + AC.]

DPI.119

Pattern Matching Technologies

Pattern-Matching Algorithms
  Software Based
    Boyer-Moore
    Aho-Corasick (AC)
    Wu-Manber
    HMA/EHMA/ACM
    Pre-Filtering
  Hardware Based
    Reconfigurable Hardware (FPGA)
    Bloom-Filter
    TCAM-based
    GPU-based

DPI.120

Reconfigurable Hardware (FSM)

Implement the AC FSM in the configurable Logic Elements (LEs) of an FPGA.

Achieves multi-gigabit performance (depending on the FPGA model).

A powerful FPGA is necessary to accommodate thousands of patterns, so this approach is not practical and is not visible in the commercial market.

DPI.121

FPGA-based pattern matching

FPGA-based


DPI.122

FPGA-based pattern matching

Shortcomings
  Non-scalable
  Limited rules

DPI.123

Bloom Filter

Given a string X, the Bloom filter computes k hash functions on it, producing k hash values ranging from 1 to m, and sets the corresponding k bits. The same procedure is repeated for all the members of the pattern set.

The input text is verified by generating k hash values in the same way. If at least one of these k bits is not set, the string cannot match any pattern.

Patterns of length n are grouped into Bn.
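The programming and query steps can be sketched for a single length group. This is a minimal illustration, not the hardware design: the k hash functions are derived from salted SHA-256 (an arbitrary choice made for this sketch), bit positions run from 0 to m-1, and the class and method names are hypothetical. The patterns and text are taken from the example slide.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit position, for clarity

    def _positions(self, item: bytes):
        # k hash values from salted SHA-256 (illustrative choice only)
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "impossible to match"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


b2 = BloomFilter(m=3072, k=4)  # filter for the patterns of length 2
for pattern in (b"ab", b"aa", b"xy"):
    b2.add(pattern)

text = b"accdefg"
suspects = [text[i:i + 2] for i in range(len(text) - 1)
            if b2.might_contain(text[i:i + 2])]
```

Any window left in suspects still needs an exact comparison, because a Bloom filter never misses a programmed pattern but can report false positives.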


DPI.124

Bloom Filter (Cont.)

[Figure: the payload stream (bytes A B C D E F G H I J ... at positions 1 2 3 ...) is fed in parallel to the Bloom filters B2, B3, B4, ..., Bw. Each filter applies K hash functions H1, H2, ..., Hk to its window and checks the resulting positions in a bit vector ranging from 0 to m.]

Group signatures by length: G2(X), G3(X), G4(X), ...

False positive: min $f = (0.5)^k$, when $m = (k \times n) / \ln 2$

So the total space is $\sum_i B_i = m \times (w - 1)$

if k = 1, n = 2048, m = 3072 bits
   k = 1, n = 3072, m = 4608 bits

if k = 4, f = 0.0625
   k = 5, f = 0.0313
   k = 6, f = 0.0156

DPI.125

Bloom Filter (Cont.)

[Figure: the same Bloom filter array (B2, B3, B4; K hash functions H1, H2, ..., Hk; signatures grouped by length into G2(X), G3(X), G4(X)) applied to an example.]

P1: ab
P2: aa
P3: xy

Text: accdefg

DPI.126

TCAM (Ternary-CAM)-Based Pattern Matching

Unlike RAM, a CAM takes data as input and returns the matched address in one clock cycle.

A brute-force method needs n clock cycles to find a pattern in a text of length n.

Here we propose a method that speeds up the lookup by up to d times. However, the required space also grows d times.

Even so, a 2-Mbit TCAM (around $40) can accommodate over 3,000 patterns at multi-gigabit performance.

DPI.127

TCAM

TCAM stores data with three logic values: ‘0’, ‘1’, ‘X’ (don’t care)

Multiple match modes are needed.

DPI.128

Ternary-CAM (TCAM)

Each cell takes three logic states: ‘0’, ‘1’, and ‘?’ (don’t care)

Fully associative memory: compares the input string with all the entries in parallel
  If there are multiple matches, report the index of the first match

Current TCAM technology
  Fast match time: 4-8 ns
  Size: 1M
    1K entries * 1K bytes per entry
    2K entries * 512 bytes per entry

[Figure: a TCAM with more than 1K entries, each k bytes wide. The input A B C D is compared with the entries C D E F, A B ? ?, and A B C ?; the match is marked at A B C ?.]

DPI.129

Pattern Matching with TCAM

Put all the patterns into the TCAM
  Assume the patterns are no longer than the TCAM width
  If a pattern is shorter than the TCAM width, pad it with ‘?’
Order the patterns by decreasing length
  When matching the entry ABC, report a match of both pattern ABC and pattern AB
Shift one byte each time

[Figure: a TCAM with more than 1K entries, each k bytes wide. A window of the input A B C D E F is compared with the entries C D E F, A B ? ?, and A B C ?; the match is marked at A B C ?.]
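The lookup procedure above can be emulated in software with a minimal sketch. A real TCAM compares all entries in parallel; the loop here only models the first-match priority. The function names are hypothetical, and the sketch assumes that the patterns contain neither '?' nor NUL characters.

```python
def tcam_lookup(entries, window):
    """Return the index of the first entry matching window ('?' is don't care)."""
    for idx, entry in enumerate(entries):
        if all(e == "?" or e == c for e, c in zip(entry, window)):
            return idx
    return None


def tcam_scan(patterns, text, width):
    """Shift one byte at a time; report every pattern found at each position."""
    ordered = sorted(patterns, key=len, reverse=True)  # longest entries first
    entries = [p.ljust(width, "?") for p in ordered]   # pad short patterns
    padded = text + "\0" * (width - 1)                 # allow hits near the end
    hits = []
    for pos in range(len(text)):
        idx = tcam_lookup(entries, padded[pos:pos + width])
        if idx is not None:
            longest = ordered[idx]
            # The longest match implies a match of every pattern that prefixes it.
            hits.append((pos, [p for p in ordered if longest.startswith(p)]))
    return hits


print(tcam_scan(["AB", "CDEF", "ABC"], "ABCDEF", width=4))
# [(0, ['ABC', 'AB']), (2, ['CDEF'])]
```

Ordering the entries by decreasing length is what lets a single first-match result stand for all shorter patterns that start at the same position.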


DPI.130

TCAM Search Analysis

Scan speed: 4-8 ns per TCAM lookup, shifting one byte at a time
  1-2 Gbps worst-case scan rate

Able to report occurrences of all the patterns in the input string

Limitation: requires all the patterns to be no longer than the TCAM width

DPI.131

TCAM application

Packet switch


DPI.132

TCAM application

Address resolution


DPI.133

BM-Based TCAM Solution (cont.)

Shift Table (72 bits wide)

Group   Patterns             Shift
G0      a b c d e f g h i    0
        1 2 3 4 5 6 7 8 9    0
        h i g h s p e e d    0
G1      * a b c d e f g h    1
        * 1 2 3 4 5 6 7 8    1
        * h i g h s p e e    1
G2      * * a b c d e f g    2
        * * 1 2 3 4 5 6 7    2
        * * h i g h s p e    2
...
G8      * * * * * * * * a    8
        * * * * * * * * 1    8
        * * * * * * * * h    8
        * * * * * * * * *    9

Each lookup searches a 72-bit text string (Ti…Ti+8) in this table, and the window slides k bytes if the text string matches an entry in Gk.

This design needs only n/9 comparisons in the best case, i.e., 8 Gbps throughput at 125 MHz.

Example:
Patterns: abcdefghi, 123456789, highspeed
Input: High12345678123456789
  “high12345” matches ****12345 (G4)
  “123456781” matches ********1 (G8)
  “123456789” matches 123456789 (G0)
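The shift table and the sliding window can be emulated with a short sketch. It assumes that all patterns have the same length w (9 bytes here), that the text is compared in lower case, and that the window advances by one byte after a full match; the last point is not stated on the slide. The function names are hypothetical, and the sequential search stands in for the parallel TCAM lookup.

```python
def build_shift_table(patterns):
    """Group Gk holds the first w-k symbols of each pattern, right-aligned."""
    w = len(patterns[0])
    table = []
    for k in range(w):
        for p in patterns:
            table.append(("*" * k + p[:w - k], k))
    table.append(("*" * w, w))  # default entry: no overlap, slide the full width
    return table


def scan(patterns, text):
    """Return the matches found and the number of table lookups used."""
    w = len(patterns[0])
    table = build_shift_table(patterns)
    pos, matches, lookups = 0, [], 0
    while pos + w <= len(text):
        window = text[pos:pos + w]
        lookups += 1
        # First match wins, so the smallest safe shift is chosen.
        shift = next(s for mask, s in table
                     if all(m == "*" or m == c for m, c in zip(mask, window)))
        if shift == 0:
            matches.append((pos, window))
            shift = 1  # assumption: advance one byte after a full match
        pos += shift
    return matches, lookups


patterns = ["abcdefghi", "123456789", "highspeed"]
print(scan(patterns, "High12345678123456789".lower()))
# ([(12, '123456789')], 3)
```

The 21-byte input is covered in three lookups (shifts of 4 and 8, then a match), compared with 13 lookups when the window shifts one byte at a time.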


DPI.134

GPU-based Content Inspection Technology

Graphics processing unit (GPU)
  GPUs offer increasingly powerful computation
  The proposed scheme can exploit the resources of the GPU, which is otherwise idle most of the time, to accelerate string pattern matching
  Parallel stream processors

DPI.135

GPU-based Content Inspection Technology

[Figure: NVIDIA GPU fragment shader GFLOPS compared with x86 CPU GFLOPS over time, Jan-03 to Mar-06 (0 to 300 GFLOPS); GPU data points NV30, NV35, NV40, G70, G70-512, and G71. Courtesy Ian Buck.]

DPI.136

AC String Matching Algorithm

Deterministic finite state automaton: next move function

State   E  H  I  R  S
0       0  1  0  0  3
1       2  1  6  0  3
2       0  1  0  8  3
3       0  4  0  0  3
4       5  1  6  0  3
5       0  1  0  8  3
6       0  1  0  0  7
7       0  4  0  0  3
8       0  1  0  0  9
9       0  4  0  0  3
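The next move function can be executed directly, as in the sketch below. The transition table is copied from the slide; the output sets are an assumption, taken from the textbook Aho-Corasick example for the patterns he, she, his, and hers, which this table appears to encode. Symbols outside E, H, I, R, S are assumed to lead back to state 0.

```python
ALPHABET = "EHIRS"
NEXT = [
    [0, 1, 0, 0, 3],  # state 0
    [2, 1, 6, 0, 3],  # state 1
    [0, 1, 0, 8, 3],  # state 2
    [0, 4, 0, 0, 3],  # state 3
    [5, 1, 6, 0, 3],  # state 4
    [0, 1, 0, 8, 3],  # state 5
    [0, 1, 0, 0, 7],  # state 6
    [0, 4, 0, 0, 3],  # state 7
    [0, 1, 0, 0, 9],  # state 8
    [0, 4, 0, 0, 3],  # state 9
]
# Assumed output function (not shown on the slide).
OUTPUT = {2: ["he"], 5: ["she", "he"], 7: ["his"], 9: ["hers"]}


def run_dfa(text):
    """Return (end position, pattern) for every match, one table lookup per byte."""
    state, found = 0, []
    for pos, ch in enumerate(text.upper()):
        col = ALPHABET.find(ch)
        state = NEXT[state][col] if col >= 0 else 0
        for pattern in OUTPUT.get(state, []):
            found.append((pos, pattern))
    return found


print(run_dfa("ushers"))  # [(3, 'she'), (3, 'he'), (5, 'hers')]
```

Because the automaton is deterministic, each input byte costs exactly one table lookup, which is the property the GPU scheme relies on.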


DPI.137

GPU Architecture

NVIDIA GeForce 7900 GTX
  8 vertex shaders
  24 fragment shaders

[Figure: block diagram with Host/FW/VTF, Cull/Clip/Setup, Z-Cull, Shader Instruction Dispatch, Fragment Crossbar, L2 Tex, and four memory partitions, each attached to DRAM(s).]

DPI.138

GPU Architecture (cont.)

The GPU writes results to the rendering target and draws them on screen at the end of the rendering pipeline

Texture memory

[Figure: rendering pipeline with the stages Vertex Information, Coordinate Transformation, Lighting, Rasterizer, Texture Mapping, Pixel Testing, and Rendering Target, executed by the Vertex Shader and the Fragment Shader; Render-to-Texture writes results into Texture Memory.]

DPI.139

Model framework

Three data structures are maintained in the GPU texture memory
  Automata texture
  Text texture
  State texture
    Current State texture
    Next State texture

[Figure: texture memory (finite state automata, text, current state table, next state table) feeding the fragment shader (selector, two fp32 shader units, branch processor, fog ALU, texture cache); geometry information comes from the vertex shaders.]


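DPI.140

Data structure

Construction phase
  The size of the symbol space is k
  The finite state machine contains n states

[Figure: Automata Texture (DFA table), 4096 texels wide, with corners (0,0), (4095,0), and (0,k); states are numbered 0 to n-1 and symbols 0 to k-1; sample next-state entries 36, 37, 38, 39]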


DPI.141

Data structure (cont.)

RGBA pixel format
  The state information of an FSM is stored as R values in pixels

[Figure: DFA table with rows 0 to k and columns 00 to ff (sample entries 36, 37, 38, 39, 0), mapped to RGBA pixels at (x, y) in a square texture of side ((k+1)*256)^(1/2)]


DPI.142

Data structure (cont.)

RGBA pixel format
  The state information of 4 FSMs is stored as RGBA values in pixels

[Figure: DFA table with rows 0 to k and columns 00 to ff (sample entries 36, 37, 38, 39, 0), mapped to RGBA pixels at (x, y) in a square texture of side ((k+1)*256)^(1/2)]
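The idea of storing four automata in the R, G, B, and A channels of one pixel can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the function names `pack_rgba` and `lookup` are hypothetical, the row-major linearisation of (state, byte) into texture coordinates is assumed rather than taken from the slide, and the texture is modelled as a plain list of tuples.

```python
import math


def pack_rgba(tables):
    """Pack up to four DFA tables (rows of 256 next-state values) into one texture."""
    rows = max(len(t) for t in tables)
    cells = rows * 256
    side = math.isqrt(cells - 1) + 1  # smallest side with side * side >= cells
    texture = [[(0, 0, 0, 0)] * side for _ in range(side)]
    for state in range(rows):
        for byte in range(256):
            # Assumed layout: entry (state, byte) is linearised row-major.
            y, x = divmod(state * 256 + byte, side)
            texture[y][x] = tuple(
                tables[c][state][byte]
                if c < len(tables) and state < len(tables[c])
                else 0
                for c in range(4)
            )
    return texture, side


def lookup(texture, side, channel, state, byte):
    """Read the next state of automaton `channel` (0=R, 1=G, 2=B, 3=A)."""
    y, x = divmod(state * 256 + byte, side)
    return texture[y][x][channel]
```

One texel fetch then returns the next state of four automata at once, which is presumably why the slide contrasts the one-FSM and four-FSM layouts.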


DPI.143

Control flow

[Figure: N byte streams in the System Buffer are laid out row-major into a sqrt(N) x sqrt(N) Text Texture on the GPU; Texture Memory holds the Finite State Automata, Text, Current State Table, and Next State Table; Geometry Information feeds the Vertex Shaders and a Selector; the Fragment Shader contains two fp32 Shader Units, a Branch Processor, a Fog ALU, and a Texture Cache]
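The control flow in the figure can be approximated on the CPU by advancing all streams in lockstep and swapping the current-state and next-state tables after each pass. This is a hedged sketch, not the deck's shader code: the function name `step_all`, the dictionary-based DFA encoding, and the one-byte-per-pass schedule are all assumptions made for illustration.

```python
def step_all(dfa, streams):
    """Advance one automaton per stream in lockstep, one byte per pass."""
    current = [0] * len(streams)      # plays the role of the Current State table
    matched = [False] * len(streams)
    longest = max((len(s) for s in streams), default=0)
    for offset in range(longest):     # one rendering pass per byte position
        nxt = list(current)           # plays the role of the Next State table
        for i, stream in enumerate(streams):  # the data-parallel part on a GPU
            if offset < len(stream):
                nxt[i] = dfa["next"][current[i]].get(stream[offset], 0)
                if nxt[i] in dfa["accept"]:
                    matched[i] = True
        current = nxt                 # swap the two state tables
    return matched


# Hypothetical DFA for the single pattern b"ab" (state 2 accepts).
AB_DFA = {"next": [{97: 1}, {97: 1, 98: 2}, {97: 1}], "accept": {2}}
```

For example, `step_all(AB_DFA, [b"xxab", b"aab", b"ba"])` reports a match for the first two streams only.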


DPI.144

The fingerprint of finite state automata


DPI.145

Performance analysis

            Memory for   Memory for     GeForce 6200   GeForce 6600 GT   GeForce 6800 Ultra   GeForce 7800 GT
            64K states   Snort 2.4      (NV44)         (NV43-GT)         (NV40)               (G70)
            (Mbytes)     (Apr. 06)      4 fragment     8 fragment        16 fragment          24 fragment
                         (Mbytes)       shaders        shaders           shaders              shaders
Strategy 1  256          84.36          933.36         2666.64           4266.64              6400.00
Strategy 2  64           21.09          622.24         1777.76           2844.48              4266.64
Strategy 3  32           10.54          700.00         2000.00           3200.00              5485.68


DPI.146

Agenda

Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues


DPI.147

Considerations when choosing a content coprocessor

Performance
Size of supported ruleset
  ClamAV 0.7.2 has 22,925 signatures (1.2M rule bytes)
  Snort 2.2.0 has 2,085 signatures (24K rule bytes)
  SpamAssassin 3.0.1 has 298 signatures (17K rule bytes)
Regex (Regular Expression)
  ABC * BCD
  123????456
  (I|You) Love NY
Offset/Depth
Case sensitive/non-case sensitive
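The rule features listed above can be illustrated in software. The sketch below uses Python's `re` module and is not any coprocessor's rule syntax: the translation of `ABC * BCD` to `ABC.*BCD` and of `123????456` to `123.{4}456` is an assumed reading of the slide's wildcard notation, and `content_match` is a hypothetical helper in which depth is counted from the offset, in the style of Snort's content options.

```python
import re

# Assumed regex readings of the three slide examples.
RULES = {
    "any_gap": re.compile(rb"ABC.*BCD"),          # ABC * BCD
    "four_wild": re.compile(rb"123.{4}456"),      # 123????456
    "alternation": re.compile(rb"(I|You) Love NY"),
}


def content_match(payload, pattern, offset=0, depth=None, nocase=False):
    """Search for pattern inside payload[offset : offset + depth]."""
    end = None if depth is None else offset + depth
    window = payload[offset:end]
    if nocase:
        return pattern.lower() in window.lower()
    return pattern in window
```

Offset and depth bound the search window, so a coprocessor that honours them inspects fewer bytes per rule and reports fewer false matches.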


DPI.148

Considerations when choosing a content coprocessor (Cont.)

Works in conjunction with header fields, or a partitionable rules database
Max length of patterns and max length of search text
Algorithmic attacks
Connectivity


DPI.149

Considerations when choosing a content coprocessor (Cont.)

Look-Aside or Flow Through
Size and Cost of External Memory
Multi-packet match support
Auxiliary function

[Figure: Look-Aside Mode versus Flow Through Mode (MII#1, MII#2)]


DPI.150

Work of a Content Coprocessor

As its name suggests, it deals with the content (payload) part of network packets.

Target applications:
  IDS/IPS
  AV Gateway
  Mail Spam
  Layer-7 Switch (L7 Load Balance)
  UTM


DPI.151

Computing Power Analysis

ClamAV
  90% Pattern Matching
  4% Inflating
  2% De-Mimeing64-ing
  2% Other Mime
  2% Program Overheads
SpamAssassin
  75% Pattern Matching
  20% Program Overheads
  5% Decode


DPI.152

Computing Power Analysis

Snort
  62% Pattern Matching
  17% Header Classification
  13% IP Classification
  8% Other Matching


DPI.153

PMC-Sierra PM2329 ClassiPI™

Debuted in 2001.
Look-Aside coprocessor (100 MHz, 32/64-bit synchronous bus).
Supports the implementation of several protocols in the same equipment to achieve wire-speed performance at Gigabit/OC-48 rates.


DPI.154

ClassiPI™ (Cont.)

Source: PMC Sierra


DPI.155

ClassiPI™ (Cont.)

A single PM2329 can store up to 16K policy rules.
  Each rule is 136 bits long (24-bit control fields and 112-bit data fields).
  Supports composite rules (up to 4).

Source: PMC Sierra


DPI.156

ClassiPI™ (Cont.)

Input buffer is 256 bytes * 32 segments (8K in total).
Maximal pattern length is 192 bytes.
Supports only case-sensitive and case-insensitive matching; no regex.
Up to eight devices can be cascaded.
No external memory needed.
Vulnerable to algorithmic attacks.


DPI.157

ClassiPI™ (Cont.)

Pros
  Flexibility: handles L2-L7 traffic.
  Suitable for IPS development.
  Very good performance.
  No extra memory needed.
Cons
  Algorithmic attacks.
  Poor matching function.
  Signature size is quite constrained (or the cascaded solution is quite expensive).


DPI.158

Safenet 4850

Debuted in 2003.
Industry’s first full regular expression content inspection co-processor.
  Supports *, +, ?, <range>, <group>, and so on.
Supports up to 32K signatures.
  Each signature character occupies approximately 0.5K of rule memory.
2.5Gbps/800Mbps performance


DPI.159

Safenet 4850 (Cont.)

[Figure: a Host or NPU connected to the SafeXcel-4850 over a 32/64-bit ZBT or SyncBurst interface, with ZBT SSRAM attached; traffic is sent to the processor, packets/buffers are sent to the classifier, and results are sent back to the Host/NP]

Source: Safenet


DPI.160

Safenet 4850 (Cont.)

Look-Aside mode (32/64-bit ZBT SRAM interface).
Deals only with content.
Supports multi-packet match.
  Uses SR (State Record).
Rule memory can be divided into 32 partitions.
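Multi-packet matching with a saved state record can be sketched in software as follows. This is a minimal illustration, not the SafeXcel-4850 record format, which the slides do not describe: `build_dfa` and `MultiPacketMatcher` are hypothetical names, the matcher handles a single pattern, and the "state record" is assumed to be simply the automaton state saved per flow between packets.

```python
def build_dfa(pattern):
    """Build a byte-level DFA for one pattern (longest-prefix fallback)."""
    m = len(pattern)
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for byte in set(pattern):
            seen = pattern[:state] + bytes([byte])
            k = min(m, len(seen))
            # Longest prefix of the pattern that is a suffix of what was seen.
            while k > 0 and seen[-k:] != pattern[:k]:
                k -= 1
            dfa[state][byte] = k
    return dfa


class MultiPacketMatcher:
    def __init__(self, pattern):
        self.dfa = build_dfa(pattern)
        self.accept = len(pattern)
        self.records = {}  # flow id -> saved state (the assumed "state record")

    def feed(self, flow_id, payload):
        """Scan one packet, resuming from the state saved for this flow."""
        state = self.records.get(flow_id, 0)
        hit = False
        for byte in payload:
            state = self.dfa[state].get(byte, 0)
            hit = hit or state == self.accept
        self.records[flow_id] = state
        return hit
```

Because the state survives between calls, a signature split across two packets of the same flow is still found, while packets of other flows do not interfere.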


DPI.161

Safenet 4850 (Cont.)

An upper bound on the number of reported matches is user configurable (15 max).
Maximal size of each transfer is 16K bytes.
3 types of matching:
  Regular
  Start of Match
  Sub-Expression (/^CookieID=<\d+>/)


DPI.162

Safenet 4850 (Cont.)

Pros
  Supports full regex
  High performance (conditionally)
  Supports multi-packet match
Cons
  Database size (8 * 4K * 0.5K = 16MB)
  Wildcard doubles database size
  Compiler issues (cost, no incremental compiling)


DPI.163

IDT PAX.port 2500™

Up to 2.5Gbps performance (833Mbps * 3)
Supports both Flow-Through mode and Look-Aside mode
64-bit tag and 192-bit digest
Programmable classification
  Almost any format can be described by PDL
16MB rule memory for 1K URL rules
Supports regex
  Non-greedy (abc.*def and abcxxxdefyyydef)
  Lacks set support (positive set, negative set, etc.)
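The difference between greedy and non-greedy matching on the slide's own example can be seen with Python's `re` module. This only illustrates the general regex semantics; it is assumed that "Non-Greedy" on the slide means the shortest match, and the syntax shown is Python's, not PAX.port's rule language.

```python
import re

TEXT = "abcxxxdefyyydef"

# Greedy: .* runs to the last "def" in the text.
greedy = re.search(r"abc.*def", TEXT).group()

# Non-greedy: .*? stops at the first "def".
non_greedy = re.search(r"abc.*?def", TEXT).group()

print(greedy)      # abcxxxdefyyydef
print(non_greedy)  # abcxxxdef
```

A non-greedy engine can report the match as soon as the first `def` arrives, which suits streaming hardware that cannot look ahead to the end of the payload.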


DPI.164

IDT PAX.port 2500™ (Cont.)

Source: IDT


DPI.165

IDT PAX.port 2500™ (Cont.)

Pros
  Flexible operation mode
  Programmable classification
  Reference design (works with Intel IXP 2400)
Cons
  Constrained rule size
    The 3 pattern memories store the same content.
  Non-deterministic performance
  Per-packet based (limited text size and no multi-packet match)


DPI.166

Sensory Networks C-Series

Engine speed is up to 1Gbps.
Look-Aside mode (PCI 64/32 or PCI-X)
CorePAKT technology compresses the ClamAV database into 18MB.
Stream-based matching


DPI.167

Sensory Networks C-Series (Cont.)

Supports regex: ., \, […], [^…], |, (…), {n, m}, *, +, ?

offset, depth

Compression/Decompression engine (zip, gzip, lzw, cab)

Decode engine (MIME: Base64 and QP)

MD5 hashing engine


DPI.168

Sensory Networks C-Series (Cont.)

Source: Sensory Networks


DPI.169

Sensory Networks C-Series (Cont.)

[Figure: NodalCore SPU with attached Rule Memory]

Source: Sensory Networks


DPI.170

Sensory Networks C-Series (Cont.)

Pros
  Supports sufficient rule memory and regex function at the same time.
  With MIME decode and decompression: good for anti-spam and AV
  Stream mode (supports up to 1M streams)
Cons
  Cannot achieve gigabit wire speed
  Long compiling time


DPI.171

Netlogic’s Knowledge based Processors


DPI.172

Agenda

Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues


DPI.173

Flow Classification Using Stateful Method

[Figure: FTP flow state machine with states 0 to 7 and transitions Login, PASS_Per, PASV_Req, PASV_Ok, ACTIVE_Req, ACTIVE_Ok, LIST_Req, LIST_Ok, File_Req, File_Ok, and Flow_Close]

Transitions   Trans. ports     Patterns
Login         TCP, dport:21    "PASS"
PASS_Per      TCP, sport:21    "230", "User", "logged in"
ACTIVE_Req    TCP, dport:21    "PORT"
ACTIVE_Ok     TCP, sport:21    "200 PORT command successful"
PASV_Req      TCP, dport:21    "PASV"
PASV_Ok       TCP, sport:21    "227 Entering Passive Mode"
LIST_Req      TCP, dport:21    "LIST"
LIST_Ok       TCP, sport:21    "226 Transfer complete."
File_Req      TCP, dport:21    "RETR"
File_Ok       TCP, sport:21    "226 Transfer complete"
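A stateful FTP classifier of this kind can be sketched as a small transition table driven by port and payload patterns. This is an illustration under assumptions, not the deck's automaton: the transition names, ports, and patterns follow the slide's table, but the state names and the graph in `GRAPH` are a simplified guess at the figure, and `advance` is a hypothetical helper.

```python
# transition -> (port field that must equal 21, payload patterns that must all occur)
PATTERNS = {
    "Login": ("dport", [b"PASS"]),
    "PASS_Per": ("sport", [b"230", b"User", b"logged in"]),
    "ACTIVE_Req": ("dport", [b"PORT"]),
    "ACTIVE_Ok": ("sport", [b"200 PORT command successful"]),
    "PASV_Req": ("dport", [b"PASV"]),
    "PASV_Ok": ("sport", [b"227 Entering Passive Mode"]),
    "LIST_Req": ("dport", [b"LIST"]),
    "LIST_Ok": ("sport", [b"226 Transfer complete"]),
    "File_Req": ("dport", [b"RETR"]),
    "File_Ok": ("sport", [b"226 Transfer complete"]),
}

# Assumed, simplified state graph: state -> {transition: next state}.
GRAPH = {
    "start": {"Login": "auth"},
    "auth": {"PASS_Per": "idle"},
    "idle": {"ACTIVE_Req": "active_wait", "PASV_Req": "pasv_wait"},
    "active_wait": {"ACTIVE_Ok": "ready"},
    "pasv_wait": {"PASV_Ok": "ready"},
    "ready": {"LIST_Req": "listing", "File_Req": "fetching"},
    "listing": {"LIST_Ok": "idle"},
    "fetching": {"File_Ok": "idle"},
}


def advance(state, sport, dport, payload):
    """Return (next_state, transition) for one packet, or (state, None)."""
    for name, nxt in GRAPH.get(state, {}).items():
        field, pats = PATTERNS[name]
        port = sport if field == "sport" else dport
        if port == 21 and all(p in payload for p in pats):
            return nxt, name
    return state, None
```

Only the patterns that are valid in the current state are checked, so a stray "LIST" string in an unrelated flow does not classify it as FTP.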


DPI.174

The FAs of eDonkey protocols

[Figure: finite automata for the connect phase (transitions Hello_Server, Server_Info_Data, Search_File, Search_File_Results, Get_Sources, Found_Sources, Offer_Files, Hello_Client, eMule_Hello, Hello_Answer) and the download phase (transitions Slot_Request, Slot_Taken, Request_Parts, Compressed_Data_0, Compressed_Data_1, Slot_Release)]

Connect Phase
Transitions: Hello_Server, Search_File, Search_File_Results, Get_Sources, Found_Sources, Offer_Files, Server_Info_Data, Hello_Client, Hello_Answer, eMule_Hello
Patterns: 0xe3, 0x01; 0xe3, 0x1601; 0xe3, 0x15; 0xe3, 0x19; 0xe3, 0x42; 0xe3, 0x33; 0xe3, 0x41; 0xe3, 0x0101; 0xe3, 0x4c; 0xc5, 0x01

Download Phase
Transitions          Patterns
Slot_Request         0xe3, 0x54
Slot_Taken           0xe3, 0x57
Request_Parts        0xe3, 0x47
Compressed_Data_0    0xe3, 0x46
Slot_Release         0xe3, 0x56
Compressed_Data_1    0xe3, 0x46


DPI.175

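The FAs of BitTorrent protocols

[Figure: finite automata for the connect phase (states 0 to 2, transitions Announce and Get_Peers) and the download phase (states 0 and 1, transition Connect_to_Peer)]

Connect Phase
Transitions        Patterns
Announce           "GET /announce"
Get_Peers          "HTTP/1.0 200 OK", "e5:peers"

Download Phase
Transitions        Patterns
Connect_to_Peer    0x13, "BitTorrent protocol"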
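The BitTorrent patterns above translate directly into a payload check. This is a minimal sketch: the function name `classify_bt` is hypothetical, and anchoring each pattern at the start of the payload is an assumption (the slide lists the patterns but not where in the payload they must occur). The byte 0x13 is the length prefix of the 19-character string "BitTorrent protocol" in the peer handshake.

```python
HANDSHAKE = bytes([0x13]) + b"BitTorrent protocol"


def classify_bt(payload):
    """Map a packet payload to a transition name from the table, or None."""
    if payload.startswith(HANDSHAKE):
        return "Connect_to_Peer"
    if payload.startswith(b"GET /announce"):
        return "Announce"
    if payload.startswith(b"HTTP/1.0 200 OK") and b"e5:peers" in payload:
        return "Get_Peers"
    return None
```

In a stateful classifier these checks would be applied only in the states where the corresponding transition is allowed, as in the FTP example.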


DPI.176

The FAs of Yahoo Messenger protocol

[Figure: finite automata for the login phase (transition Auth_Resp) and the chat phase (transitions Msg_Service, P2P_File_Tx, File_Tx_Status_BRB, and Flow_Close)]

Login Phase
Transitions    Trans. ports       Patterns
Auth_Resp      TCP, sport:5050    "YMSG", 0x54

Chat Phase
Transitions           Patterns
Msg_Service           "YMSG", 0x06
P2P_File_Tx           "YMSG", 0x4d
File_Tx_Status_BRB    "YMSG", 0x4d, 0x1
Trans. ports: TCP, sport:5050


DPI.177

Agenda

Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusion and Open Issues


DPI.178

Layer 7 Security Switches

Firewalls and IDS/IPS are installed near the routers to prevent attacks from the Internet.

More than 80% of attacks are launched from affected hosts inside the intranet.

Defense-in-depth has emerged to prevent attacks not only from the Internet but also from internal hosts.

This leads to the need for security switches that provide first-mile protection.

Current security switch solutions are still relatively expensive.


DPI.179

Cisco Security Switch Solution

The Cisco Catalyst 6500 Series IDS Services Module (IDSM-2), the second-generation IDS/IPS module, fits in the widely deployed Cisco Catalyst chassis.

It supports both in-line IPS mode and passive IDS mode.


DPI.180

3Com Security Switch Solution

Each centrally controlled security platform supports several concurrent operations of inspection such as firewalls, VPNs, IDS, IPS, antivirus security, and URL filtering

3Com Security Switches 7280 and 7245 are suited to carrier-class, service-provider businesses


DPI.181

3Com 6200 Architecture


DPI.182

Alcatel’s OmniSwitch Solution

Alcatel released an Automated Quarantine Engine, which works with IDS to identify network threats and to shut off or restrain attacks through switch hardware.

Alcatel's stackable OmniSwitch 6600 and its modular 7700- and 8800-series products use a combination of 802.1X authentication technology and APIs from Sygate to block a virus-infected PC from accessing a corporate LAN.


DPI.183

The authentication operation of Alcatel OmniSwitch 6600


DPI.184

Alternative Cost-Effective Security Switch Architecture

Most currently available security switch solutions are chassis-based designs, which suit either core networks or big enterprises.

Nevertheless, L2 switches are most likely the most widely deployed network equipment in the world.

Is there an easy mechanism to “upgrade” L2 switches to L7 security switches?

A cost-effective security switch architecture is proposed here.


DPI.185

Layer 7 Security Switch Concept

Each “security switch” is composed of a traditional layer-2 switch and a “security switch engine (SSE)”, which provides layer-7 packet inspection service.

The two are coupled by a GE link.


DPI.186

Layer 7 Security Switch Architecture

[Figure: a Security Switch made of a layer-2 switch (SW) with FE ports assigned PVID 1 to 4, linked by GE to the SSE; IPv4 and IPv6 attacks aimed at a Victim are shown]

Security Service Engine
  Anti-virus/worm
  Anti-Intrusion
  Anti-P2P/IM
  Anti-Spyware, etc.


DPI.187

Network Security Switch Architecture

Secure ports
Un-trusted ports
  Traffic to/from un-trusted ports is forwarded to the SSE for inspection
  Malicious packets are dropped
  Attacking ports can be isolated or blocked


DPI.188

A Scalable HA/LB Architecture for Security Switch

High availability (HA) and load balancing (LB) are critical features for a security switch.

The key component is the “hardware bypass ports” (HBP): a pair of Ethernet ports that are “bypassed” when the power or the system is down.

[Figure: SW1 with SSE1 and SW2 with SSE2, connected by GE links; FE ports on the switches]


DPI.189

Scalable HA/LB Architecture

A scalable load balance and high availability (LB/HA) architecture for network security switches is proposed

A mechanism is designed to interconnect the SSEs so that the “security switches” group furnishes a novel HA feature.

An intelligent load balancing scheme is also designed for the SSE so that the security service can be balanced among the SSEs


DPI.190

HA Architecture for four “Security Switches”

[Figure: four switches SW1 to SW4, each linked by GE to its SSE (SSE1 to SSE4)]


DPI.191

HA Architecture when SSE 2 fails

[Figure: four switches SW1 to SW4, each linked by GE to its SSE (SSE1 to SSE4)]


DPI.192

HA Architecture when SSE 2 and SSE 3 fail

[Figure: four switches SW1 to SW4, each linked by GE to its SSE (SSE1 to SSE4)]


DPI.193

HA Architecture when only SSE 1 survives

[Figure: four switches SW1 to SW4, each linked by GE to its SSE (SSE1 to SSE4)]


DPI.194

Scalable HA/LB Security Switch Prototyping

[Figure: prototype with switches SW1 to SW4 and engines SSE1 to SSE4 interconnected by GE links]


DPI.195

High Availability test Scenario

Initially, stable traffic is processed by each SSE. Then the SSEs are turned off one by one: A, B, and C.

The traffic of a turned-off SSE is transferred to the backup SSE for further processing (HA).

The system still works even when only one SSE survives.

[Figure: switches SW A to SW D, each linked by GE to its SSE (SSE A to SSE D)]


DPI.196

High Availability Test Result

[Figure: test result plot with curves labelled A+C, B+D, and A+B+C+D, and phases marked (1), (2), (3)]


DPI.197

Load Balance Evaluation Scenario

A traffic generator (IXIA) is used to evaluate the load balance function. For each SSEi, an increasing traffic load (up to 100Mbps) is sent into the switch SWi at a different time.

[Figure: throughput (Mbit/s, up to 100) versus time (0 to 15 min) for SSE A, SSE B, SSE C, and SSE D, with phases marked (1) to (4)]


DPI.198

Load Balance Test Result

SSE A SSE B


DPI.199

Load Balance Test Result

SSE C SSE D


DPI.200

ASIC-based Security Switch

[Figure: blocks labelled SW, MII, SoC, and Signature DB]


DPI.201

ASIC-based Security Switch

BroadWeb Security SoC
  ARM922 RISC CPU (200 MHz)
  Hardware NAT
  Hardware IPS engine
  Two 10/100/1000 RJ-45 ports
Embedded Linux
NSS-approved IPS signature database
  1950+ signatures
  Automatic upgrade over the Internet
Drop malicious packets
Isolate/block attacking ports


DPI.202

Agenda

Introduction
Pattern Matching Technologies
Content Inspection Co-processors
Flow-based Application Identification
Layer 7 Security Switches
Conclusions and Open Issues


DPI.203

Conclusions

Pattern matching is the key content inspection technology for layer-7 security devices.
More and more attacks are launched from affected hosts inside the intranet, which calls for defense-in-depth.
Security switches provide first-mile protection
  Wireless security switches as well
But some challenges (opportunities) remain
  Features (IPS switch, UTM switch, or others?)
  Cost
  Performance: wire-speed or near wire-speed
  Management: signature upgrade


DPI.204

Conclusions

New switch architectures are needed
  Integrate content inspection into the L2 switching fabric
  DoS/DDoS chip: zero-day defense
  L2 switch + content inspection coprocessor


DPI.205

Open Issues

How to identify and manage encrypted protocols, such as Skype 2.0 and Winny?
  Not by signatures (no signatures?)
  Maybe by state machines
How to design fast content inspection or pattern matching algorithms?
  Modified AC algorithm or others
  Using cache efficiently
  Pre-filter is good
  Post-filter is also necessary (rules are more complex)


DPI.206

Open Issues

How to design a fast content inspection co-processor?
  Regular expression support is necessary
  Many commercial products already exist, such as SafeNet 4850, Sensory Networks C-2000, IDT, Cavium, Netlogic, etc.
Network Access Control (NAC) is a new emerging trend.

