5
8/20/2019 Multi packet classification http://slidepdf.com/reader/full/multi-packet-classification 1/5 A Customized TCAM Architecture for Multi-Match Packet Classification Miad Faezipour  and  Mehrdad Nourani Center for Integrated Circuits & Systems The University of Texas at Dallas, Richardson, TX 75083 {mxf042000, nourani} @utdallas.edu  Abstract — Most conventional packet classifiers find the highest priority filter that matches the packet. However, new networking applications such as Network Intrusion Detection Systems (NIDS) and load balancers require all (or the first few) matching results in packet classification. A TCAM-based architecture optimized for multiple match search is introduced in this paper. We propose a renovated TCAM design that can find all or the first  r  matches in a packet filter set. Our scheme partitions the filter set and performs the multi-match search on one partition only. A VLSI implementation of our classifier in 0.18  µm technology can achieve speed that is 1-2 order of magnitude higher than software based approaches. Power consumption is reduced by  40%  which is far better than the conventional hardware based multi-match designs. I. I NTRODUCTION  A. Background Packet classification in general refers to finding the best matching filter containing multiple fields among the  filter (also called  rule  in the literature) set for a given packet. The standard five-tuple fields include the source address, destination address, protocol, source port and destination port [1]. Among these fields, source and destination address fields are prefixes and often require longest prefix match (LPM) methods. Protocol field can be wild cards or exact values. Source and destination port numbers are typically introduced as ranges . Packet classification performs searching the table of filters to assign a flow identifier for the highest priority filter which matches the packet in all fields. The returning flow ID indicates the action that is next applied to the packet. Filter fields are combination of prefixes, wild cards and exact values. Hence, Ternary Content Addressable Memories (TCAMs) that have the ability to store  don’t-care  values in addition to 1’s and 0’s are often utilized to store filters and perform the parallel search in packet classification. A traditional packet classifier assigns one TCAM entry for each filter and finds the index of the highest priority matching filter in the database (Figure 1). Range fields may cause more than one TCAM entry to introduce a filter. However, TCAM range matching solutions can be found in [1] and [2]. Each filter  f i (0 ≤ i  ≤ n 1) contains multiple fields (e.g. in the standard five-tuple), and also the filter database often may have up to hundred thousand filters. Therefore, wide TCAM devices both in terms of bits and entries are used for packet classification applications.  B. Motivation New emerging networking applications such as Network In- trusion Detection Systems (NIDS) and load balancers require finding all or the first few matching filters in packet classifi- cation. Malicious intrusions and denial-of-service attacks, that are expected to grow rapidly, can be monitored and detected by NIDS. Once all the matching filter headers are found, a detection system such as Snort [3] scans the packet payload for existing worms. The concept of multi-match classification for NIDS is becoming a major stream of research in the near future, since there is a great demand for network worm detections. Packet level accounting, transparent monitoring and the Programmable Network Element (PNE) are other networking applications of multi-match classification. PNE is the general platform for packet processing in layers 2-4 in the edge. Packets entering PNEs are classified to identify the relevant functions. Multi-matching classification can be utilized to support multiple functions in PNEs [4]. Single-Match Priority Encoder 2 m2 m1 m0 n-1 1 Filter fn-1 Filter f2 Filter f1 0 Filter f0 Words Index Lines Match TCAM m . . . . . . Search Key  2 log n r e d o c n E n-1 r e z i t i r o i r P . . . Figure 1. TCAM structure as a single-match classifier The single and multi-match problem in packet classification should be solved in a sufficient speed to keep up with the high data rate. Pure software solutions suffer from low speed, since they often require several instructions and as a result several memory accesses to find a single or multiple matches. In the past few years researchers in industry and academia employed architectural solutions to the problem. This stream of research proposed the most widely used packet classification device technology, TCAM [5]. TCAMs are well suited for perform- ing high speed parallel searches on database with ternary entries, since they provide the match results with determin- istic throughput (i.e. one search per cycle) and deterministic capacity. Hence, TCAM has become quite popular for packet classification tasks [1], [6], [7], [8]. While TCAMs perform packet classification at high speed, they cannot directly report all possible matches in a database. This is due to the native structure of a TCAM cell design, which consists of a priority encoder, resulting in the highest priority match. C. Main Contribution We propose a design for multi-matching packet classifica- tion by modifying the prioritizer circuit in the conventional TCAM cell. Since the TCAM entries remain unchanged in our design, our multi-matching hardware can be easily adopted for IPv6 where the bit width of the TCAM entries highly increases [9], [10]. Performance of conventional (software- based) classification techniques linearly grows with number of filters. Our system finds  r  (pre-defined by a user) matches © -4244-0357-X/06/ 20.00 2006 IEEE This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

Multi packet classification

Embed Size (px)

Citation preview

Page 1: Multi packet classification

8/20/2019 Multi packet classification

http://slidepdf.com/reader/full/multi-packet-classification 1/5

A Customized TCAM Architecture for Multi-Match Packet Classification

Miad Faezipour and   Mehrdad Nourani

Center for Integrated Circuits & Systems

The University of Texas at Dallas, Richardson, TX 75083

{mxf042000, nourani}@utdallas.edu

 Abstract— Most conventional packet classifiers find the highestpriority filter that matches the packet. However, new networking

applications such as Network Intrusion Detection Systems (NIDS)and load balancers require all (or the first few) matching resultsin packet classification. A TCAM-based architecture optimizedfor multiple match search is introduced in this paper. We proposea renovated TCAM design that can find all or the first  r  matchesin a packet filter set. Our scheme partitions the filter set andperforms the multi-match search on one partition only. A VLSIimplementation of our classifier in 0.18 µm technology can achievespeed that is 1-2 order of magnitude higher than software basedapproaches. Power consumption is reduced by   40%   which isfar better than the conventional hardware based multi-matchdesigns.

I. INTRODUCTION

 A. Background 

Packet classification in general refers to finding the best

matching filter containing multiple fields among the   filter 

(also called   rule   in the literature) set for a given packet.

The standard five-tuple fields include the source address,

destination address, protocol, source port and destination port

[1]. Among these fields, source and destination address fields

are prefixes and often require longest prefix match (LPM)

methods. Protocol field can be wild cards or exact values.

Source and destination port numbers are typically introduced

as ranges . Packet classification performs searching the table

of filters to assign a flow identifier for the highest priority filter

which matches the packet in all fields. The returning flow ID

indicates the action that is next applied to the packet.

Filter fields are combination of prefixes, wild cards and

exact values. Hence, Ternary Content Addressable Memories

(TCAMs) that have the ability to store   don’t-care   values

in addition to 1’s and 0’s are often utilized to store filters

and perform the parallel search in packet classification. A

traditional packet classifier assigns one TCAM entry for each

filter and finds the index of the highest priority matching filter

in the database (Figure 1). Range fields may cause more than

one TCAM entry to introduce a filter. However, TCAM range

matching solutions can be found in [1] and [2]. Each filter   f i

(0 ≤  i  ≤  n − 1) contains multiple fields (e.g. in the standardfive-tuple), and also the filter database often may have up to

hundred thousand filters. Therefore, wide TCAM devices both

in terms of bits and entries are used for packet classification

applications.

 B. Motivation

New emerging networking applications such as Network In-

trusion Detection Systems (NIDS) and load balancers require

finding all or the first few matching filters in packet classifi-

cation. Malicious intrusions and denial-of-service attacks, that

are expected to grow rapidly, can be monitored and detected

by NIDS. Once all the matching filter headers are found, a

detection system such as Snort [3] scans the packet payloadfor existing worms. The concept of multi-match classification

for NIDS is becoming a major stream of research in the

near future, since there is a great demand for network worm

detections.

Packet level accounting, transparent monitoring and the

Programmable Network Element (PNE) are other networking

applications of multi-match classification. PNE is the general

platform for packet processing in layers 2-4 in the edge.

Packets entering PNEs are classified to identify the relevant

functions. Multi-matching classification can be utilized to

support multiple functions in PNEs [4].

Single-Match Priority Encoder

2m2

m1

m0

n-1

1

Filter fn-1

Filter f2

Filter f1

0 Filter f0

Words

Index

LinesMatchTCAMm

.

.

.

.

.

.

Search Key   2log n

r

e

d

o

c

n

E

n-1

r

e

z

i

t

i

r

o

i

r

P

.

.

.

Figure 1. TCAM structure as a single-match classifier

The single and multi-match problem in packet classification

should be solved in a sufficient speed to keep up with the high

data rate. Pure software solutions suffer from low speed, since

they often require several instructions and as a result severalmemory accesses to find a single or multiple matches. In the

past few years researchers in industry and academia employed

architectural solutions to the problem. This stream of research

proposed the most widely used packet classification device

technology, TCAM [5]. TCAMs are well suited for perform-

ing high speed parallel searches on database with ternary

entries, since they provide the match results with determin-

istic throughput (i.e. one search per cycle) and deterministic

capacity. Hence, TCAM has become quite popular for packet

classification tasks [1], [6], [7], [8]. While TCAMs perform

packet classification at high speed, they cannot directly report

all possible matches in a database. This is due to the native

structure of a TCAM cell design, which consists of a priorityencoder, resulting in the highest priority match.

C. Main Contribution

We propose a design for multi-matching packet classifica-

tion by modifying the prioritizer circuit in the conventional

TCAM cell. Since the TCAM entries remain unchanged in our

design, our multi-matching hardware can be easily adopted

for IPv6 where the bit width of the TCAM entries highly

increases [9], [10]. Performance of conventional (software-

based) classification techniques linearly grows with number

of filters. Our system finds  r  (pre-defined by a user) matches

©-4244-0357-X/06/ 20.00 2006 IEEEThis full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

Page 2: Multi packet classification

8/20/2019 Multi packet classification

http://slidepdf.com/reader/full/multi-packet-classification 2/5

in at most  r  cycles regardless of total number of filters. Such

property significantly improves the performance such that 1-2

order of magnitude higher speedup becomes achievable.

II. PRIOR WORK

Some recent work focused on multi-match packet classifi-

cation using TCAMs. Authors in [4] reorganize the TCAM

entry filters in a compatible order to report all matches. The

authors use a geometric intersection scheme to remove the

overlaps and negation among the intersecting filters placedin the TCAM. Their multi-search mechanism is based on

the TCAM search along with a SRAM to store all matching

indices.

The Set Splitting Algorithm (SSA) introduced in [11] splits

the filter set into two groups to remove at least half of the

intersections among filters. It then performs the search on

multiple groups in parallel. This method is based on minimum

intersections among filters; however, it adds filters for partially

overlapped filters in one set.

The authors in [2] address the problem of finding multiple

matches in a TCAM by proposing the multi-match using

discriminators (MUD) algorithm. In this algorithm, the extra

bits per TCAM entry are used for the required encoding. For

getting the next matching result from the TCAM after filter

 f  j, the search on all the entries after index   j is performed. To

accomplish this, the search key is expanded to prefixes that

correspond to the range greater than   j. The authors use the

idea that along with each TCAM entry, a discriminator field

that encodes the index of that entry is stored.

Entry-invalidation scheme, also discussed in [2], is one of 

the earliest and simplest schemes. In this method, a valid bit

in addition to the header fields is associated with each TCAM

entry. Initially, all entries have their valid bits set to ”1”.

Searches are performed multiple times to find all matching

entries. Each time a match is found, the valid bit is set to ”0”for that matching entry. The same search key is applied again

until all matches are found.

The BV-TCAM architecture introduced in [8] combines

the TCAM and the Bit Vector (BV) algorithm to address

the problem of packet classification for network intrusion

detection. The authors use the tree-bitmap implementation of 

the BV approach for the source and destination port lookup,

while using a TCAM for the search of the other header fields.

They use the un-encoded TCAM to report the matching status

of the corresponding TCAM entry. The authors implement

their classification engine in an FPGA.

III. MULTI-MATCH PRIORITIZER UNIT

A TCAM includes a priority encoder. One possible imple-

mentation of the priority encoder is shown in Figure 1 by

splitting it into a prioritizer and a conventional encoder. A

valid   log2n-bit address out of the encoder is generated only

if at most one of its   n   inputs is high at a time. Thus, the

prioritizer unit which provides the encoder input lines should

be designed carefully. The main idea is to modify the single-

match prioritizer unit to a Multi-match Prioritizer (MPZ), as

shown in Figure 2 so that the encoder would generate all

matching indices, one at a time.

A power optimized priority encoder cell introduced in [12],

is used as our reference model for the prioritizer unit. We add

IndexTCAMWords

EN

m0

m1

EP0

EP1

EPn-1

MPZ

Clk r

Multi-Match Priority Encoder

Search Key

E

n

c

o

d

e

r

.

.

.

n-1m

.

.

.

log n2

Figure 2. Conceptual block diagram of the design

a control logic circuitry to this prioritizer circuit to report allmatches in a prioritized sequence. The MPZ circuit, shown in

Figure 3, functions in response to a counter which counts from

0 to  n, where  n is the highest possible number of matches. In

other words,  n  can be assumed as the number of inputs in the

worst case. On the first clock cycle, the MPZ should function

as a single-prioritizer unit, reporting the highest priority match.

On the next clock cycle the next highest priority match should

be provided at the output. This procedure should be followed

for all other clock cycles until the counter has reached counting

up to  n. In each clock cycle, a function of the original inputs

and the higher priority outputs of the prioritizer circuit in

the previous clock cycle should be fed through the prioritizer

circuit. Let   mi   denote the original input lines (i.e. matchlines from TCAM words),   epi  denote the   EPi  outputs of the

prioritizer after one clock cycle,   M i  be the set of inputs that

should be given to the prioritizer circuit, and E N  be the enable

line. The logic equation1 for the MPZ circuit can be derived

as follows:

 EPi =

  EN  · M i   i = 0

 EN  · (∏i−1k =0 M k ) · M i   1≤ i ≤ n−1

  (1)

where,  M i  in Equation 1 can be computed as:

 M i =

s ·mi   i = 0

s ·mi +s ·mi · epi−1   i = 1

s ·mi +s · (mi ·∑i−1

k =0

epk )   2≤ i ≤ n−1

(2)

Signal   s = ∑log2n−1k =0   ck  is the select line of the multiplexers

that control which data should be chosen for the corresponding

 M i. This select line should be low for the first clock cycle

and high for the rest. Thus,   s can be implemented by simply

ORing all the counter outputs ci. The MPZ unit functions in an

efficient manner; in essence it reports all  r  matches in exactly

r -cycles. This implies that in case of the need to report the first

r  matches instead of all possibles matches, a comparator unit

could be added to the MPZ design wherein the count value

c  and the   r  value are compared. Once the count exceeds   r ,

the enable line   EN  is set to zero; hence disabling the MPZ

unit. Figure 3 illustrates the multi-matching prioritizer unit

architecture.

IV. SCALABILITY

The multi-match prioritizer unit can be designed for any

number of   n  inputs. However, the synthesized MPZ design

for a real database which requires up to thousands of TCAM

entries would be very costly. Hence, a modular design that

could scale up to the required  n inputs is essential.

IPv6 has much longer fields in the header of packets. Our

design does not alter the filters placed in the TCAM; hence,

making it suitable for multi-threaded environments. Moreover,

1Throughout this paper, symbols +, ·,∏ and ∑ stand for Boolean notations.

©-4244-0357-X/06/ 20.00 2006 IEEE

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

Page 3: Multi packet classification

8/20/2019 Multi packet classification

http://slidepdf.com/reader/full/multi-packet-classification 3/5

Counter

Clk 

m2

m0

m1

EP0

EP1

EP2

EPn-2

EPn-1

M2

M1

M0

en

EN

Q D

Comparator

s

s

s

s

s

Prioritizer

r[2:0]Clk 

Q D

Q D

Q D

Clk 

Clk 

Clk 

Clk 

epn-2

ep2

ep1

ep0

c[2:0]

c[2:0]

m0

m1m1

m2m2

..

.

 . .

.

n-1M

..

.   OE

n-1m n-1

m

 .

..

.

.

n-1m

..

.

..

..

Figure 3. The MPZ architectural design

the MPZ approach processes the match lines coming from the

TCAM words. Therefore, it can efficiently be scaled to IPv6

classifiers that use large TCAMs or multiple parallel TCAMs

for packet classification.

To achieve a modular design with cascaded blocks we

define an   output enable  (OE ) line which indicates when all

the matches have been provided at the output. This signal

is activated when any match is found at the output, and

deactivated when all the matching results have been provided.

The   OE  signal also highly depends on the   EN   line. The   OE line in a MPZ can be expressed as follows:

OE  =  E N +n−1

∑i=0

 EPi   (3)

Our approach, similar to the multi-level look ahead archi-

tecture [13], would be to cascade  v w-bit MPZ units to design

a  n = v×w multi-match prioritizer block. Figure 4 shows the

concept of cascading eight, 8-bit MPZ modules to design a

64-bit MPZ. By connecting the   OE  line of each stage to the

 EN  (enable line) of the next stage (in this case higher priority

stages are placed at the left), we assure that each block would

be enabled only if all the higher priority blocks have completedreporting their matches at the output. The cascaded design

would have at most two additional clock cycle delays for each

mismatching MPZ unit. Hence, if all matches are concentrated

within one block of MPZ, a high throughput can be achieved.

As authors in [2] stated, the maximum degree of matches

(number of matches) often requested in real-world ACL filter

database is statistically around 8. Considering this fact the

counter used in any MPZ unit can be designed to count up

until 8. This implies that only  l og28 = 3 lines are required for

the counter. In addition, the comparator can be designed to

compare the count value and   r  = 8.

EP[15:8]EP[7:0]

m[63:56]m[55:48]m[47:40]m[39:32]m[31:24]m[23:16]m[15:8]m[7:0]

MPZ8MPZ7MPZ6MPZ5MPZ4MPZ3MPZ2MPZ1

EP[23:16]

"1"

OEENOEENOEENOEENEN OEOEENOEENEN OE

EP[63:56]EP[55:48]EP[47:40]EP[39:32]EP[31:24]

Figure 4. Cascaded MPZ architecture for a 64-bit MPZ design.

V. INTERSECTION-DRIVEN PARTITIONING STRATEGY

Intersection among filters in the database mainly results in

multiple matches. Therefore, partitioning the filter set based on

filter intersections, and performing the search on a partition,

can significantly improve the performance.

 A. Maximum Intersection Partitioning Scheme

From the MPZ design, we have seen that by having all

matches concentrated within one block, the multi-match per-

formance can achieve the highest extreme in performance; in

other words it is capable of finding  r  matches in  r  cycles. The

Maximum Intersection Partitioning scheme or MXIP which is

explained in this section, partitions the filters in the databasesuch that each partition would hold the maximum number of 

intersections among its filters. In addition, partitions will be

disjoint, i.e. any pair of partitions do not have any overlap in

the filters that they contain. Since there would always be a

number of filters that do not have any intersection with any

other filter, one last partition is needed in which all these

distinct  filters can be placed.

Partitioning Heuristic:

Let   f i[w− 1 : 0]   and   f  j[w− 1 : 0]   denote two filters of bit-

width   w. We define   distance for the two filters as expressed

in Equation 4:

d i, j =

w−1

∑k =0 f i[k ]⊗ f  j[k ]   (4)

where, ⊗  in Equation 4 is defined as follows:

a⊗b =

  0 if  a or  b is a   don’t-care

a⊕b   otherwise  (5)

A brief pseudo-code for generating the partitions based on

the concept of maximum intersection is shown in Figure 5.

In this pseudocode,   F   refers to the set of all filters, and   Pm

denotes the   m-th partition. In every step the first element in

F   (i.e.   f 1) is the seed of partition. The   for   loop grows the

partition around the seed based on the maximum intersection

partitioning heuristic. Lines 16-23 indicate that all distinct

filters (that make no intersection with other filters) will be

assigned to a separate partition.  N  p would be the total number

of partitions generated based on the MXIP scheme. Hence,  N  ppartitions (P1, P2, ..., P N p) will be formed, from which the last

partition (P N p) is a collection of all distinct filters. Figure 5

indicates that the complexity of the algorithm is   O(|F |2).Figure 6 illustrates an example of a small filter set of 10

filters partitioned based on maximum intersections. Filters are

assumed to be 8-bits long for simplicity. Filters 1, 4, and 10

have zero distance, hence they can form one partition;   P1.

Also, filters 8 and 9 have zero distance with filter 10, therefore

they are also placed in  P1. Filters 5 and 6 make a zero distance

©-4244-0357-X/06/ 20.00 2006 IEEE

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

Page 4: Multi packet classification

8/20/2019 Multi packet classification

http://slidepdf.com/reader/full/multi-packet-classification 4/5

 MXIP ( )01:  N  p = 002:  Ptemp = {}03: while (F  = {})04: {05:   N  p ←− N  p +106:   P Np  ←− { f 1}07:   F  ←− F −{ f 1}08: for ( all   f i ∈ F )09:   {10: if (∃ f  j ∈ P N p such that  d i, j = 0)

11:   {12:   P N p ←− P Np ∪{ f i}13:   F  ←− F −{ f i}14:   }15:   }16: if (| P N p |= 1)17:   Ptemp ←− Ptemp ∪{ f i}18:   }19: if (Ptemp = {})20:   {21:   N  p ←− N  p +122:   P Np  ←− Ptemp

23: }

Figure 5. Pseudocode for Maximum-Intersection partitioning.

with filter 2, and form partition  P2. Finally, filters 3 and 7 thathave no zero distance with any other filters, form the distint

filter collection, and are placed in a separate partition (P3). In

this example, if ”11010010” arrives as the search key, partition

P1 would be chosen and the search is performed on a relatively

small TCAM containing partition  P1 only.

3) 1011001x

7) 0111x010

1) 1101xx10

4) 11x1xxxx

8) 100x1xx0

9) 1001x110

10)1x01xx10

Before Partitioning After Partitioning

Partition P1

Partition P2

Partition P3

6) 0110x1x0

Filters:

1) 1101xx10

2) 0xx01110

5) 00101xx06) 0110x1x0

7) 0111x010

9) 1001x110

8) 100x1xx0

10)1x01xx10

4) 11x1xxxx

3) 1011001x

2) 0xx01110

5) 00101xx0

Figure 6. Example of Maximum Intersection Partitioning

Maximum intersection partitioning would ensure that all

possible matches for a given packet are located in one par-

tition. The MPZ and encoder circuit connected to the TCAM

provides the addresses of the   r  matches in at most   r   cycles.

Figure 7 illustrates the Maximum Intersection Partitioningapproach used with the MPZ architecture. Note that the last

partition (P N p) does not need a MPZ unit since it would result

in at most one match. This approach is highly efficient since:

•  It provides all matching addresses in at most  r  cycles.

•  It does not add any extra filter to the whole set, unlike

others, e.g. the SSA scheme [11], that adds new filters for

partially overlapping filters in each partition. This feature

makes the MXIP approach along with the MPZ, much

more memory efficient.

•   By enabling the TCAM search on one partition and

disabling others, the power consumption is significantly

TCAM

Search Key

Partition PNp

Partition P2

Partition P1

Index

Encoder

+MPZTCAM

TCAMIndex

Encoder

+MPZ

Encoder  Index

Resolver

Contention

Figure 7. Maximum Intersection Partitioning

reduced. Power consumption is directly proportional to

the number of entries being searched in parallel in a

TCAM. The MXIP method effectively partitions the

database, hence for each packet, only a small portion is

being searched, while all others remain idle.

 B. MXIP Implementation

Performing the TCAM search on only a small portion of 

the entire database can significantly save power consumption

due to the frequently charging and discharging of the highly

capacitive match line. To achieve such power savings, a  Con-

tention Resolver Unit  is required so that the search mechanism

would result in enabling the TCAM search on one partition,

while disabling others. A unique identification code for each

partition (a representative of the filters in that partition) can

be defined to facilitate choosing the right partition. The search

key is to be compared against the ID codes to select the TCAM

that contains the match(es).

In our system large partitions and therefore large MPZ units

will not be required. Maximum number of possible filters (thatMXIP pushes into the same partition) depends on number of 

fields considered. Using practical filter sets, 2-field and 5-field

classification produce around 150 and 8 matches, respectively

[11]. Therefore, due to nature of MXIP method, the expected

(average) size of partitions will be in the range of 8 to 150.

Updating the filter set (e.g. adding new filters or removing

existing ones) may alter the partitions as well as the ID codes

for the contention resolver unit, and would have complexity

of  O(|F |2) in the worst case. Due to lack of space these issues

are not further discussed here but can be found in [14].

VI. EXPERIMENTAL RESULTS

 A. Simulation of MPZ Unit 

An 8-bit MPZ unit was designed and implemented using

Synopsys tools [15], and the functional simulation results

are shown in Figure 8. The inputs to the MPZ unit are

assumed to be the TCAM output match lines. As shown

in Figure 8 assuming   m[7 : 0]   to be ”01011000” (58 H ) for

the MPZ input lines, there would be 3 matches. The select

line   s is also shown in the simulation results. This signal is

low on the first clock cycle, where the count is zero, and

becomes high on the next clock cycles when the counter

continues counting. Output results   EP[7 : 0]  are observed to

be ”00001000” (08 H ), ”00010000” (10 H ) and ”01000000”

©-4244-0357-X/06/ 20.00 2006 IEEE

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.

Page 5: Multi packet classification

8/20/2019 Multi packet classification

http://slidepdf.com/reader/full/multi-packet-classification 5/5

Figure 8. Functional simulation for an 8-bit MPZ unit

TABLE I

SIMULATION RESULTS FOR n-BIT  MPZ UNITS

n   Delay [ns] Area [NAND] Speedup  S n

8 9.92 206 40.32

16 17.92 437 44.64

32 25.09 769 63.77

64 43.87 1373 72.94

(40 H ) on the first three clock cycles, respectively. The fourth

clock cycle would provide all zeros at the output, indicatingno more matches found. We assumed the enable line E N  to be

high until all matches are reported at the output. The encoder

receives  EP[7 : 0] from the prioritizer and reports all 3 match

addresses in 3 clock cycles.

 B. Speedup

The maximum throughput of a single MPZ unit would be

the smallest clock period for the circuit to function correctly.

Table I compares the statistics for four MPZ units that we have

implemented. As n grows, the delay and the cost of MPZ both

increases. However, note carefully that growth of critical path

delay, shown in the second column, is not an indication of 

improvement. To be more clear about this the third columnshows the estimated speedup (S n) in each case compared to a

software-based classification approach. We assumed that the

host running the classification software needs at least one

memory access and one comparison instructions per entry.

This means the speed up will be:   S n  ≥  r ·n·(t mem+t cmp)

r ·T C lk n, where

t mem and t cmp refer to the corresponding instructions, and T Clk n

is the duration of the clock driving a   n-bit MPZ; obviously,

T Clk n  should be larger than the delay values given in Table

I. Note that in practical cases   r  <<  n  and thus as   n   grows

the speedup will be even larger than 40-73 given in the table

because in our approach the overall performance depends

on   r   and not   n. Such property makes our hardware engine

quite attractive in applications that require fast multi-match

classification.

Table II also summarizes the design challenges for imple-

menting a 64-bit multi-match prioritizer cell by cascading

n-bit MPZ stages, for different values of   n. The table also

indicates increasing and decreasing trends for the worst-case

delay (second column) and area cost of cascaded designs (third

column), respectively. A designer can do the tradeoff and

choose appropriate size MPZs that satisfy the time and area

constraints.

The speed of our architecture is also higher than con-

ventional hardware based designs. A conventional TCAM

TABLE II

IMPLEMENTING A 64- BIT  MPZ  BY CASCADING VARIOUS SIZE MPZ UNITS

#× size MPZ Delay [ns] Area [NAND]

8×8-bit    11.01 2264

4×16-bit    19.71 2320

2×32-bit    30.11 2121

1×64-bit    48.26 1373

delay can be written as   T TCAM  =  T word  + T PE , where   T word   is

the delay of the TCAM word and   T PE   is the delay of thepriority encoder. For large TCAMs   T word  ≈  T PE  ≈ T TCAM /2.

Our priority encoder (the MPZ unit along with an encoder)

approximately has a delay of 1.4×T PE . A conventional multi-

match TCAM based approach would spend   r · (7× T TCAM )cycles for finding   r  matches [2]. Hence, we obtain speedup

of   S  =  r ·(7×T TCAM )

T TCAM /2+r ·(1.4×T TCAM /2)   for finding   r   matches. For a

maximum of 8 matches we achieve speedup of 9.18.

VII.   CONCLUSION

We introduced a multi-match classifier engine that is capable

of finding all   r  matches in exactly   r  cycles. The MPZ design

can operate at wire speed without performance degradation.

Our design can be employed by IPv4/IPv6 packet classifiersthat utilize TCAMs in their architectures.

REFERENCES

[1] K. Zheng, H. Che, Z. Wang, and B. Liu, “TCAM-Based DistributedParallel Packet Classification Algorithm with Range-Matching Solution,”

 INFOCOM’05, 2005.[2] K. Lakshminarayanan, A. Rangarajan and S. Venkatachary, “Algorithms

for Advanced Packet Classification with Ternary CAMs,”   ACM SIG-COMM’05, Aug. 2005.

[3] SNORT Network Intrusion Detection System,  www.snort.org.[4] F. Yu, R. H. Katz and T. V. Lakshman, “Efficient Multimatch Packet

Classification and Lookup with TCAM,”  IEEE Computer Society, Jan.2005.

[5] D. E. Taylor and E. W. Spitznagel, “On Using Content AddressableMemory for Packet Classification,”   Technical Report WUCSE-2005-9,March 2005.

[6] E. Spitznagel, D. Taylor and J. Turner, “Packet Classification Using Ex-tended TCAMs,” Proceedings of the 11th IEEE International Conferenceon Network Protocols, pp. 120-131, Nov. 2003.

[7] F. Yu, R. H. Katz and T. V. Lakshman, “Gigabit Rate Packet Pattern-Matching Using TCAM,”  Proceedings of the 12th IEEE InternationalConference on Network Protocols, pp. 174-183, 2004.

[8] H. Song and J. W. Lockwood, “Efficient Packet Classification forNetwork Intrusion Detection Using FPGA,”   Proceedings of the 2005

 ACM/SIGDA 13th International Symposium on Field-ProgrammableGate Arrays, Feb. 2005.

[9] N. F. Huang, W. E. Chen, J. Y. Luo and J. M. Chen, “Design of Multi-field IPv6 Packet Classifiers Using Ternary CAMs,”   IEEE Conferenceon Global Telecommunications, vol. 3, pp. 1877-1881, Nov. 2001.

[10] N. F. Huang, K. B. Chen and W. E. Chen, “Fast and Scalable Multi-TCAM Classification Engine for Wide Policy Table Lookup,”   19th

 IEEE International Conference on Advanced Information Networkingand Applications, vol. 1, pp. 792-797 , March 2005.

[11] F. Yu, T. V. Lakshman, M. A. Motoyama and R. Katz., “SSA: A Powerand Memory Efficient Scheme to Multi-Match Packet Classification,”

 ACM Proceedings of the 2005 Symposium on Architecture for Network-ing and Communications Systems ANCS ’05, pp. 105-113, Oct. 2005.

[12] C. Kun, S. Quan and A. Mason, “A Power Optimized 64-bit PriorityEncoder Utilizing Parallel Priority Look-Ahead,”  IEEE ISCAS’04, vol.2, pp. 753-756 , May 2004.

[13] C. H. Huang, J. S. Wang and Y. C. Huang, “Design of High-PerformanceCMOS Priority Encoders and Incrementer/Decrementers Using Multi-level Lookahead and Multilevel Folding Techniques,”  IEEE Journal of Solid-State Circuits, vol. 37, no. 1, pp. 63-76 , Jan. 2002.

[14] M. Faezipour and M. Nourani, “Design and Implementation of a TCAM-Based Architecture for Multi-Match Packet Classification,”   Technical

 Report, UTDEE-12-2005, Dec. 2005.[15] Synopsys Inc., “User Manuals for SYNOPSYS Toolset Version

2005.06,” 2005.

©-4244-0357-X/06/ 20.00 2006 IEEE

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE GLOBECOM 2006 proceedings.