Application Of Bloom Filter In Fast Packet Classification

International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 1, Issue 6, June 2014. ISSN 2348 - 4853

134 | 2014, IJAFRC All Rights Reserved www.ijafrc.org

Application Of Bloom Filter In Fast Packet Classification. Milind K. Chavan

M.E. student at S.R.E.S. College of Engineering, Kopargaon.


ABSTRACT

Packet classification has many applications in computer networks, such as QoS (quality of service), firewalls, multimedia communication, telecommunication, security, and traffic monitoring. To assign a packet to a particular flow or set of flows, the intermediate nodes in the network must search for the rule that defines the flow of that packet; the rule is chosen on the basis of several fields in the data packet. The rule set is predefined by the user, and the search over it is built on an algorithmic and architectural methodology. The major constraint of such a methodology is the speed of searching for a particular rule. For decades, researchers have been looking for the best computational methodology for packet classification, but the algorithms currently in use rely heavily on expensive and power-hungry devices such as TCAM (ternary content-addressable memory). The search for fast and power-efficient packet classification algorithms therefore remains a subject of interest for researchers. In this paper we present an approach to packet classification that combines an algorithmic and an architectural structure. Our starting point is the well-known crossproduct algorithm, which is very fast but introduces additional rules that increase the memory requirement. We show how to enhance the crossproduct algorithm so that the number of extra rules is drastically reduced without affecting the throughput of the algorithm, and how an on-chip Bloom filter avoids unnecessary accesses to off-chip memory.

Index Terms: Computer Networks, Packet Classification, TCAM, Crossproduct, Bloom Filter

I. INTRODUCTION

Packet classification has been a favorite research topic for the last few decades. As the demand for high-volume data communication grows day by day, ever more sophisticated algorithmic techniques are required to meet it. Logically, packet classification is nothing but comparing the bit strings in the different fields of a data packet against the classifier, which consists of a rule set. The comparison is done on the prefix bits and not on the whole bit string. Once the matching rule is found in the classifier, the action that the classifier associates with that rule is applied to the packet.

However, to date no computational technique has been able to eliminate TCAM from real-life applications. TCAMs are storage devices that hold an array of keys of limited width. A search key is compared against all stored entries in parallel, and a result is produced when any entry matches the key. Recent TCAMs support up to 133 million searches per second for 144-bit-wide keys and can store 128K keys of that width. TCAM devices are costly and consume 50 times more power than other devices, yet they are still the favorite choice of manufacturers. They are also 15 times bulkier than SRAM [1].




In this paper we implement an algorithmic method based on the crossproduct algorithm. In this algorithm a single data structure is created for the multiple fields to be checked. The drawback is that it creates a large number of additional rules, which require a significantly large amount of memory. To avoid this unnecessary memory usage, multiple subsets of the data structure (trie) are created, which drastically reduces the memory consumed by additional rules [2]. In the multiple-subset crossproduct algorithm, the results of the per-field lookups are used to form a key, and this key is looked up in a lookup table to find the matching rule.

To achieve this, we first look up the prefix for each field separately using a Bloom filter, which is a fast and memory-efficient search technique. As a result, with very high probability, longest prefix matching can be performed on the source and destination addresses and the source and destination ports in just four memory accesses.

To reduce memory consumption, we divide the rules into multiple subsets and construct a crossproduct lookup table for each subset, which reduces the number of additional rules that crossproducting generates. Because the rules are distributed over a number of subsets, a lookup is needed in each subset, and for this we use a Bloom filter [8]. This technique avoids unnecessary lookups in those subsets that contain no rule matching the prefixes of the packet.

Avoiding the extra lookups in subsets that hold no matching rule is what allows the algorithm to reach a high throughput. If multiple rules match, only P + 4 accesses are required to choose the highest-priority rule, where P is the number of rules that the packet can match.

In the following sections of this paper we show that the algorithm requires P + 4 + ε memory accesses to find the matching rule, where ε is a small constant much less than 1 (




service provider across a network access point. Using this example, they explain the application of packet classification. They also classify the basic techniques into four types:

A) Basic data structure.

B) Geometric algorithms.

C) Heuristics based.

D) Hardware based.

The basic concept of the crossproduct algorithm is to perform LPM (longest prefix matching) on each field first and then combine the individual results into a key that indexes the crossproduct table. The best matching rule can then be fetched from the crossproduct table in only one memory access (cycle). The single-field lookups are performed as in the ABV and RFC algorithms.

TCAMs are widely used for packet classification. The latest TCAM devices also include a banking mechanism that reduces power consumption by selectively turning off unused banks. Traditionally, TCAM devices needed to expand range values into prefixes in order to store a rule with range specifications [6]. A recently introduced algorithm, DIRPE, uses a clever technique to encode ranges differently, which results in less overall rule expansion than the traditional method. Its authors also recognized that in modern security applications it is not sufficient to stop the matching process after the first match is found; all the matching rules for a packet must be reported. They devised a multi-match scheme with TCAMs that involves multiple TCAM accesses.

In [6] the authors perform longest prefix matching with Bloom filters to find the rule required for a packet. The Bloom filters are created on the basis of the number of prefix bits of the destination IP address: one Bloom filter is programmed for 1-bit prefixes, another for 2-bit prefixes, and so on. The prefixes of length 1, 2, 3, ..., n of the input bits (taken from different header information, e.g. the source or destination IP address) are matched against the corresponding filters one by one, and the results are collected. If the 1-bit prefix matches in the 1-bit Bloom filter, that filter outputs a 1; if it does not match, the filter outputs a 0. If even a single bit of the Bloom filter output string is 0, the packet is discarded. Otherwise a computation called hashing is performed on the output string, and the hashing returns the identity of the particular prefix.
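The idea of keeping one Bloom filter per prefix length can be illustrated with a minimal Python sketch. It is not the implementation of [6]: the class and function names, the filter size, the number of hash functions and the use of SHA-256 are all illustrative assumptions. The sketch probes the filters from the longest prefix length downwards and touches the (slow) hash table only when a filter reports a possible match.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k bit positions by salting a SHA-256 digest (illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def build_lpm(prefix_to_value):
    """Group prefixes by length: one Bloom filter and one hash table per length."""
    filters, tables = {}, {}
    for prefix, value in prefix_to_value.items():
        n = len(prefix)
        filters.setdefault(n, BloomFilter()).add(prefix)
        tables.setdefault(n, {})[prefix] = value
    return filters, tables


def longest_prefix_match(address_bits, filters, tables):
    """Return (prefix, value, table_probes) for the longest matching prefix."""
    probes = 0
    for n in sorted(filters, reverse=True):  # longest length first
        candidate = address_bits[:n]
        if len(candidate) == n and candidate in filters[n]:
            probes += 1  # the hash table probe models an off-chip access
            if candidate in tables[n]:
                return candidate, tables[n][candidate], probes
    return None, None, probes


filters, tables = build_lpm({"1": "A", "101": "B", "00": "C"})
print(longest_prefix_match("1011", filters, tables))  # ('101', 'B', 1)
```

Because a Bloom filter can return a false positive, the hash table is still consulted to confirm the match; a false positive costs one wasted probe but never a wrong answer.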

III. DEFICIENT CROSS PRODUCT ALGORITHM

The deficient crossproduct algorithm works as follows. First, a separate trie is constructed for each field represented in the rule set. In each trie, the nodes that correspond to prefixes appearing in the rules are marked. Let the first trie serve the search on field 1 and the second trie the search on field 2, as in Figure 1. A connection between marked nodes of the two tries represents the rule that matches the given pair of prefixes in Table 2. To classify a packet, we first perform an independent search for each field in its trie and find the most specific prefix, that is, the longest matching prefix (LPM). From the per-field results we then create a unique key and use it to index the crossproduct rule table. The entries of the crossproduct table are either original rules or artificial (pseudo) rules generated during crossproducting. Each key either maps to a matching rule or maps to no rule at all, in which case the lookup returns nothing. Thus, whenever a match is present, we always get the correct rule. This is shown in Figure 1.




Table 1 Basic Classifier Table

Rule   Field 1   Field 2
r1     1*        *
r2     1*        00*
r3     01*       100*
r4     101*      100*
r5     101*      11*
r6     00*       *

Figure 1 Illustration of basic cross product algorithm

Here we have used a two-dimensional rule set in which each field is at most 4 bits wide, for the purpose of demonstration.
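The lookup procedure on this two-field example can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, prefixes are written as bit strings with the empty string standing for the wildcard "*", and the fields of r4 and r6 are taken as they appear in the crossproduct table (Table 2). A linear scan stands in for the per-field tries.

```python
# Rules of the two-field example: (rule, field-1 prefix, field-2 prefix).
# "" stands for the wildcard "*".
RULES = [
    ("r1", "1", ""),
    ("r2", "1", "00"),
    ("r3", "01", "100"),
    ("r4", "101", "100"),
    ("r5", "101", "11"),
    ("r6", "00", ""),
]


def lpm(bits, prefixes):
    """Longest prefix in `prefixes` that matches `bits`, or None."""
    matches = [p for p in prefixes if bits.startswith(p)]
    return max(matches, key=len) if matches else None


def build_crossproduct(rules):
    """Map every (prefix1, prefix2) combination to the list of rules it matches."""
    f1 = sorted({r[1] for r in rules})
    f2 = sorted({r[2] for r in rules})
    table = {}
    for a in f1:
        for b in f2:
            # A rule matches the combination if both of its prefixes are
            # prefixes of the longest matching prefixes a and b.
            table[(a, b)] = [name for name, p1, p2 in rules
                             if a.startswith(p1) and b.startswith(p2)]
    return f1, f2, table


def classify(field1_bits, field2_bits, f1, f2, table):
    """Per-field LPM, then a single table lookup with the combined key."""
    key = (lpm(field1_bits, f1), lpm(field2_bits, f2))
    if None in key:
        return []
    return table[key]


f1, f2, table = build_crossproduct(RULES)
print(classify("1010", "0011", f1, f2, table))  # ['r1', 'r2']
print(classify("0100", "0000", f1, f2, table))  # []
```

The first packet hits the combination (101*, 00*), which is not an original rule but still maps to r1 and r2; the second hits (01*, 00*), which maps to no rule.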

Table 2 Representation of Pseudo Rule and Original Rule

Pseudo rule   Field 1   Field 2   Matching rule
-             1*        *         r1
-             1*        00*       r2
p1            1*        11*       r1
p2            1*        100*      r1
-             00*       *         r6
p3            00*       00*       r6
p4            00*       11*       r6
p5            00*       100*      r6
-             01*       *         -
-             01*       00*       -
-             01*       11*       -
-             01*       100*      r3
p6            101*      *         r1
p7            101*      00*       r1, r2
-             101*      11*       r5
-             101*      100*      r4




Figure 2 Representation of Pseudo Rule and Original Rule

This crossproduct algorithm has two deficiencies:

1) A large number of empty rules.

2) A very large number of pseudo rules.

The first problem is eliminated by using a hash table instead of a direct lookup table. A direct crossproduct table has to hold every combination generated by crossproducting, including the empty ones, whereas a hash table stores only the entries that actually map to a rule. Maintaining a hash table is therefore the best way to eliminate empty rules.
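To make the two deficiencies concrete, the following Python sketch counts the original, pseudo and empty entries of the crossproduct of the six example rules, and keeps only the non-empty ones in a dictionary that plays the role of the hash table. The names are hypothetical, the empty string stands for "*", and the rule fields are taken from the crossproduct table (Table 2).

```python
from itertools import product

# name -> (field-1 prefix, field-2 prefix); "" stands for the wildcard "*".
RULES = {
    "r1": ("1", ""),
    "r2": ("1", "00"),
    "r3": ("01", "100"),
    "r4": ("101", "100"),
    "r5": ("101", "11"),
    "r6": ("00", ""),
}


def crossproduct_census(rules):
    """Classify each crossproduct entry and store only the non-empty ones."""
    f1 = {p1 for p1, _ in rules.values()}
    f2 = {p2 for _, p2 in rules.values()}
    original = set(rules.values())
    census = {"original": 0, "pseudo": 0, "empty": 0}
    hash_table = {}
    for a, b in product(f1, f2):
        matched = sorted(name for name, (p1, p2) in rules.items()
                         if a.startswith(p1) and b.startswith(p2))
        if not matched:
            census["empty"] += 1  # never stored in the hash table
            continue
        hash_table[(a, b)] = matched
        census["original" if (a, b) in original else "pseudo"] += 1
    return census, hash_table


census, hash_table = crossproduct_census(RULES)
print(census)           # {'original': 6, 'pseudo': 7, 'empty': 3}
print(len(hash_table))  # 13
```

Of the 16 combinations, the hash table holds 13; the three empty entries disappear, but the seven pseudo rules remain, which is the second deficiency.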

On top of this, placing the Bloom filters described above in front of the hash table improves the throughput tremendously, because each LPM then requires only one memory access. The entire classification process therefore takes five memory accesses, with very high probability, to classify a packet. The second problem is eliminated by the following methodology.

IV. SUBSET CROSSPRODUCTING ALGORITHM

In the deficient crossproduct algorithm, only one hash table access is required to get the list of matching rules, but this benefit comes at the cost of many extra rules being generated. However, if we split the single rule set into multiple smaller subsets and take the crossproduct within each subset, the number of pseudo rules is reduced significantly compared with the deficient crossproduct algorithm. This is shown in Figure 4.




Figure 4. Dividing rule in separate subset to reduce overlap

Table 3. LPM tables for individual field.

We have divided the rule set into three subsets, performed the crossproduct within each subset, and inserted only the rules that belong to that subset. As a result, only pseudo rule p7 has to be inserted in subset 1 (G1) and p2 in subset 2 (G2); all the other pseudo rules vanish, and the load of extra rules is reduced significantly. Why do the pseudo rules vanish? The reason is that the crossproduct is multiplicative in nature. When the number of overlapping prefixes of a field i is reduced by a factor of x_i due to partitioning, the resulting reduction in crossproduct rules is of the order of the product of the factors x_i over all fields, and hence large. After the memory required by crossproducting has been reduced in this way, an independent hash table can be prepared for each subset, and an independent lookup can be performed in it. This splitting introduces two extra memory access overheads:

1) An entire LPM process is performed for every subset.

2) A separate access is required to look up the final rule in the hash table of each subset.
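The reduction in pseudo rules can be checked with a short Python sketch. The partition used here (G1 = {r2, r4, r5}, G2 = {r1, r3}, G3 = {r6}) is an assumption inferred from the LPM tables (Table 3), not something the text states explicitly; the function name is hypothetical, and the empty string stands for "*".

```python
from itertools import product

# name -> (field-1 prefix, field-2 prefix); "" stands for the wildcard "*".
RULES = {
    "r1": ("1", ""),
    "r2": ("1", "00"),
    "r3": ("01", "100"),
    "r4": ("101", "100"),
    "r5": ("101", "11"),
    "r6": ("00", ""),
}

# Assumed partition, inferred from the per-subset LPM tables.
SUBSETS = {"G1": ["r2", "r4", "r5"], "G2": ["r1", "r3"], "G3": ["r6"]}


def pseudo_rules(rules):
    """Crossproduct keys that match some rule but are not original rules."""
    f1 = sorted({p1 for p1, _ in rules.values()})
    f2 = sorted({p2 for _, p2 in rules.values()})
    original = set(rules.values())
    return [
        (a, b) for a, b in product(f1, f2)
        if (a, b) not in original
        and any(a.startswith(p1) and b.startswith(p2)
                for p1, p2 in rules.values())
    ]


whole = pseudo_rules(RULES)
per_subset = {g: pseudo_rules({n: RULES[n] for n in names})
              for g, names in SUBSETS.items()}
print(len(whole))  # 7
print(per_subset)  # {'G1': [('101', '00')], 'G2': [('1', '100')], 'G3': []}
```

Under this assumed partition the seven pseudo rules of the single crossproduct table shrink to two: (101*, 00*) in G1 and (1*, 100*) in G2, which correspond to p7 and p2.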

V. EXPLANATION TO THE FLOW OF ALGORITHM

After the single rule set is split into multiple subsets, LPM is performed for each field on each subset separately. This produces a key for each matching subset, which is placed in the LPM table of the particular field. The key is formed from the lengths of the prefixes matched in the subset. If no prefix is matched in a subset, the key is simply taken as zero.

LPM table for field 1:

Prefix   G1   G2   G3
1*       1    1    -
00*      -    -    2
01*      -    2    -
101*     3    1    -

LPM table for field 2:

Prefix   G1   G2   G3
*        -    0    0
00*      2    0    0
11*      2    0    0
100*     3    3    0

Figure 5. Illustration of flow of algorithm

So in practice we can directly skip such a subset and move to the next one. For the purpose of analysis, however, we consider all matching prefix keys to be non-zero. Probing these keys directly in the hash tables leads to many unnecessary hash table accesses. This problem is avoided by using Bloom filters. We maintain one Bloom filter in on-chip memory for each off-chip rule subset hash table. We first test the Bloom filter with the key to be looked up in the subset. If the filter shows a match, we look up the key in the off-chip hash table. The flow of the algorithm is shown in Figure 5.
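The flow described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the per-subset tables are the non-empty crossproduct entries of an assumed partition G1, G2, G3 of the six example rules; the per-subset prefix sets are derived from those table keys; a dictionary models each off-chip hash table; and the Bloom filter parameters and all names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter that models the on-chip filter of one subset."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


# Non-empty crossproduct entries per subset ("" stands for "*").
SUBSET_TABLES = {
    "G1": {("1", "00"): ["r2"], ("101", "00"): ["r2"],
           ("101", "11"): ["r5"], ("101", "100"): ["r4"]},
    "G2": {("1", ""): ["r1"], ("1", "100"): ["r1"], ("01", "100"): ["r3"]},
    "G3": {("00", ""): ["r6"]},
}


def lpm(bits, prefixes):
    """Longest prefix in `prefixes` that matches `bits`, or None."""
    matches = [p for p in prefixes if bits.startswith(p)]
    return max(matches, key=len) if matches else None


class SubsetClassifier:
    def __init__(self, subset_tables):
        self.tables = subset_tables  # models the off-chip hash tables
        self.filters = {}            # models the on-chip Bloom filters
        for name, table in subset_tables.items():
            self.filters[name] = BloomFilter()
            for key in table:
                self.filters[name].add(key)
        self.off_chip_accesses = 0

    def classify(self, field1_bits, field2_bits):
        matched = []
        for name, table in self.tables.items():
            key = (lpm(field1_bits, {k[0] for k in table}),
                   lpm(field2_bits, {k[1] for k in table}))
            if None in key:
                continue  # no prefix matched in this subset: skip it
            if key not in self.filters[name]:
                continue  # filter says "definitely absent": no off-chip access
            self.off_chip_accesses += 1
            matched.extend(table.get(key, []))  # empty on a false positive
        return matched


classifier = SubsetClassifier(SUBSET_TABLES)
print(sorted(classifier.classify("1010", "0011")))  # ['r1', 'r2']
print(classifier.off_chip_accesses)                 # 2
```

For this packet, G3 is skipped because no field-1 prefix matches, and the two remaining subsets each cost one off-chip access, one per matching rule. The per-field LPM accesses are not modelled in this sketch.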

VI. IMPLEMENTATION DETAILS

We have examined the algorithm through an extensive simulation study using the NetBeans IDE (Integrated Development Environment). The data set size is varied over 500, 1000, 1500, 2000 and 3000 rules. The data sets are generated using tcpdump, which collects data packets from a working network. The GUI made for this algorithm is shown in Figure 6.

VII. RESULTS

The data set file, which contains the data packet information, is loaded by clicking Select the Test File; alternatively, we can browse directly to the folder that contains the data set files. By clicking Select the Rule File we can browse to the rule file for the algorithm. After that, clicking the Build tab generates the Bloom filter, and the time taken to generate it is displayed.




Figure 6. GUI (Graphical User Interface) for the algorithm

Figure 7. Time taken to build the Bloom filter

Next, after the Test button is clicked, the data packets are checked against the filter, and the time required to check the data set is displayed together with the following result.

Figure 8. Time taken to test the data set file

Figure 9. GUI showing the results

1. Performance Metrics

In this simulation of the algorithm, the following parameters are considered: the time taken to build the Bloom filter, and the time taken to process the data (the time taken by the filter to search for the identity of all packets in the data set).




Table 4. Time required to build the Bloom filter and time required to process the data set through the filter, for various numbers of data packets.

No. of packets   Time taken to build Bloom filter   Time taken to process the data set
500              30                                 382
1000             50                                 1048
1500             20                                 1294
2000             30                                 1476
3000             40                                 1937

Table 5. Number of packets accepted, number of packets denied, and number of packets for which no match was found.

VIII. CONCLUSION

Algorithmic solutions are a better alternative to TCAM in terms of cost, power consumption and flexibility. Our computational methodology uses multi-subset crossproducting, which performs much better than plain crossproducting, together with Bloom filters, which accelerate the packet classification process. Because it relies primarily on memory, our algorithm is power-efficient. It consumes on average about 30 to 36 bytes of memory per rule (on-chip and off-chip combined). Hence rule sets as large as 128K rules can easily be supported in less than 5 MB of SRAM. Using two 300 MHz 36-bit-wide SRAM chips, packets can be classified at OC-192 speed.
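The memory figure can be verified with a few lines of Python. The sketch assumes that "128K" means 128 × 1024 rules and that memory is measured in units of 2^20 bytes; the function name is hypothetical.

```python
def sram_mib(num_rules, bytes_per_rule):
    """Total rule memory in MiB (1 MiB = 2**20 bytes)."""
    return num_rules * bytes_per_rule / 2**20


NUM_RULES = 128 * 1024  # "128K" read as 128 * 1024 rules (assumption)
for bytes_per_rule in (30, 36):
    print(bytes_per_rule, "bytes/rule ->", sram_mib(NUM_RULES, bytes_per_rule), "MiB")
```

At 30 bytes per rule the rule set needs 3.75 MiB and at 36 bytes per rule 4.5 MiB, so both ends of the stated range stay below 5 MB.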

IX. REFERENCES

[1] Fang Yu, T. V. Lakshman, Martin Austin Motoyama, and Randy H. Katz. SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification. In ANCS '05: Proceedings of the 2005 Symposium on Architecture for Networking and Communications Systems, 2005.

[2] David Taylor and Jon Turner. Scalable Packet Classification Using Distributed Crossproducting of Field Labels. In IEEE INFOCOM, July 2005.

[3] V. Srinivasan, Subhash Suri, and George Varghese. Packet Classification Using Tuple Space Search. In ACM SIGCOMM, 1999. V. Srinivasan, George Varghese, Subhash Suri, and Marcel Waldvogel. Fast and Scalable Layer Four Switching. In ACM SIGCOMM, 1998.

No. of packets   Files accepted   Files denied   No match found
500              43               220            194
1000             86               440            388
1500             127              660            582
2000             172              880            776
3000             258              1320           1164




[4] Haoyu Song, Sarang Dharmapurikar, Jonathan Turner, and John Lockwood. Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing. In ACM SIGCOMM, 2005.

[5] Pankaj Gupta and Nick McKeown. Packet Classification on Multiple Fields. In ACM SIGCOMM, 1999.

[6] Balasaheb S. Agarkar and Uday V. Kulkarni. A Novel Technique for Fast Packet Classification. International Journal of Computer Applications (0975-8887), Volume 76, No. 4, August 2013.

[7] IDT Generic Part: 71P72604. http://www.idt.com/?catID=58745&genID=71P72604.

[8] IDT Generic Part: 75K72100. http://www.idt.com/?catID=58523&genID=75K72100.

[9] Florin Baboescu and George Varghese. Scalable Packet Classification. In ACM SIGCOMM, 2001.

[10] Sarang Dharmapurikar, P. Krishnamurthy, and Dave Taylor. Longest Prefix Matching Using Bloom Filters. In ACM SIGCOMM, August 2003. Will Eatherton. Fast IP Lookup Using Tree Bitmap. Washington University Master Thesis, 1999.

[11] T. V. Lakshman and D. Stiliadis. High-Speed Policy-Based Packet Forwarding Using Efficient Multi-Dimensional Range Matching. In ACM SIGCOMM, 1998.

[12] K. Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for Advanced Packet Classification Using Ternary CAM. In ACM SIGCOMM, 2005.

[13] David E. Taylor. Survey and Taxonomy of Packet Classification Techniques. Washington University Technical Report, WUCSE-2004, 2004.

[14] David E. Taylor and Jonathan S. Turner. ClassBench: A Packet Classification Benchmark. In IEEE INFOCOM, 2005.

[15] Fang Yu and Randy H. Katz. Efficient Multi-Match Packet Classification with TCAM. In IEEE Hot Interconnects, August 2003.
