International Journal of Advance Foundation and Research in Computer (IJAFRC)
Volume 1, Issue 6, June 2014. ISSN 2348 - 4853
134 | 2014, IJAFRC All Rights Reserved www.ijafrc.org
Application Of Bloom Filter In Fast Packet Classification. Milind K. Chavan
M.E. student at S.R.E.S. College of Engineering, Kopargaon.
ABSTRACT
Packet classification has many applications in computer networks, such as QoS (Quality of
Service), firewalls, multimedia communication, telecommunication, security, and traffic
monitoring. To classify a packet into a particular flow or set of flows, intermediate nodes in the
network must search for the rule that defines the flow of that packet; the rule is chosen on the
basis of several fields in the data packet. The rule set is predefined by the user, and the
classifier is constructed on the basis of algorithmic and architectural methodology. The major
constraint in this methodology is the speed of searching for a particular rule. For the last few
decades researchers have been looking for the best computational methodology for packet
classification, but the algorithms currently in use rely heavily on expensive, power-hungry
devices such as TCAM (ternary content-addressable memory). The search for fast and
power-efficient packet classification algorithms is therefore still a subject of interest for
researchers. In this paper we present a new direction for packet classification that combines
algorithmic and architectural structure. Our starting point is the well-known crossproduct
algorithm, which is very fast but introduces additional rules that increase the memory
requirement. We show how to enhance the crossproduct algorithm so that the number of extra
rules is drastically reduced without affecting throughput, and how an on-chip Bloom filter avoids
unnecessary accesses to off-chip memory.
Index Terms: Computer Networks, Packet Classification, TCAM, Crossproduct, Bloom Filter
I. INTRODUCTION
Packet classification has become a popular research topic over the last few decades. As the demand for
high-volume data communication increases day by day, more and more sophisticated algorithmic
techniques are required to meet it. Logically, packet classification is nothing but comparing the bit
streams in the different fields of a data packet with the classifier, which consists of a rule set. The
comparison is done on the prefix bits and not on the whole bit stream. Once a matching rule is found in
the classifier, the action defined in the classifier for that rule is applied to the packet.
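The matching step described above can be illustrated with a minimal sketch of the simplest possible classifier, a linear search over the rule set. This is only a baseline for orientation, not the method proposed in this paper; the names `Rule`, `prefix_matches`, and `classify`, and the two example rules, are hypothetical.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    prefixes: tuple[str, ...]  # one prefix per field, e.g. "10*"
    action: str


def prefix_matches(prefix: str, bits: str) -> bool:
    """True if the field's bit string starts with the prefix ('*' is a wildcard tail)."""
    return bits.startswith(prefix.rstrip("*"))


def classify(rules: list[Rule], packet: tuple[str, ...]) -> Optional[Rule]:
    """Return the first (highest-priority) rule whose prefixes match every field."""
    for rule in rules:
        if all(prefix_matches(p, f) for p, f in zip(rule.prefixes, packet)):
            return rule
    return None


# Hypothetical two-field classifier; earlier rules have higher priority.
RULES = [
    Rule("a", ("101*", "11*"), "deny"),
    Rule("b", ("1*", "*"), "accept"),
]
```

A linear search examines every rule in the worst case, which is why the faster lookup structures discussed in the rest of the paper are needed for large rule sets.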
However, to date no computational technique has been able to eliminate TCAM in real-life
applications. TCAMs are storage devices that store an array of keys of limited width. A search key is
compared against all stored entries in parallel, and a result is produced when any entry matches the key.
Recent TCAMs support up to 133 million searches per second for 144-bit-wide keys and can store 128K
keys that are 144 bits wide. TCAM devices are costly and consume 50 times more power than other
devices, but they are still the favorite choice of manufacturers. They are also 15 times bulkier than
SRAM [1].
In this paper we implement a new algorithmic method based on the crossproduct algorithm. In this
algorithm a single data structure is created for the multiple fields to be checked. The drawback is that
it creates a large number of additional rules, which require a significantly large amount of memory. To
avoid this unnecessary memory usage, multiple subsets of the data structure (trie) are created, which
drastically reduces the memory consumed by additional rules [2]. In the multiple-subset crossproducting
algorithm, the per-field results are used to form a key, which is looked up in a lookup table to find the
matching rule.
To achieve this we first look up the prefix for each field separately using a Bloom filter, which is a
fast and memory-efficient search technique. With very high probability, longest prefix matching can
therefore be performed on the source and destination addresses and the source and destination ports
in just four memory accesses.
To reduce memory consumption, we divide the rules into multiple subsets and then construct a
crossproduct lookup table for each subset, which reduces the number of additional rules created by
crossproducting. As the rules are distributed over a number of subsets, a lookup must be performed in
each subset, for which we use Bloom filters [8]. This technique avoids unnecessary lookups in those
subsets that contain no match for the prefixes of the packet.
Avoiding these extra lookups in subsets without a matching rule gives the algorithm high throughput. If
multiple rules match, we need only P + 4 accesses to choose the highest-priority rule, where P is the
number of rules that the packet can match.
In the following sections of this paper we show that the algorithm requires P + 4 + ε memory
accesses to find the matching rule, where ε is a small constant much smaller than 1 (
service provider across a network access point. Using this example, they explain the application of
packet classification. They also classify the basic techniques into four types:
A) Basic data structures.
B) Geometric algorithms.
C) Heuristics-based.
D) Hardware-based.
The basic concept of the crossproduct algorithm is to perform LPM (longest prefix matching) on each
field first and then combine the individual results into a key that indexes the crossproduct table. The
best matching rule can then be fetched from the crossproduct table in only one memory access (cycle).
The single-field lookup is performed as in the ABV and RFC algorithms.
TCAMs are widely used for packet classification. The latest TCAM devices also include a banking
mechanism that reduces power consumption by selectively turning off unused banks. Traditionally,
TCAM devices needed to expand range values into prefixes in order to store a rule with range
specifications [6]. The more recently introduced DIRPE algorithm uses a clever technique to encode
ranges differently, which results in less rule expansion overall than the traditional method. Its authors
also recognized that in modern security applications it is not sufficient to stop the matching process
after the first match is found; all the matching rules for a packet must be reported. They devised a
multi-match scheme with TCAMs that involves multiple TCAM accesses.
In [6] the authors perform longest prefix matching using Bloom filters to find the required rule for a
packet. In that paper the Bloom filters are created on the basis of the number of prefix bits in the
destination IP address: a 1-bit Bloom filter is programmed for 1-bit prefixes, a 2-bit Bloom filter for
2-bit prefixes, and so on. The input bits (e.g. the source or destination IP address) are matched against
the filters for prefix lengths 1, 2, 3, ..., n one by one, and the results are collected. If a 1-bit prefix
matches in the 1-bit Bloom filter, the filter generates a 1 at its output; if it does not match, it
generates a 0. If even a single bit in the output string of the Bloom filter is 0, the packet is discarded.
If there is a match, a hash computation is performed on the output string, and the identity of the
matching prefix is returned.
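The per-length Bloom filter scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [6]: the class names `BloomFilter` and `BloomLPM`, the filter size (256 bits), the number of hash functions (3), and the use of SHA-256 to derive bit positions are all choices made for this example. Prefixes are written as plain bit strings without the trailing `*`.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # A single 0 bit proves the key was never added (no false negatives).
        return all(self.bits[pos] for pos in self._positions(key))


class BloomLPM:
    """One Bloom filter per prefix length, backed by an exact hash table."""

    def __init__(self, prefixes: dict[str, str], width: int = 4):
        self.table = dict(prefixes)  # stands in for the off-chip prefix table
        self.filters = {n: BloomFilter() for n in range(width + 1)}
        for prefix in prefixes:
            self.filters[len(prefix)].add(prefix)
        self.table_probes = 0

    def lookup(self, bits: str):
        # Try the longest candidate prefix first.
        for n in range(len(bits), -1, -1):
            candidate = bits[:n]
            if candidate in self.filters[n]:
                self.table_probes += 1
                # Confirm in the table, since a filter can give a false positive.
                if candidate in self.table:
                    return candidate, self.table[candidate]
        return None


lpm = BloomLPM({"1": "A", "00": "B", "01": "C", "101": "D"})
```

For example, `lpm.lookup("1011")` returns the prefix `101` with its identity `D`. Because the filters are consulted before the table, the table is usually probed only once per lookup; false positives can add occasional extra probes but never change the result.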
III. DEFICIENT CROSS PRODUCT ALGORITHM
The deficient crossproduct algorithm works as follows. First, a separate trie is constructed for each
field represented in the rule set. In each trie, every node that corresponds to a prefix of a rule is
marked. Let the first trie be used for the field 1 search and the second trie for the field 2 search, as
given in Fig. 1. A connection between marked nodes is nothing but the matching rule for the given
prefix pair in Table 2. We start by performing an independent search for each field in its individual trie
to find the most specific prefix, i.e. the longest matching prefix (LPM). From these results we create a
unique key and use it to index the crossproduct rule table. The entries of the crossproduct table are the
original rules and the artificial (pseudo) rules generated during crossproducting. Each entry either
corresponds to a matching rule or to no rule at all. Hence a lookup with no match returns nothing, and
whenever a match is present we always get the correct rule. This is shown in Figure 1.
Table 1 Basic Classifier Table
Rule  Field 1  Field 2
r1    1*       *
r2    1*       00*
r3    01*      100*
r4    101*     100*
r5    101*     11*
r6    00*      *
Figure 1 Illustration of basic cross product algorithm
Here we use a two-dimensional rule set in which each field is at most 4 bits wide, for the purpose of
demonstration.
Table 2 Representation of Pseudo Rule and Original Rule
Pseudo rule  Field 1  Field 2  Matching rule
-            1*       *        r1
-            1*       00*      r2
p1           1*       11*      r1
p2           1*       100*     r1
-            00*      *        r6
p3           00*      00*      r6
p4           00*      11*      r6
p5           00*      100*     r6
-            01*      *        -
-            01*      00*      -
-            01*      11*      -
-            01*      100*     r3
p6           101*     *        r1
p7           101*     00*      r1, r2
-            101*     11*      r5
-            101*     100*     r4
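The construction of the crossproduct table can be sketched in a few lines. This is an illustrative sketch, not the author's implementation: the function names are hypothetical, the rule fields are taken from the entries of Table 2, and each table entry here stores all matching rules in priority order rather than a single rule.

```python
from itertools import product

# Two-field rules in priority order (r1 highest); field values follow Table 2.
RULES = {
    "r1": ("1*", "*"),
    "r2": ("1*", "00*"),
    "r3": ("01*", "100*"),
    "r4": ("101*", "100*"),
    "r5": ("101*", "11*"),
    "r6": ("00*", "*"),
}


def covers(general: str, specific: str) -> bool:
    """True if prefix `general` is equal to or a prefix of prefix `specific`."""
    return specific.rstrip("*").startswith(general.rstrip("*"))


def build_crossproduct(rules):
    """Map every (field 1 prefix, field 2 prefix) pair to the rules it matches."""
    field1 = {f1 for f1, _ in rules.values()}
    field2 = {f2 for _, f2 in rules.values()}
    return {
        (p1, p2): [name for name, (f1, f2) in rules.items()
                   if covers(f1, p1) and covers(f2, p2)]
        for p1, p2 in product(field1, field2)
    }


def longest_prefix(prefixes, bits):
    """Longest prefix in `prefixes` that matches the bit string, or None."""
    hits = [p for p in prefixes if bits.startswith(p.rstrip("*"))]
    return max(hits, key=lambda p: len(p.rstrip("*")), default=None)


def classify(table, rules, packet):
    """Per-field LPM, then a single lookup with the combined key."""
    key = (longest_prefix({f1 for f1, _ in rules.values()}, packet[0]),
           longest_prefix({f2 for _, f2 in rules.values()}, packet[1]))
    return table.get(key, [])


TABLE = build_crossproduct(RULES)
EMPTY = [k for k, v in TABLE.items() if not v]
PSEUDO = [k for k, v in TABLE.items() if v and k not in RULES.values()]
```

With these six rules the table has 16 entries, of which 3 are empty and 7 are pseudo rules, in agreement with the entries p1 to p7 listed in Table 2. A packet with field values `1011` and `0010` yields the key `(101*, 00*)` and the matching rules r1 and r2.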
Figure 2 Representation of Pseudo Rule and Original Rule
This crossproduct algorithm has two deficiencies:
1) A large number of empty rules.
2) A very large number of pseudo rules.
The first problem is eliminated by using a hash table instead of a direct lookup table. A direct
crossproduct table maintains an entry for every combination generated by crossproducting, whereas
a hash table stores only the non-empty entries. Maintaining a hash table is therefore the best way to
eliminate empty rules.
In addition, if the Bloom filter described above is used before the hash table, throughput improves
tremendously, because only one memory access is required per LPM. The entire classification
process therefore takes 5 memory accesses, with very high probability, to classify a packet. The second
problem is eliminated by the following methodology.
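The effect of replacing the direct lookup table by a hash table can be checked with a small sketch that uses the entries of Table 2. The variable names and the helper `memory_accesses` are hypothetical; the access count assumes one Bloom-filter-assisted access per field (the four fields named in the Introduction) plus one hash table access.

```python
FIELD1 = ["1*", "00*", "01*", "101*"]
FIELD2 = ["*", "00*", "11*", "100*"]

# Non-empty crossproduct entries (original and pseudo rules) as listed in Table 2.
NON_EMPTY = {
    ("1*", "*"): "r1", ("1*", "00*"): "r2",
    ("1*", "11*"): "r1", ("1*", "100*"): "r1",
    ("00*", "*"): "r6", ("00*", "00*"): "r6",
    ("00*", "11*"): "r6", ("00*", "100*"): "r6",
    ("01*", "100*"): "r3",
    ("101*", "*"): "r1", ("101*", "00*"): "r1, r2",
    ("101*", "11*"): "r5", ("101*", "100*"): "r4",
}

# Direct lookup table: one slot per prefix combination, whether it is empty or not.
direct_table = [[NON_EMPTY.get((p1, p2)) for p2 in FIELD2] for p1 in FIELD1]
direct_slots = sum(len(row) for row in direct_table)
empty_slots = sum(slot is None for row in direct_table for slot in row)

# Hash table: only the non-empty entries are stored.
hash_table = dict(NON_EMPTY)


def memory_accesses(num_fields: int) -> int:
    """One LPM access per field plus one hash table access (assumed cost model)."""
    return num_fields + 1
```

For the example rule set, the direct table needs 16 slots, 3 of which are empty, while the hash table stores only the 13 non-empty entries; with four fields the assumed cost model gives the 5 memory accesses stated above.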
IV. SUBSET CROSS PRODUCTING ALGORITHM
In the deficient crossproduct algorithm only one hash table access is required to get the list of
matching rules, but against this benefit many blank (empty) rules are generated. However, if we split
the single data trie into multiple subsets, i.e. take multiple smaller rule sets and crossproduct within
each of them, the number of pseudo rules is reduced significantly compared with the deficient
crossproduct algorithm. This is shown in Fig. 4.
Figure 4. Dividing the rules into separate subsets to reduce overlap
Table 3. LPM tables for the individual fields.
We have divided the rule set into three subsets and performed the crossproducting within each subset,
inserting only the rules that belong to that subset. As a result, pseudo rule p7 is inserted in
subset 1 (G1) and p2 in subset 2 (G2); all the other pseudo rules vanish, and the load of extra rules is
reduced significantly. Why do the pseudo rules vanish? Because the crossproduct is inherently
multiplicative. When the number of overlapping prefixes of a field i is reduced by a factor of x_i due
to partitioning, the resulting reduction in crossproduct rules is of the order of the product of the x_i
over all fields, and hence large. After crossproducting has reduced the required memory, an
independent hash table can be prepared for each subset, and an independent lookup can be performed
in it. This splitting introduces two kinds of extra memory access:
1) An entire LPM process is performed for every subset.
2) A separate access is required to look up the final result in the hash table.
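The reduction in pseudo rules can be reproduced with a short sketch. The partition used here (G1 = {r2, r4, r5}, G2 = {r1, r3}, G3 = {r6}) is an assumption inferred from the prefixes listed per subset in Table 3 and from the statement that only p7 and p2 survive; Figure 4 itself is not reproduced in the text. The function names are hypothetical, and the rule fields follow Table 2.

```python
from itertools import product

RULES = {
    "r1": ("1*", "*"),
    "r2": ("1*", "00*"),
    "r3": ("01*", "100*"),
    "r4": ("101*", "100*"),
    "r5": ("101*", "11*"),
    "r6": ("00*", "*"),
}

# Assumed partition, inferred from Table 3 (not given explicitly in the text).
SUBSETS = {"G1": ["r2", "r4", "r5"], "G2": ["r1", "r3"], "G3": ["r6"]}


def covers(general: str, specific: str) -> bool:
    """True if prefix `general` is equal to or a prefix of prefix `specific`."""
    return specific.rstrip("*").startswith(general.rstrip("*"))


def pseudo_rules(rules):
    """Non-empty crossproduct entries that are not themselves original rules."""
    originals = set(rules.values())
    field1 = {a for a, _ in originals}
    field2 = {b for _, b in originals}
    return sorted(
        (p1, p2)
        for p1, p2 in product(field1, field2)
        if (p1, p2) not in originals
        and any(covers(a, p1) and covers(b, p2) for a, b in originals)
    )


full = pseudo_rules(RULES)
per_subset = {
    name: pseudo_rules({r: RULES[r] for r in members})
    for name, members in SUBSETS.items()
}
```

Under this partition the single crossproduct table contains 7 pseudo rules, whereas the three subset tables together contain only 2: `(101*, 00*)` in G1, which is p7, and `(1*, 100*)` in G2, which is p2.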
V. EXPLANATION OF THE FLOW OF THE ALGORITHM
After splitting the single data trie into multiple subsets, LPM is performed on each subset separately
for the different fields. This generates a key for each matched subset, which is placed in the LPM table
for the particular field (Table 3).
Field 1:
Prefix  G1  G2  G3
1*      1   1   -
00*     -   -   2
01*     -   2   -
101*    3   1   -
Field 2:
Prefix  G1  G2  G3
*       -   0   0
00*     2   0   0
11*     2   0   0
100*    3   3   0
So LPM is performed on each subset and a key is obtained. The key is nothing but the number of prefix
bits matched in the subset. If no prefix is matched in a subset, the key is simply taken as zero.
Figure 5. Illustration of flow of algorithm
So in practice we can directly skip such a subset and move to the next one. For the purpose of analysis,
however, we will consider all the matching prefix keys to be non-zero. Probing the hash table of every
subset directly with these keys results in many unnecessary lookups. This problem is avoided by using
Bloom filters. We maintain one Bloom filter in on-chip memory for each off-chip rule subset hash
table. We first test the Bloom filter with the key to be looked up in the subset. If the filter reports a
match, we look up the key in the off-chip hash table. The flow of the algorithm is shown in Figure 5.
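The flow described above can be sketched end to end as follows. This is a simplified illustration, not the simulated implementation: a Python `dict` stands in for each off-chip hash table, a small software Bloom filter (256 bits, 3 SHA-256-derived positions, both arbitrary choices) stands in for the on-chip filter, the per-field LPM is done by a plain scan, and the subset partition is the one assumed from Table 3. All names are hypothetical.

```python
import hashlib

RULES = {"r1": ("1*", "*"), "r2": ("1*", "00*"), "r3": ("01*", "100*"),
         "r4": ("101*", "100*"), "r5": ("101*", "11*"), "r6": ("00*", "*")}
GROUPS = [["r2", "r4", "r5"], ["r1", "r3"], ["r6"]]  # assumed partition G1, G2, G3


class BloomFilter:
    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def covers(general, specific):
    return specific.rstrip("*").startswith(general.rstrip("*"))


def lpm(prefixes, bits):
    """Longest prefix in `prefixes` matching `bits`, or None if none matches."""
    hits = [p for p in prefixes if bits.startswith(p.rstrip("*"))]
    return max(hits, key=len, default=None)


class Subset:
    def __init__(self, rules):
        self.f1 = {a for a, _ in rules.values()}
        self.f2 = {b for _, b in rules.values()}
        self.table = {}             # stands in for the off-chip hash table
        self.bloom = BloomFilter()  # stands in for the on-chip Bloom filter
        for p1 in self.f1:
            for p2 in self.f2:
                hits = [n for n, (a, b) in rules.items()
                        if covers(a, p1) and covers(b, p2)]
                if hits:  # store only non-empty crossproduct entries
                    self.table[(p1, p2)] = hits
                    self.bloom.add(f"{p1}|{p2}")


def classify(subsets, packet):
    """Return (sorted matching rules, number of off-chip hash table probes)."""
    matches, probes = [], 0
    for s in subsets:
        p1, p2 = lpm(s.f1, packet[0]), lpm(s.f2, packet[1])
        if p1 is None or p2 is None:
            continue  # a field has no matching prefix here: skip the subset
        if f"{p1}|{p2}" not in s.bloom:
            continue  # the on-chip filter rules out this subset
        probes += 1
        matches += s.table.get((p1, p2), [])
    return sorted(matches), probes


SUBSETS = [Subset({r: RULES[r] for r in group}) for group in GROUPS]
```

For a packet with field values `1011` and `0010`, G1 and G2 are probed and G3 is skipped, giving the matching rules r1 and r2 with two off-chip probes; the highest-priority match is then r1. A Bloom filter false positive can add an extra probe for a subset without a match, but it cannot add a wrong rule, because the hash table lookup is exact.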
VI. IMPLEMENTATION DETAILS
We have examined the algorithm in an extensive simulation study using the NetBeans IDE (Integrated
Development Environment). The data set size is varied over 500, 1000, 1500, 2000, and 3000 rules.
The data sets are generated using tcpdump, which collects data packets from a working network. A
GUI was built for this algorithm.
VII. RESULTS
The data set file, which contains the data packet information, is selected by clicking Select the Test
File; alternatively, we can browse directly to the folder that contains the data set files. By clicking
Select the Rule File we can browse to the rule file for the algorithm. After that, clicking the Build tab
generates the Bloom filter, and the time taken to generate it is displayed.
Figure 6. GUI (Graphical User Interface) for the algorithm
Figure 7. Time taken to build the Bloom filter
Next, after clicking the Test button, the data packets are checked against the filter, and the time
required to check the data set is displayed together with the following result.
Figure 8. Time taken to test the data set file
Figure 9. GUI showing the results
1. Performance Metrics
In this simulation of the algorithm, the following parameters are considered: the time taken to build
the Bloom filter, and the time taken to process the data (the time taken by the filter to search for the
identity of all the packets in the data set).
Table 4. Time required to build the Bloom filter and to process the data set through the filter for
various numbers of data packets.
No. of packets  Time taken to build Bloom filter  Time taken to process the data set
500             30                                382
1000            50                                1048
1500            20                                1294
2000            30                                1476
3000            40                                1937
Table 5. Number of packets accepted, number denied, and number for which no match was found.
No. of packets  File accepted  File denied  No match found
500             43             220          194
1000            86             440          388
1500            127            660          582
2000            172            880          776
3000            258            1320         1164
VIII. CONCLUSION
Algorithmic solutions are always a better alternative to TCAM in terms of lower cost, lower power
consumption, and flexibility. Our computational methodology uses multi-subset crossproducting,
which performs much better than plain crossproducting, together with Bloom filters, which accelerate
the packet classification process. Due to its primary reliance on memory, our algorithm is
power-efficient. It consumes on average about 30 to 36 bytes of memory per rule (on-chip and off-chip
combined). Hence rule sets as large as 128K rules can easily be supported in less than 5 MB of SRAM.
Using two 300 MHz 36-bit-wide SRAM chips, packets can be classified at OC-192 speed.
IX. REFERENCES
[1] Fang Yu, T. V. Lakshman, Martin Austin Motoyama, and Randy H. Katz. SSA: a power and memory
efficient scheme to multi-match packet classification. In ANCS '05: Proceedings of the 2005
Symposium on Architecture for Networking and Communications Systems, 2005.
[2] David Taylor and Jon Turner. Scalable Packet Classification Using Distributed Crossproducting of
Field Labels. In IEEE INFOCOM, July 2005.
[3] V. Srinivasan, Subhash Suri, and George Varghese. Packet Classification Using Tuple Space
Search. In ACM SIGCOMM, 1999. V. Srinivasan, George Varghese, Subhash Suri, and Marcel
Waldvogel. Fast and Scalable Layer Four Switching. In ACM SIGCOMM, 1998.
[4] Haoyu Song, Sarang Dharmapurikar, Jonathan Turner, and John Lockwood. Fast Hash Table
Lookup Using Extended Bloom Filter: An Aid to Network Processing. In ACM SIGCOMM, 2005.
[5] Pankaj Gupta and Nick McKeown. Packet classification on multiple fields. In ACM SIGCOMM, 1999.
[6] Balasaheb S. Agarkar and Uday V. Kulkarni. A Novel Technique for Fast Packet Classification.
International Journal of Computer Applications (0975-8887), Volume 76, No. 4, August 2013.
[7] IDT Generic Part: 71P72604. http://www.idt.com/?catID=58745&genID=71P72604.
[8] IDT Generic Part: 75K72100. http://www.idt.com/?catID=58523&genID=75K72100.
[9] Florin Baboescu and George Varghese. Scalable Packet Classification. In ACM SIGCOMM, 2001.
[10] Sarang Dharmapurikar, P. Krishnamurthy, and Dave Taylor. Longest Prefix Matching using Bloom
Filters. In ACM SIGCOMM, August 2003. Will Eatherton. Fast IP Lookup Using Tree Bitmap.
Washington University Master Thesis, 1999.
[11] T. V. Lakshman and D. Stiliadis. High-speed policy-based packet forwarding using efficient
multi-dimensional range matching. In ACM SIGCOMM, 1998.
[12] K. Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for Advanced
Packet Classification using Ternary CAM. In ACM SIGCOMM, 2005.
[13] David E. Taylor. Survey and taxonomy of packet classification techniques. Washington University
Technical Report, WUCSE-2004, 2004.
[14] David E. Taylor and Jonathan S. Turner. Classbench: A Packet Classification Benchmark. In IEEE
INFOCOM, 2005.
[15] Fang Yu and Randy H. Katz. Efficient Multi-Match Packet Classification with TCAM. In IEEE Hot
Interconnects, August 2003.