Application Of Bloom Filter In Fast Packet Classification

  • International Journal of Advance Foundation and Research in Computer (IJAFRC)

    Volume 1, Issue 6, June 2014. ISSN 2348 - 4853

    134 | 2014, IJAFRC All Rights Reserved www.ijafrc.org

    Application Of Bloom Filter In Fast Packet Classification. Milind K. Chavan

    M.E. student at S.R.E.S. College of Engineering, Kopargaon.

    [email protected]

    ABSTRACT

    Packet classification has many applications in computer networks, such as QoS (Quality of Service), firewalls, multimedia communication, telecommunication, security and traffic monitoring. To classify a packet into a particular flow or set of flows, intermediate nodes in the network must search for the rule that defines the flow of that packet; the rule is chosen on the basis of several fields in the packet. The rule set is predefined by the user, and the search structure is built according to an algorithmic and architectural methodology. The major constraint of this methodology is the speed of searching for a particular rule. For the last few decades researchers have been looking for the best computational methodology for packet classification, but current solutions rely heavily on expensive, power-hungry devices such as TCAM (Ternary Content Addressable Memory). The search for fast and power-efficient packet classification algorithms therefore remains a subject of interest. In this paper we present a new direction for packet classification that covers both the algorithmic and the architectural structure. Our starting point is the well-known crossproduct algorithm, which is very fast but introduces additional rules that increase the memory requirement. We show how to enhance the crossproduct algorithm so that the number of extra rules is drastically reduced without affecting throughput, and how to avoid unnecessary accesses to off-chip memory by using on-chip Bloom filters.

    Index Terms: Computer Networks, Packet Classification, TCAM, Crossproduct, Bloom Filter

    I. INTRODUCTION

    Packet classification has become a popular research topic over the last few decades. As the demand for large volumes of data in communication grows day by day, increasingly sophisticated algorithmic techniques are required to meet it. Logically, packet classification is nothing more than comparing the bit strings in the different fields of a data packet with the classifier, which consists of a rule set. The comparison is made on the prefix bits and not on the whole bit string. Once the matching rule is found in the classifier, the action that the classifier associates with that rule is applied to the packet.

    However, to date no computational technique has been able to eliminate TCAM from real-life applications. TCAMs are storage devices that hold an array of keys of limited width. A search key is compared against all stored entries in parallel, and a result is produced when any entry matches the key. Recent TCAMs support up to 133 million searches per second for 144-bit-wide keys and can store 128K keys of that width. TCAM devices are costly and consume 50 times more power than other devices, yet they remain the favorite choice of manufacturers. They are also 15 times bulkier than SRAM [1].

    In this paper we implement an algorithmic method based on the crossproduct algorithm. In this algorithm a single data structure is created for the multiple fields to be checked. Its drawback is that it creates a large number of additional rules, which require a significantly large amount of memory. To avoid this, the rules are divided into multiple subsets, each with its own data structure (trie), which drastically reduces the unnecessary memory usage [2]. In the multiple-subset crossproduct algorithm, the per-field results are combined to form a key, which is used to look up the matching rule in a lookup table.

    To achieve this, we first look up the prefix for each field separately using Bloom filters, which provide a fast and memory-efficient search technique. With very high probability, longest prefix matching (LPM) can therefore be performed on the source and destination addresses and the source and destination ports in just four memory accesses.
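    The phrase "with very high probability" reflects the false-positive behaviour of Bloom filters. As a reminder, under the standard textbook assumptions (a filter of m bits holding n prefixes and using k independent, uniform hash functions; the symbols m, n, k and f are introduced here for illustration and are not taken from the paper), the false-positive probability f is approximately:

```latex
f \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2
\;\Longrightarrow\;
f_{\min} \approx \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}} \approx 0.6185^{\,m/n}
```

    A Bloom filter never reports a stored prefix as absent, so a false positive costs only an extra memory access and never a wrong answer, provided the subsequent hash-table probe confirms or refutes the match.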

    To reduce memory consumption, we divide the rules into multiple subsets and construct a crossproduct lookup table for each subset, which reduces the number of additional rules that the crossproduct requires. Because the rules are distributed over several subsets, a lookup must be performed in each subset, for which we use Bloom filters [8]. This technique avoids unnecessary lookups in those subsets that do not match the prefix address.

    Because the Bloom filters let us skip lookups in subsets that contain no matching rule, the algorithm achieves high throughput. If multiple rules match, we need only P + 4 memory accesses to choose the highest-priority rule, where P is the number of rules that the packet can match.

    In the following sections of this paper we show that the algorithm requires P + 4 + ε memory accesses to find the matched rule, where ε is a small constant much less than 1 (

    service provider across a network access point. Using this example, they explain the application of packet classification. They also classify the basic techniques into four types:

    A) Basic data structure.

    B) Geometric algorithms.

    C) Heuristics based.

    D) Hardware based.

    The basic concept of the crossproduct algorithm is to perform LPM on each field first and then combine the individual results to form a key that indexes the crossproduct table. The best matching rule can then be fetched from the crossproduct table in only one memory access (cycle). The single-field lookup is performed as in the ABV and RFC algorithms.

    TCAMs are widely used for packet classification. The latest TCAM devices also include a banking mechanism that reduces power consumption by selectively turning off unused banks. Traditionally, TCAM devices needed to expand range values into prefixes in order to store a rule with range specifications [6]. The more recently introduced DIRPE algorithm uses a clever technique to encode ranges differently, which results in less rule expansion overall than the traditional method. Its authors also recognized that in modern security applications it is not sufficient to stop the matching process after the first match is found; all the rules matching a packet must be reported. They devised a multi-match scheme with TCAMs that involves multiple TCAM accesses.

    In [6] the authors perform longest prefix matching using Bloom filters to find the required rule for a packet. The Bloom filters are created on the basis of the number of bits in the destination IP address prefix: a 1-bit Bloom filter is programmed for 1-bit prefixes, a 2-bit Bloom filter for 2-bit prefixes, and so on. The prefixes of length 1, 2, 3, ..., n are matched one by one against the input bits (for example, the source or destination IP address), and the corresponding results are collected. If a 1-bit prefix matches in the 1-bit Bloom filter, the filter generates a 1 at its output; if it does not match, the filter generates a 0. If even a single bit in the output string of the Bloom filters is 0, the packet is discarded. If all bits match, a computation called hashing is performed on the output string, and after hashing the identity of the particular prefix is returned.
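    The general idea of keeping one Bloom filter per prefix length can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the scheme in [6]: the class names BloomFilter and BloomLPM, the filter size, the use of salted SHA-256 as hash functions and the example prefixes are all hypothetical. The sketch probes the candidate prefixes from longest to shortest and uses a dictionary as a stand-in for the off-chip hash table that confirms each Bloom filter hit.

```python
import hashlib


class BloomFilter:
    """Bloom filter over strings, using k salted SHA-256 hashes (assumed)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class BloomLPM:
    """Longest prefix matching with one Bloom filter per prefix length."""

    def __init__(self, prefixes):
        self.table = dict(prefixes)  # stands in for the off-chip hash table
        self.filters = {}            # prefix length -> on-chip Bloom filter
        for prefix in self.table:
            self.filters.setdefault(len(prefix), BloomFilter()).add(prefix)
        self.table_probes = 0        # counts simulated off-chip accesses

    def lookup(self, address_bits):
        # Try the longest candidate first; the table probe weeds out
        # Bloom filter false positives.
        for length in sorted(self.filters, reverse=True):
            if length > len(address_bits):
                continue
            candidate = address_bits[:length]
            if candidate in self.filters[length]:
                self.table_probes += 1
                if candidate in self.table:
                    return candidate, self.table[candidate]
        return None


# Hypothetical prefixes (written without the trailing '*') and identifiers.
lpm = BloomLPM({"1": "A", "101": "B", "00": "C", "01": "D"})
print(lpm.lookup("1011"))  # the 3-bit prefix 101 is the longest match
print(lpm.lookup("1100"))  # only the 1-bit prefix 1 matches
print(lpm.lookup("0111"))  # the 2-bit prefix 01 matches
```

    Because every filter hit is verified against the table, a false positive in a filter can only add a probe; it cannot change the returned prefix.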

    III. DEFICIENT CROSS PRODUCT ALGORITHM

    The deficient crossproduct algorithm works as follows. First, a separate trie is constructed for each field represented in the rule set. In each trie, the nodes that correspond to a prefix appearing in some rule are marked. Let the first trie serve the field 1 search and the second trie the field 2 search, as shown in Figure 1. A connection between marked nodes represents the matching rule for the given pair of prefixes in Table 2. We begin by performing an independent search for each field in its own trie to find the most specific prefix, i.e. the longest matching prefix (LPM). We then combine the results into a unique key and use it to index the crossproduct rule table. The crossproduct table contains the original rules together with the artificial (pseudo) rules generated during crossproducting. Each entry either yields a matching rule or yields none, so when a match exists we always obtain the correct rule. This is shown in Figure 1.

    Table 1 Basic Classifier Table

    Rule   Field 1   Field 2
    r1     1*        *
    r2     1*        00*
    r3     01*       100*
    r4     101*      100*
    r5     101*      11*
    r6     00*       *

    Figure 1 Illustration of basic cross product algorithm

    For the purpose of demonstration, we use a two-dimensional rule set in which each field is at most 4 bits wide.

    Table 2 Representation of Pseudo Rule and Original Rule

    Pseudo rule   Field 1   Field 2   Matching rule
    -             1*        *         r1
    -             1*        00*       r2
    p1            1*        11*       r1
    p2            1*        100*      r1
    -             00*       *         r6
    p3            00*       00*       r6
    p4            00*       11*       r6
    p5            00*       100*      r6
    -             01*       *         (empty)
    -             01*       00*       (empty)
    -             01*       11*       (empty)
    -             01*       100*      r3
    p6            101*      *         r1
    p7            101*      00*       r1, r2
    -             101*      11*       r5
    -             101*      100*      r4
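    The way original, pseudo and empty entries arise can be reproduced with a short sketch. It assumes the two-field rule set with r4 = (101*, 100*) and r6 = (00*, *), which is the reading consistent with the entries of Table 2; prefixes are written without the trailing '*', and the names RULES, build_crossproduct and count_entries are illustrative only. Unlike Table 2, which shows the best match per entry, the sketch lists every rule that an entry matches.

```python
from itertools import product

# Two-field example rule set: rule name -> (field 1 prefix, field 2 prefix).
RULES = {
    "r1": ("1", ""),
    "r2": ("1", "00"),
    "r3": ("01", "100"),
    "r4": ("101", "100"),
    "r5": ("101", "11"),
    "r6": ("00", ""),
}


def build_crossproduct(rules):
    """Map every (field 1 prefix, field 2 prefix) pair to the rules it matches."""
    field1 = sorted({f1 for f1, _ in rules.values()})
    field2 = sorted({f2 for _, f2 in rules.values()})
    table = {}
    for a, b in product(field1, field2):
        # A rule matches the pair if both of its prefixes are prefixes
        # of the pair's (longest matching) prefixes.
        table[(a, b)] = [
            name for name, (p, q) in rules.items()
            if a.startswith(p) and b.startswith(q)
        ]
    return table


def count_entries(rules, table):
    """Classify each crossproduct entry as original, pseudo or empty."""
    originals = set(rules.values())
    counts = {"original": 0, "pseudo": 0, "empty": 0}
    for key, matches in table.items():
        if not matches:
            counts["empty"] += 1
        elif key in originals:
            counts["original"] += 1
        else:
            counts["pseudo"] += 1
    return counts


TABLE = build_crossproduct(RULES)
COUNTS = count_entries(RULES, TABLE)
print(COUNTS)  # {'original': 6, 'pseudo': 7, 'empty': 3}
```

    The 4 x 4 = 16 entries split into the six original rules, seven pseudo rules (p1 to p7) and three empty entries, which is the overhead the following sections set out to reduce.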

    Figure 2 Representation of Pseudo Rule and Original Rule

    This crossproduct algorithm has two deficiencies:

    1) A large number of empty rules.

    2) A very large number of pseudo rules.

    The first problem is eliminated by using a hash table instead of a direct lookup table. The crossproduct table holds every combination generated by crossproducting, including the empty ones, so storing only the non-empty entries in a hash table is the best way to eliminate empty rules.

    In addition, placing the Bloom filters described above in front of the hash table greatly improves throughput, because only one memory access is then required per LPM. With four field lookups and one hash-table access, the entire classification process takes 5 memory accesses with very high probability. The second problem is eliminated by the following methodology.

    IV. SUBSET CROSSPRODUCTING ALGORITHM

    In the deficient crossproduct algorithm, only one hash-table access is required to obtain the list of matching rules, but this benefit comes at the cost of a large number of generated pseudo rules. However, if we split the rule set into multiple smaller subsets and take the crossproduct within each subset, the number of pseudo rules is reduced significantly compared with the deficient crossproduct algorithm. This is shown in Figure 4.

    Figure 4. Dividing rules into separate subsets to reduce overlap

    Table 3. LPM tables for individual fields.

    We divided the rule set into three subsets and performed the crossproduct within each subset, inserting only the rules that belong to that subset. As a result, only pseudo rule p7 has to be inserted in subset 1 (G1) and p2 in subset 2 (G2); all other pseudo rules vanish, and the load of extra rules is reduced significantly. Why do the pseudo rules vanish? Because the crossproduct is inherently multiplicative. When partitioning reduces the number of overlapping prefixes of field i by a factor of x_i, the resulting reduction in crossproduct rules is of the order of the product of the x_i over all fields, and hence large. After crossproducting has reduced the required memory, an independent hash table can be prepared for each subset, and an independent lookup can be performed in each. This splitting introduces two additional memory-access overheads:

    1) The entire LPM process must be performed for every subset.

    2) A separate hash-table access is required in each subset to look up the final rule.
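    The reduction in pseudo rules can be checked with a small sketch. The grouping used here (G1 = {r2, r4, r5}, G2 = {r1, r3}, G3 = {r6}) is an assumption inferred from the LPM tables of Table 3 and the pseudo rules named in the text; the rule set takes r4 = (101*, 100*) and r6 = (00*, *), prefixes are written without the trailing '*', and the names RULES, SUBSETS and pseudo_rules are illustrative only.

```python
from itertools import product

# Two-field example rule set: rule name -> (field 1 prefix, field 2 prefix).
RULES = {
    "r1": ("1", ""),
    "r2": ("1", "00"),
    "r3": ("01", "100"),
    "r4": ("101", "100"),
    "r5": ("101", "11"),
    "r6": ("00", ""),
}

# Assumed partition of the rules into three subsets.
SUBSETS = {
    "G1": ["r2", "r4", "r5"],
    "G2": ["r1", "r3"],
    "G3": ["r6"],
}


def pseudo_rules(rules):
    """Crossproduct entries that match some rule but are not rules themselves."""
    field1 = sorted({f1 for f1, _ in rules.values()})
    field2 = sorted({f2 for _, f2 in rules.values()})
    originals = set(rules.values())
    pseudo = []
    for a, b in product(field1, field2):
        if (a, b) in originals:
            continue
        if any(a.startswith(p) and b.startswith(q) for p, q in rules.values()):
            pseudo.append((a, b))
    return pseudo


whole = pseudo_rules(RULES)
per_subset = {
    group: pseudo_rules({name: RULES[name] for name in names})
    for group, names in SUBSETS.items()
}
print(len(whole))   # 7 pseudo rules without partitioning
print(per_subset)   # one pseudo rule in G1, one in G2, none in G3
```

    Under this grouping the single pseudo rule left in G1 is (101*, 00*), i.e. p7, and the one left in G2 is (1*, 100*), i.e. p2, so the seven pseudo rules of the unpartitioned table shrink to two.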

    V. EXPLANATION OF THE FLOW OF THE ALGORITHM

    After the single rule set is split into multiple subsets, LPM is performed on each subset separately for each field. This generates a key for each matched subset, which is placed in the LPM table for the particular field (Table 3):

    Field 1 prefix   G1   G2   G3
    1*               1    1    -
    00*              -    -    2
    01*              -    2    -
    101*             3    1    -

    Field 2 prefix   G1   G2   G3
    *                -    0    0
    00*              2    0    0
    11*              2    0    0
    100*             3    3    0

    So LPM is performed on each subset and a key is obtained. The key is simply a number identifying the prefix matched in that subset. If no prefix is matched in a subset, the key is simply taken as zero.

    Figure 5. Illustration of flow of algorithm

    In practice we can therefore skip such a subset directly and move on to the next one. For the purpose of analysis, however, we assume that every matching-prefix key is non-zero. Probing the hash tables directly with these keys would cause many unnecessary off-chip memory accesses. This problem is avoided by using Bloom filters. We maintain one Bloom filter in on-chip memory for each off-chip rule-subset hash table. We first test the Bloom filter with the key to be looked up in the subset. If the filter shows a match, we look up the key in the off-chip hash table. The flow of the algorithm is shown in Figure 5.
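    The filter-then-probe step can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the class names, the filter size, the salted SHA-256 hash functions, the per-subset table contents and the example packet keys are all hypothetical, and a Python dictionary stands in for each off-chip hash table.

```python
import hashlib


class BloomFilter:
    """Bloom filter using k salted SHA-256 hashes of repr(item) (assumed)."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class GuardedSubsetTable:
    """Off-chip hash table (a dict here) guarded by an on-chip Bloom filter."""

    def __init__(self, entries):
        self.off_chip = dict(entries)
        self.on_chip = BloomFilter()
        for key in self.off_chip:
            self.on_chip.add(key)
        self.off_chip_accesses = 0

    def lookup(self, key):
        if key not in self.on_chip:
            return None  # definite miss: off-chip memory is not touched
        self.off_chip_accesses += 1
        return self.off_chip.get(key)  # None here means a false positive


# Hypothetical per-subset tables keyed by (field 1 prefix, field 2 prefix).
SUBSETS = {
    "G1": GuardedSubsetTable({("1", "00"): "r2", ("101", "00"): "p7",
                              ("101", "100"): "r4", ("101", "11"): "r5"}),
    "G2": GuardedSubsetTable({("1", ""): "r1", ("1", "100"): "p2",
                              ("01", "100"): "r3"}),
    "G3": GuardedSubsetTable({("00", ""): "r6"}),
}

# Keys assumed to come out of the per-subset LPM step for one packet.
# G3 produced no matching prefix, so it is skipped altogether.
packet_keys = {"G1": ("1", "100"), "G2": ("1", "100")}
matches = {name: SUBSETS[name].lookup(key) for name, key in packet_keys.items()}
total_off_chip = sum(t.off_chip_accesses for t in SUBSETS.values())
print(matches, total_off_chip)
```

    In this example the key (1*, 100*) is not stored in G1, so its Bloom filter rejects it (barring a false positive) and only the G2 hash table is read from off-chip memory.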

    VI. IMPLEMENTATION DETAILS

    We examined the algorithm through an extensive simulation study using the NetBeans IDE (Integrated Development Environment). The data set size was varied over 500, 1000, 1500, 2000 and 3000 rules. The data sets were generated using tcpdump, which collects data packets from a working network. The GUI built for this algorithm is shown in Figure 6.

    VII. RESULTS

    The data set file, which contains the data packet information, is selected by clicking Select the Test File; alternatively, we can browse directly to the folder that contains the data set files. Clicking Select the Rule File lets us browse for the rule file for the algorithm. Clicking the Build tab then generates the Bloom filter, and the time taken to generate it is displayed.

    Figure 6. GUI (Graphical User Interface) for the algorithm

    Figure 7. Time taken to build Bloom filter

    Next, clicking the Test button checks the data packets against the filter. The time required to check the data set is displayed, and the following result is shown.

    Figure 8. Time taken to test the data set file

    Figure 9. GUI showing the results

    1. Performance Metrics

    The following parameters are considered in this simulation of the algorithm: the time taken to build the Bloom filter, and the time taken to process the data (the time taken by the filter to search for the identity of all packets in the data set).

    Table 4. Time required to build the Bloom filter and time required to process the data set through the filter for various numbers of data packets.

    No. of packets   Time taken to build Bloom filter   Time taken to process the data set
    500              30                                 382
    1000             50                                 1048
    1500             20                                 1294
    2000             30                                 1476
    3000             40                                 1937

    Table 5. Number of packets accepted, number denied, and number for which no match was found.

    No. of packets   File accepted   File deny   No match found
    500              43              220         194
    1000             86              440         388
    1500             127             660         582
    2000             172             880         776
    3000             258             1320        1164

    VIII. CONCLUSION

    Algorithmic solutions offer an alternative to TCAM with lower cost, lower power consumption and greater flexibility. Our methodology combines multi-subset crossproducting, which performs much better than plain crossproducting, with Bloom filters that accelerate the packet classification process. Because it relies primarily on memory, our algorithm is power-efficient. It consumes on average about 30 to 36 bytes of memory per rule (on-chip and off-chip combined). Hence rule sets as large as 128K rules can easily be supported in less than 5 MB of SRAM (128K rules × 36 bytes ≈ 4.5 MB). Using two 300 MHz 36-bit-wide SRAM chips, packets can be classified at OC-192 speed.

    IX. REFERENCES

    [1] Fang Yu, T. V. Lakshman, Martin Austin Motoyama, and Randy H. Katz. SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification. In ANCS '05: Proceedings of the 2005 Symposium on Architecture for Networking and Communications Systems, 2005.

    [2] David Taylor and Jon Turner. Scalable Packet Classification Using Distributed Crossproducting of Field Labels. In IEEE INFOCOM, July 2005.

    [3] V. Srinivasan, Subhash Suri, and George Varghese. Packet Classification Using Tuple Space Search. In ACM SIGCOMM, 1999. V. Srinivasan, George Varghese, Subhash Suri, and Marcel Waldvogel. Fast and Scalable Layer Four Switching. In ACM SIGCOMM, 1998.

    [4] Haoyu Song, Sarang Dharmapurikar, Jonathan Turner, and John Lockwood. Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing. In ACM SIGCOMM, 2005.

    [5] Pankaj Gupta and Nick McKeown. Packet Classification on Multiple Fields. In ACM SIGCOMM, 1999.

    [6] Balasaheb S. Agarkar and Uday V. Kulkarni. A Novel Technique for Fast Packet Classification. International Journal of Computer Applications (0975-8887), Volume 76, No. 4, August 2013.

    [7] IDT Generic Part: 71P72604. http://www.idt.com/?catID=58745&genID=71P72604.

    [8] IDT Generic Part: 75K72100. http://www.idt.com/?catID=58523&genID=75K72100.

    [9] Florin Baboescu and George Varghese. Scalable Packet Classification. In ACM SIGCOMM, 2001.

    [10] Sarang Dharmapurikar, P. Krishnamurthy, and Dave Taylor. Longest Prefix Matching Using Bloom Filters. In ACM SIGCOMM, August 2003. Will Eatherton. Fast IP Lookup Using Tree Bitmap. Washington University Master Thesis, 1999.

    [11] T. V. Lakshman and D. Stiliadis. High-Speed Policy-Based Packet Forwarding Using Efficient Multi-Dimensional Range Matching. In ACM SIGCOMM, 1998.

    [12] K. Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for Advanced Packet Classification Using Ternary CAM. In ACM SIGCOMM, 2005.

    [13] David E. Taylor. Survey and Taxonomy of Packet Classification Techniques. Washington University Technical Report, WUCSE-2004, 2004.

    [14] David E. Taylor and Jonathan S. Turner. ClassBench: A Packet Classification Benchmark. In IEEE INFOCOM, 2005.

    [15] Fang Yu and Randy H. Katz. Efficient Multi-Match Packet Classification with TCAM. In IEEE Hot Interconnects, August 2003.