14
TSINGHUA SCIENCE AND TECHNOLOGY ISSNll 1007-0214 ll 07/10 ll pp196-209 Volume 21, Number 2, April 2016 HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks Chang Chen, Xiaohe Hu, Kai Zheng, Xiang Wang, Yang Xiang, and Jun Li Abstract: Most types of Software-Defined Networking (SDN) architectures employ reactive rule dispatching to enhance real-time network control. The rule dispatcher, as one of the key components of the network controller, generates and dispatches the cache rules with response for the packet-in messages from the forwarding devices. It is important not only for ensuring semantic integrity between the control plane and the data plane, but also for preserving the performance and efficiency of the forwarding devices. In theory, generating the optimal cache rules on demands is a knotty problem due to its high theoretical complexity. In practice, however, the characteristics lying in real-life traffic and rule sets demonstrate that temporal and spacial localities can be leveraged by the rule dispatcher to significantly reduce computational overhead. In this paper, we take a deep-dive into the reactive rule dispatching problem through modeling and complexity analysis, and then we propose a set of algorithms named Hierarchy-Based Dispatching (HBD), which exploits the nesting hierarchy of rules to simplify the theoretical model of the problem, and trade the strict coverage optimality off for a more practical but still superior rule generation result. Experimental result shows that HBD achieves performance gain in terms of rule cache capability and rule storage efficiency against the existing approaches. Key words: Software-Defined Networking (SDN); reactive rule dispatching; rule cache; performance 1 Introduction The ultimate goal of networks today is to improve traffic processing performance and reduce network Chang Chen, Xiaohe Hu, and Xiang Wang are with Department of Automation, Research Institute of Information Technology, Tsinghua University, Beijing 100084, China. E-mail: fchenchang13, hu-xh14, xiang- wang11g@mails.tsinghua.edu.cn. Kai Zheng is with IBM China Research Lab, Beijing 100084, China. E-mail: [email protected]. Yang Xiang is with Research Institute of Information Technology, Tsinghua University, Beijing 100084, China. E- mail: [email protected]. Jun Li is with Research Institute of Information Technology, Tsinghua National Lab for Information Science and Technology, Tsinghua University, Beijing 100084, China. E-mail: [email protected]. To whom correspondence should be addressed. Manuscript received: 2015-03-31; revised: 2015-08-25; accepted: 2015-08-28 deployment and operation cost, while satisfying diverse policies for network control. These policies are mostly presented in the form of packet processing rules installed in the forwarding devices, which process the incoming packets according to values of specific fields in their headers. To better support the growth and innovation of networks, Software-Defined Networking (SDN) offloads the control logic from the forwarding devices to a separate and centralized control plane. Correspondingly, control policy determining is thus separated from data-plane packet processing, and this makes the packet processing rules in the forwarding devices logically become the cache of control policies in the controllers. Under this clear division of roles, an urgent issue regarding the execution of control policies emerges: in respond to the data plane’s rule installation requests (i.e., packet-in message [1] ), how does the controller reactively generate proper rules and dispatch them to the corresponding devices, while preserving

HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

TSINGHUA SCIENCE AND TECHNOLOGYISSNll1007-0214ll07/10llpp196-209Volume 21, Number 2, April 2016

HBD: Towards Efficient Reactive Rule Dispatching inSoftware-Defined Networks

Chang Chen, Xiaohe Hu, Kai Zheng, Xiang Wang, Yang Xiang, and Jun Li�

Abstract: Most types of Software-Defined Networking (SDN) architectures employ reactive rule dispatching to

enhance real-time network control. The rule dispatcher, as one of the key components of the network controller,

generates and dispatches the cache rules with response for the packet-in messages from the forwarding devices.

It is important not only for ensuring semantic integrity between the control plane and the data plane, but also for

preserving the performance and efficiency of the forwarding devices. In theory, generating the optimal cache rules

on demands is a knotty problem due to its high theoretical complexity. In practice, however, the characteristics

lying in real-life traffic and rule sets demonstrate that temporal and spacial localities can be leveraged by the rule

dispatcher to significantly reduce computational overhead. In this paper, we take a deep-dive into the reactive rule

dispatching problem through modeling and complexity analysis, and then we propose a set of algorithms named

Hierarchy-Based Dispatching (HBD), which exploits the nesting hierarchy of rules to simplify the theoretical model

of the problem, and trade the strict coverage optimality off for a more practical but still superior rule generation

result. Experimental result shows that HBD achieves performance gain in terms of rule cache capability and rule

storage efficiency against the existing approaches.

Key words: Software-Defined Networking (SDN); reactive rule dispatching; rule cache; performance

1 Introduction

The ultimate goal of networks today is to improvetraffic processing performance and reduce network

�Chang Chen, Xiaohe Hu, and Xiang Wang are withDepartment of Automation, Research Institute ofInformation Technology, Tsinghua University, Beijing100084, China. E-mail: fchenchang13, hu-xh14, [email protected].�Kai Zheng is with IBM China Research Lab, Beijing 100084,

China. E-mail: [email protected].�Yang Xiang is with Research Institute of Information

Technology, Tsinghua University, Beijing 100084, China. E-mail: [email protected].� Jun Li is with Research Institute of Information Technology,

Tsinghua National Lab for Information Science andTechnology, Tsinghua University, Beijing 100084, China.E-mail: [email protected].�To whom correspondence should be addressed.

Manuscript received: 2015-03-31; revised: 2015-08-25;accepted: 2015-08-28

deployment and operation cost, while satisfying diversepolicies for network control. These policies are mostlypresented in the form of packet processing rulesinstalled in the forwarding devices, which process theincoming packets according to values of specific fieldsin their headers.

To better support the growth and innovationof networks, Software-Defined Networking (SDN)offloads the control logic from the forwardingdevices to a separate and centralized control plane.Correspondingly, control policy determining is thusseparated from data-plane packet processing, and thismakes the packet processing rules in the forwardingdevices logically become the cache of control policiesin the controllers. Under this clear division of roles, anurgent issue regarding the execution of control policiesemerges: in respond to the data plane’s rule installationrequests (i.e., packet-in message[1]), how does thecontroller reactively generate proper rules and dispatchthem to the corresponding devices, while preserving

Page 2: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 197

semantic integrity (i.e., decision consistency of controlplane and data plane) and optimizing efficiency? Wecall it the reactive rule dispatching problem.

To realize an elastic SDN with fine-grainedcontrol, reactive rule dispatching is the basis ofdealing with complex and dynamic events (e.g.,host mobility, abnormal traffic analysis, and failurerecovery). Although rules can, alternatively, beproactively pushed into the data plane[2–5] beforespecific traffic comes, however, due to the expensiveand limited fast data-plane flow table, it is costly or eveninfeasible for the forwarding devices to cache potentialrules all at the bootstrapping stage. Fortunately,network traffic has both temporal and spatial localitiesin most scenarios[6], which makes it possible forreactive rule dispatching to perform well by optimalcaching of rules.

Figure 1 shows the general architecture of reactiverule dispatching in an OpenFlow-based SDN. As a coremodule of the controller, the rule dispatcher logicallysits between the rule matching module in the controller,which holds the entire control rules, and the rule table(or flow table) in the forwarding device, which cachesthe recently dispatched rules. To handle a “cache miss”message (i.e., a packet-in message) for certain trafficflow, it is performance critical for the rule dispatcher todetermine which or what rule(s) to be dispatched to theforwarding device, not being too generic (i.e., with toomuch wildcards or too large ranges to lead to semanticerror) or too specific (i.e., limiting the probability ofcovering more upcoming traffic and leading to highcache miss ratio).

Using some straightforward approaches to preservedecision consistency, reactive rule dispatching isimplemented in various types of SDN architecturesand systems[2, 7–9]. However, all these straightforwardapproaches (which will be detailed in Section 2) lackthe consideration of performance. For example, they

Fig. 1 General model of reactive rule dispatching.

usually generate exactly matching rules for the traffic ona per flow basis and thus accompany with remarkablyhigh cache miss ratio.

From a high level, an optimized cache rule generationmethod should take into consideration the followingperformance criteria:

(1) Low latency: Generally, in the case whenthe rule required for the specific packet to forwardresides locally in the forwarding device (i.e., in thecase of cache hit), the forwarding time is on theorder of nanosecond. In contrast, the processinglatency in the case of a cache miss is on the order ofmillisecond[10]. The high time penalty of a cache misscomes essentially from network transmission overheadbetween the controller and the forwarding devices.Obviously, the higher processing latency on the ordersof magnitude suggests that the dispatched cache rulesshould cover as much potential upcoming traffic aspossible (e.g., with more wildcard fields or larger rangeto match).

(2) Efficient memory utilization: The feasibilityof implementation can be enhanced if the generatedcache rules make optimal use of the limited memoryspace in forwarding devices. For example, TernaryContent Addressable Memory (TCAM) is one of thepopular lookup mechanisms for packet classification.However, as TCAM is space-limited, expensive, andpower-hungry, the commodity switches usually supportonly thousands to tens of thousands of rules. Underthese constrains, the rule dispatcher should also beresponsible to ensure the efficient use of storageresource in forwarding devices.

Reactive rule dispatching is inherently hard tooptimize due to the existence of rule dependency(caused by rule overlap and rule priority). Simplygenerating and dispatching cache rules regardless ofrule dependency may cause the action inconsistencybetween the full rule set and the cache rules. On theother hand, finding the optimal cache rule regardinghigh traffic coverage (a.k.a., high cache hit ratio)implies high computational complexity in theory, whichis usually unacceptable for real-time packet processing.For the sake of feasibility, it is necessary for therule dispatching algorithms to exploit the inherentcharacteristics lying in real-life rule sets and traffic, andto flexibly trade strict optimality off for a more practicalbut still superior solution. In general, the proposed workmakes the following main contributions:

Problem deep-dive: The geometrical model and

Page 3: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

198 Tsinghua Science and Technology, April 2016, 21(2): 196-209

complexity of rule dispatching problem are studied inthis paper, along with the summary and classification ofexisting works according to different design decisions.

Heuristic rule dispatching algorithm: Based onthe observation of characteristics of real-life rulesets, we propose Hierarchy-Based Dispatching (HBD),including a set of algorithms for determining the near-optimal cache rules according to packet-in messages.Besides, the cache rules dispatched by HBD have anaddtional bonus attribute: Any pair of the cache rulesis either disjointed or harmlessly overlapped (i.e., thetwo cache rules overlap with each other but are derivedfrom a same original rule). This nice attribute[11] canbe leveraged accompaning with various kinds of packetclassification schemes for faster cache rule lookup onthe data plane.

Performance evaluation: Based on both syntheticand real-life rules, the proposed algorithms as wellas several existing approaches are evaluated in termsof performance metrics such as cache hit rate, trafficcoverage capability, and processing time.

The rest of this paper is organized as follows: Therelated works are reviewed in Section 2. In Section 3we introduce the optimization objective and modelingof reactive rule dispatching problem. In Section 4 wediscuss several design principles in detail, and give ashort study on the overlap characteristic of real-liferules. The proposed algorithms are presented in Section5 and evaluated in Section 6. In Section 7 we concludeour work.

2 Related Work

2.1 Straightforward approaches

The straightforward rule dispatching approachesapplied in prior SDN architectures can be summarizedas follows: (1) Exact dispatching: Dispatching exact-match rule for the specific packet to forward, whichis employed in several prior SDN architectures suchas Ethane[7] and DevoFlow[10]. This approach suffersfrom high cache miss ratio due to the weak trafficcoverage capability of exact-match rules. (2) Abundantdispatching: Dispatching the rule matching the specificpacket along with all the dependent rules (where thedependency chain of rules is studied in Ref. [12]).This approach leads to excessive rule space usage indevices and in some cases, many of the dispatcheddependent rules do not help to hit the subsequenttraffic. (3) Fragmented dispatching: Transforming

interdependent rules into a set of disjointed sub-rulesbefore dispatching. This approach has relatively nicetraffic coverage but the overlaps between multi-dimensional rules may lead to the explosion of thepopulation of the transformed sub-rules[13].

2.2 Prior algorithms

Unlike the straightforward approaches mainly focusingon semantic correctness of the dispatched rules, severalprior arts also target at meeting the performancerequirements through well-designed rule dispatchingalgorithms, which can be summarized according todifferent design decisions:

(1) Evolving-based algorithm: In evolving-basedalgorithms, the boundaries of the dispatched rulesare dynamically expanded (i.e., evolved) according toincoming traffic. Though they are designed for thescenario of two-stage rule matching schemes inside apacket forwarding device, the proposed algorithms arestill enlightening today’s SDN scenarios. In Ref. [14],Smart Rule Cache (SRC) introduces a representativeevolving cache rule construction algorithm, which isalso applied in DIFANE[2]. Another evolving-basedalgorithm is introduced in Storm[15], which proposesa rule caching system for software routers. However,the performance of SRC and Storm relies too much onthe traffic pattern. For example, every single packet inan address-incrementing SYN flood traffic may triggera cache miss because of the incremental extension ofrule boundaries. Besides, their high cache hit ratios tosome extent result from the aggregation of rules with thesame action, which leads to incapability of preservingstatistical information for each individual flow.

(2) Dependency-based algorithm: As animprovement to the straightforward abundantdispatching approach, the Cover-set algorithmpresented in CacheFlow[12] finds each rule’s immediateancestor rules in the dependency graph as the “cover-set” of the rule, and dispatches rules associated withtheir cover-sets to preserve semantic correctness. Inthe CacheFlow architecture, the Cover-set algorithm isused to proactively dispatch to the hardware switchesa group of rules with high “weights” and low “costs”.At run-time, the packets that match the cover-setsare redirected to some software switches for furtherhandling.

(3) Cut-based algorithm: As one latest worktargeting at reactive rule dispatching, CAching inBuckets (CAB)[16] dispatches a set of buckets

Page 4: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 199

associated with their overlapped rules, and implementsa two-stage table pipeline in the forwarding device. Thebucket generation procedure, which directly affectscache performance, shares similarity with spacedecomposition based packet classification algorithmssuch as HiCuts[17] and HyperCuts[18].

The advantages of the proposed approaches againstthe prior-arts will be discussed later in Section 4.

3 Problem Statement

To make a clear view of reactive rule dispatchingproblem, we introduce the theoretical model, set theoptimization objective, and propose an optimal solutionto the problem in this section.

3.1 Modeling

As shown in Fig. 1, the forwarding devices relyon packet classification to associate the forwardingrules or actions with the traffic. Formally speaking, apacket is classified according to some specific headerfields values (e.g., the typical 5-tuples, source anddestination IP addresses, source and destination ports,and protocol number). From a geometrical point ofview, a packet P D fp1; p2; : : : ; pDg is a point in a D-dimensional address space S , where D is the numberof header fields required to classify the packets, andthe d -th field value of P is denoted as pd . A ruleR D fŒr1s; r1e�; Œr2s; r2e�; : : : ; ŒrDs; rDe�g also containsD components. The d -th component RŒd� D Œrd s; rde�

refers to a range in the d -th dimension of S , and allthe D ranges of R compose a D-dimensional hyper-rectangle.

Packet classification can be regarded as a pointlocation problem in computational geometry.Accordingly, the process of generating an optimaldispatched rule can be considered as finding the largestinscribed hyper-rectangle for a given point in theaddress space.

Figure 2a shows a 2-dimensional rule set exampleand Fig. 2b is the corresponding geometrical model.There are 8 rules in a 2-dimensional address space. Forpacket classification, some of the rules overlap witheach other and the priority determines the final matchfor packets in the overlapped regions. As illustrated inFig. 2c, given the packet P D f12; 7g which matchesR7, the rule dispatcher needs to generate and dispatchto the forwarding device a proper rule.

Due to rule overlapping and different priorities for thecorresponding forwarding actions, simply dispatching

(a) An example rule set

(b) The geometrical model (c) The optimal cache rule for packet P

Fig. 2 A two-dimensional example.

the intact rule R7 D Œ8; 15�; Œ5; 12� might break theaction consistency (For example, in the space coveredby R4, which is nested in R7, the action associatedwithR4 should be taken instead of that ofR7 accordingto their priority; thus simply applying the action ofR7 to the entire hyper-rectangle covered by R7 maylead to loss of consistency). As a straightforward idea,“exact dispatching” is to dispatch the very specific rulerepresenting only the single point as the packet headerP indicates directly, i.e., Œ12; 12�; Œ7; 7�. This obviouslypreserves the action consistency but may suffer fromhigh cache miss ratio due to the very low coverage ofaddress space. In contrast, “abundant dispatching” isto dispatch all the dependent rules (e.g., R7, R2, R3,and R4 in our example), which occupies excessive rulespace in the underlying forwarding device. Moreover,some of the “remote” dispatched rules (e.g., R2) maynot actually help with increasing cache hit ratio due totraffic’s spatial locality.

Suppose that the dispatcher only generates one cacherule. The optimal rule to generate should potentiallycover the maximum space in which the correspondingrule action is with the highest priority. As shown bythe shaded rectangle in Fig. 2c, the optimal cacherule for the example is Œ8; 15�; Œ7; 8�, which is thelargest inscribed hyper-rectangle that preserves theaction consistency.

Page 5: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

200 Tsinghua Science and Technology, April 2016, 21(2): 196-209

3.2 Optimization objective

In an SDN with reactive rule dispatching, the averagetime for processing a packet can be approximated by

Tavrg D Thit C �miss � .Ttrans C Tgen/ (1)

where Thit is the average rule lookup time in theforwarding device, �miss is the rule cache miss ratio,Ttrans is the average delay of the transmission betweentwo planes, and Tgen is the average time for generatinga cache rule.

The optimization objective of rule dispatchingincludes two aspects:

(1) To reduce Tavrg according to

Tavrg D Thit C Ttrans � �miss �

�1C

Tgen

Ttrans

�(2)

Generally, Thit is on the order of nanosecond andTtrans is on the order of millisecond[10]. For the othertwo arguments, it is critical to reduce �miss for theobtimization objective, while Tgen is the cost to achievethis. In order to ensure that the benefits outweigh thecost, the rule dispatching algorithm needs to producehigh-coverage rules efficiently; it is equivalent withreducing �miss and guaranteeing Tgen � Ttrans.

(2) To satisfy the rule space constraints in theforwarding devices (e.g., the TCAM table sizeconstraint): The reduction of �miss should not relymerely on installing larger number of rules on thedevices. Instead, the focus should be on leveragingthe temporal and spacial localities of the traffic andgenerating less but higher coverage rules (i.e., as morewildcards or larger ranges in the generated rules aspossible while preserving the semantic consistency tothe control rules).

3.3 The optimal solution and complexity analysis

In this section, we brief an optimal solution to theproblem of determining a single optimal cache rule andanalyze its complexity.

Let point P represent the input packet, hyper-rectangle Rmatch represent the highest priority rule thatmatches P , and R represent the optimal cache rule wewish to generate. Suppose R is initialized to be theexact-point rule located at P . The mission is to keepexpanding and adjustingR’s boundary (i.e., Œrd s; rde� ofR, d D 1; 2; : : : ;D) until finding the largest possiblehyper-rectangle, which represents the optimal cacherule. There are three constrains of the expansion of R:

(1) R covers P :

8d 2 Œ1;D� W pd 2 RŒd� (3)

(2) R is nested in Rmatch, that is, Rmatch forms a hardlimit for the boundary expansion of R:

8d 2 Œ1;D� W RŒd� � RmatchŒd � (4)

(3) The rules that overlap withRmatch and have higherpriorities, referred to as R�, must not overlap with R;in other word, R� forms the obstacles for the boundaryexpansion of R:

8d 2 Œ1;D� W RŒd� \R�Œd � D ∅ (5)

For each R�, if there are exactly M (0 6M < D)dimensions in which the range of R� covers theprojection of P (i.e., pd 2 RŒd�), Rmatch can besimplified as an M -dimensional obstacle, whichprovides D �M possibilities for the boundaryadjustment of R. For example, in Fig. 3a, the matchingrule Rmatch D R8, the obstacle rules R� D R1 �R7.As shown in Fig. 3b, R8 forms R’s boundary limit.R1, R5, and R6 are simplified as three 1-dimensionalobstacles (lines), each of which provides only onechoice for boundary expansion of R (field-x for R1 andR5, field-y for R6). R2, R3, R4, and R7 are simplifiedas four 0-dimensional obstacles (points), each of whichprovides two choices for boundary adjustment (field-xor field-y).

In the worst case, all the obstacles are points, andall possible combinations of boundaries need to beexamined. Thus the optimal solution requires extra�.ND/ memory space and �.DN / computation time,where N is the number of rules that overlap withRmatch and also have higher priorities. As we can see,the complexity of finding the optimal cache rule isunacceptable in practice.

4 Design Decision

4.1 Design principles

Compared with the prior-arts, the proposed approach isbased on four distinct principles:

(a) An input packet matching R8 (b) Simplified obstacles

Fig. 3 Finding the optimal cache rule.

Page 6: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 201

(1) Constructing new cache rules rather thanpreserving original ones: Unlike Cover-set algorithmthat determines “which rule to be dispatched”, wedetermine “what rule to be dispatched”. This makes itmore possible to customize and therefore optimize thedispatched rule. Besides, every reconstructed rule inour approach is derived from an original rule, thus theaction and more attributes (e.g., counters) of the originalrule can be inherited directly.

(2) Dispatching only one rule per “packet-in”:According to the observation on the spatial locality ofreal-life traffic, the nearer the distance to the “packet-in”point, the more possible the dispatched rule may hit thefuture traffic. Thus we argue that dispatching multipleoverlapping-but-sparse rules to the forwarding devicewill have little advantages yet incur significant storageoverhead. Therefore, in the proposed rule dispatchingscheme, we concentrate on finding the “best” single rulewhich covers the largest “inscribed” hyper-rectangle fora given “packet-in” point in the address space.

(3) No assumption on device functionality: UnlikeCAB that requires the forwarding device to supporttwo-stage flow table pipeline (i.e., to be compatiblewith OpenFlow specification), we make no assumptionon device functionality. Our approach should becompatible with the legacy forwarding devices with anykind of packet classification approach applied.

According to different design decisions, we comparethe main attributes of existing works along with ourproposed approach in Table 1.

(4) Adapted to the diversity of rule setscharacteristics: As mentioned before, the challengesof the rule dispatching problem comes from theexistence of rule overlap and rule priority. We do notassume that the control rules are basically disjointed

Table 1 Existing work.

ApproachRule

recons-truction

Number of rulesdispatched

per packet-in

Assumptionon forwarding

deviceExact — Single NoAbundant No Multiple NoFragmented Yes Single NoEvolving-based Yes Single —

Cover-set No MultipleNeed extra

softwareswitches

CAB No MultipleNeed two-stage

flow tableOur work Yes Single No

or, conversely, seriously overlapped. As a matter offact, different kinds of rule set tend to form differentkinds of overlap characteristics. For example, thetraditional routing or switching rules are more likely tobe non-overlapping, but the ACL rules at a multi-tiernetwork are more likely to be overlapped with eachother because of the differentiated permissions todifferent subnets or individuals.

Since rule overlap cannot be avoided, in the nextsection we study the rule overlap characteristics inseveral real-life rule sets, in order to further introducethe heuristic to the algorithm for simplifying the cacherule generation process.

4.2 Overlap characteristic

The overlap characteristic is studied using fourrepresentive real-life rule sets. Two of the rule sets,ACL1 and FW1, are publicly available from Ref.[19]. The other two rule sets, vAccess and vGateway,are extracted from the network controller and thesecurity controller of a cloud management platform,respectively. This platform[20, 21] provides secure virtualprivate cloud service in a Tier-IV datacenter. vAccessis for the access software switch in a physical server,containing 1982 OpenFlow rules; and vGateway is for atenant virtual gateway that delivers routing and securityservices among multiple virtual networks, containing3106 five-tuple rules.

To study the overlap degree of these rule sets, wedefine the number of overlap tiers as the maximumdepth of the rule set’s dependency chains[12]. (Theoverlap tiers can be built in a bottom-up traverse ofthe rule set. The default rule forms the bottom tier.Any two rules in the same tier must not overlap witheach other.) Table 2 shows the number of overlap tiersof each rule set and suggests the diversity of overlapdegree of different kinds of rules.

Moreover, we define three relations between twooverlapped rules (R1 and R2) as follow:

(1) Nested: R1 is nested in R2 if

8d 2 Œ1;D� W R1Œd � � R2Œd � (6)

We call R1 the subset rule and R2 the superset rule.

Table 2 The four real-life rule sets.

Set Number of rules Number of overlap-tiersACL1 753 38FW1 270 5

vAccess 1982 4vGateway 3106 11

Page 7: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

202 Tsinghua Science and Technology, April 2016, 21(2): 196-209

(2) Crossed: R1 and R2 are crossed if

9d1; d2 2 Œ1;D� W

R1Œd1� � R2Œd1� and R1Œd2� � R2Œd2�(7)

(3) Partially overlapped: The other cases.Suppose that all the rules can be organized in a

complete “nesting hierarchy”, namely, for each pair ofrules, they are either disjointed or nested in priorityorder (i.e., the higher-priority one is nested in the lower-priority one). For the optimal solution presented inSection 3, the number of “obstacle” rules needed to beexamed during the boundary adjustment of R can besignificantly reduced. It is because that after examinga “obstacle” superset rule, all the corresponding subsetrules nested in this superset rule will certainly not affectthe further boundary adjustment of R, and thus can beignored.

For the purpose of observing the nestingcharacteristic, we further provide a more intenseanalysis on overlap type using ACL1, which is theworst case among the four rule sets. Figure 4 presentsthe overlap characteristic of ACL1 in detail. The x-axisrepresents the ID of the rules; the rule with a smallerID has a higher priority. The blue curve shows the totalnumber of rules overlapped with a specific rule, thegreen curve shows the number of higher priority rulesoverlapped with a specific rule (i.e., the rules that mightcause action inconsistency), while the red curve showsthe number of rules overlapped but non-nested with aspecific rule. We can see that: (1) For a specific rule R,the number of higher-priority rules that overlap with Rmight be numerous, especially for the large-sized low-priority rules; (2) Though most of the overlapped rulepairs tend to be “nested”, the “crossed” or “partiallyoverlapped” relations still exist among rules.

5 Hierarchy-Based Dispatching

The proposed HBD approach includes the followingthree threads: (1) Pre-processing: The rule dispatcherdivides all the control rules into several groups, suchthat each of the groups forms a complete nestinghierarchy in the address space, respectively. (2) Cacherule generation: With the nesting hierarchy treesbuilt in pre-processing, the rule dispatcher reactivelygenerates high-coverage cache rules using a greedyalgorithm at run-time. (3) Rule set update: The ruledispatcher supports dynamic rule set update by meansof incrementally updating the nesting hierarchy trees.

5.1 Pre-processing

According to the observation on overlap characteristics,we cannot assume that the rule set is normally organizedin a complete nesting hierarchy. In the example of Fig.5a, even if most rules form a nesting hierarchy, thereare still some rule pairs that are crossed or partiallyoverlapped, including R5 and R7, R6 and R8, R7

and R8, and R8 and R9. The rule that is crossed orpartially overlapped with another rule is reffered to as a“saboteur rule” in the following.

In order to build the nesting hierarchy of the rule setwhere all saboteur rules are eliminated, HBD employs a“grouping + decomposing” strategy for pre-processing.

5.1.1 The naive decomposing strategyOne intuitive solution to handle saboteur rules is todecompose them into some sub-rules until no saboteursexist. Using this strategy, the operation of buildinga nesting hierarchy can be accomplished in a singlebottom-up traverse of the rule set. Algorithm 1 showsthe process of building the nesting hierarchy (whichis organized as a tree structure). During the bottom-

Fig. 4 Overlap characteristic of ACL1.

Page 8: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 203

(a) An example rule set (b) The naive decomposing strategy

(c) Grouping-based nesting hierarchy

Fig. 5 A two-dimensional example.

Algorithm 1 Decomposing-based pre-processing1: function BUILDHIERARCHY DECOMPOSING(Ruleset)2: stack D fg3: for each R in Ruleset (decending priority/ do4: stack.push.R/5: end for6: tree D fstack.pop./g7: while stack not empty do8: R D stack.pop./9: for each Rt in tree do

10: if R is nested in Rt then11: Rt :addChild.R/12: continue13: end if14: if Rt :is Saboteur.R/ then15: R0 D spaceIntersection.R;Rt /

16: Rt :addChild.R0/

17: stack.push.R �R0/

18: end if19: end for20: end while21: return tree22: end function

up traverse, when a new saboteur rule (i.e., crossedor partially overlapped with some pre-traversed rule)is encountered, it will be decomposed until everysub-rule of it can find a certain pre-traversed rule

to be its superset. The decomposing is achievedby function spaceIntersection(). Figure 5b shows thenesting hierarchy of the example rule set using thisnaive decomposing strategy. R5, R6, R7, and R8

are decomposed into several nested sub-rules, whichafterwards meet the nesting hierarchy requirement.

However, the R �R0 step in line 17 might generatenumerous sub-rules of R (2D in the worst case),which causes indeterminacy to the number of rules afterbuilding the nesting hierarchy. As shown in Table 3,the “decomposing” column suggests the infeasibility ofdirect decomposition of all saboteur rules. For example,after the direct decomposition, the size of vGatewayswells from 3106 rules to over 200 thousand rules.

5.1.2 Grouping strategyInstead of directly decomposing the saboteur rules, thegrouping strategy eliminates the potential risks of rulenumber explosion by means of partitioning the rulesinto different groups where no saboteur rule existsin each group. It should be noticed that the defaultrule which covers the entire address space needs to beassigned to every group.

The grouping strategy is based on the followingobservation. Suppose that the original rules are dividedinto K groups. Consequently, the original problem offinding large inscribed hyper-rectangle R for a packet-in point P can be partitioned into K sub-problems.The result (i.e., generated cache rule) to each sub-problem is referred to as Ri (i D 1; 2; : : : ; K). Theintersected space of allRi forms the final resultR to theoriginal problem (i.e., R D \Ri ), which can guaranteethe correctness. Thus, when dealing with a cachemiss, the cache rule generation algorithm (which willbe described later) is executed on all groups (i.e., rulesubsets) parallelly. After that, the dispatcher conductsa space intersection of all intermediate cache rules toform the final result.

Algorithm 2 shows the grouping operation as wellas building each group’s nesting hierarchy. Each tree

Table 3 Scalability of the two strategies.

Rule setNumber

ofrules

Decomposing GroupingNumber ofrules afterbuilding

Numberof

groups

Number ofrules afterbuilding

ACL1 753 2338 3 755FW1 270 154 257 5 274

vAccess 1982 2154 3 1984vGateway 3106 234 874 5 3110

Page 9: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

204 Tsinghua Science and Technology, April 2016, 21(2): 196-209

Algorithm 2 Grouping-based pre-processing1: function BUILDHIERARCHY GROUPING(Ruleset)2: stack D fg; list D fg3: for each R in Ruleset .decending priority/ do4: stack.push.R/5: end for6: tree D fstack.pop./g7: while stack not empty do8: R D stack.pop./9: for each Rt in tree do

10: if R is nested in Rt then11: Rt :addChild.R/12: continue13: end if14: if Rt : is Saboteur.R/ then15: list.add.R/16: end if17: end for18: end while19: add tree to trees list20: if list not empty then21: list.reverse./I list.add.Rdefault/

22: BuildHierarchy Grouping(list)23: end if24: end function

represents a group, and the trees list contains all thetrees (groups). The grouping procedure is implementedas a recursive function: when a saboteur rule isencountered, it is excluded from current group, and willbe explored in later iterations. The algorithm terminatesuntil in each group the rules form a complete nestinghierarchy. Figure 5c shows the grouping result of theexample rule set, where the original rules are dividedinto two groups, and in each group the rules form acomplete nesting hierarchy. The intersected space ofthe cache rule results of the two groups (as shown inred) represents the final cache rule result (the propertiesof the cache rule are inherited from R9, which is withthe higher priority). The “grouping” column in Table3 shows the number of groups needed for eliminatingsaboteurs completely, which is small enough for reliableperformance. However we can not assume that allrule sets in real-life have similar characteristics. Tosupport the limitation of the group number (especiallyfor large-sized rule sets), we combine the grouping anddecomposing strategies to enhance the feasibility ofimplementation.

5.1.3 Combination of the two strategiesIn the grouping process, the algorithm can introducemoderate rule decompositions in current group (instead

of excluding all saboteur rules) based on the costof decomposing a specific saboteur rule. The costis defined as the number of sub-rules that would beproduced by step R � R0 in Algorithm 1. Whether todecompose a saboteur rule or exclude it from currentgroup depends on a pre-defined cost threshold: if thecost is not larger than the threshold, the algorithmchooses the former option; otherwise, the latter ischosen. For example, the cost of decomposing a rulethat is partially overlapped on only one dimensionmight be 1, thus the algorithm decomposes the rule andkeeps it in current group. But the cost of decomposinga rule crossed on multiple dimensions is usually high,thus this rule might be excluded from current group.

The question of how to set the cost threshold and theupper limit of group number leaves open. Generally,the upper limit of the group number should follow themaximum number that a particular platform supports toexecute in parallel (e.g., 8 cores or threads), and the costthreshold depends on the overlap degree of rules. In ourapproach, the upper limit of group number is set to 4,and the cost threshold is set to 2.

5.2 Cache rule generation

With the rule set’s nesting hierarchy, the cache rulegeneration process can be significantly simplified.Given the packet-in point P and its matching rule R, ifR is on the topmost tier, the original R can be directlydispatched; otherwise, exploring only the immediatenested rules of R (i.e., the rules that the child nodesof R represent) is sufficient for preserving semanticintegrity and optimizing performance. Thus in HBDthe rule generation problem is simplified to a two-tiernesting model, containing only R and its immediatenested rules.

The theoretical optimal solution presented in Section3 needs �.DN ) time to find the optimal cache rulebecause of DN boundary adjustment possibilities inthe worst case. It is impractical to apply this solutionin practice due to the exponential computationalcomplexity. Thus in HBD the optimality of coveragecan be traded off for a much more efficient but stillsuperior greedy algorithm. As shown in Algorithm 3,the output cache rule Rout is initialized to the packet’smatching rule R. During the process of adjusting theboundary of Rout, Rout greedily reduces its boundarybased on each immediate nested rule of R.

Page 10: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 205

Algorithm 3 Cache rule generation1: function CACHERULEGENERATE(packet, tree)2: Rm D packet.match./3: if Rm is a leaf node of tree then4: return Rm

5: else6: Rout D Rm

7: for each child node Robs of Rm in tree do8: for each d in Œ1;D� do9: predict volume loss % W lossŒd �

10: end for11: choose d with the minimum lossŒd �12: adjust Rout in the d -th dimension13: end for14: return Rout

15: end if16: end function

In contrast with the strict optimal solution, the greedyalgorithm reduces the time complexity to �.ND).Additionally, the data size N is significantly reducedas the nesting hierarchy simplifies the problem model.

5.3 Rule set update

The nesting hierarchy naturally supports rule set updatein the dynamic SDN scenarios. When adding a new setof rules to the rule set, the rule set update thread of HBDfirst uses Algorithm 2 to partition this new rules intoseveral groups, each of which we refer to as a branch.Then the update thread incrementally adds each branchto the nesting hierarchy trees according to the locationof the branch’s root rule. The algorithm for adding abranch is shown in Algorithm 4.

As for a rule deletion request, HBD simply gives therules descendants to its father node, and removes thisrule node from current hierarchy tree.

5.4 Bonus attribute of HBD

The cache rules dispatched by HBD have an bonusattribute: Any pair of the cache rules are eitherdisjointed or “harmlessly” overlapped (i.e., the twocache rules overlap with each other but are derivedfrom a same original rule). It is called the order-independent attribute. The impact of this attribute isstudied in SAX-PAC[11], one latest work that simplifiesthe classifier matching problem into several sub-problems. In general, the order-independent attributecan be leveraged by the packet processing moduleof forwarding devices in the following aspects:(1) The order-independence can eliminate all typesof rule conflictions caused by rule-overlapping[22].

Algorithm 4 Add branch1: function ADDBRANCH(branch, tree)2: R D branch.root3: if R:leftcorner.match./Š D R:rightcorner.match./ then4: AddBranch(branch, next tree/5: return6: else7: Rm D R:leftcorner.match()8: end if9: list D fg

10: for each child node Rc of Rm in tree do11: if Rc is nested in R then12: list.add.Rc/

13: else if Rc and R are overlapped then14: AddBranch(branch, next tree/15: return16: end if17: end for18: R:addChildren(list)19: Rm:deleteChildren(list)20: Rm:addChild.R/21: end function

(2) Order-independent rules are disjointed in thesearch space, which can improve both the spacialand time performances of most packet classificationalgorithms[23, 24]. (3) As the space that a certain rulelocates is associated with only one single action, it ismore convenient to track and debug these rules.

In this part, we brief an order-independent rulelookup method based on Open vSwitch (OVS)[25] forimproving its packet processing performance. As shownin Fig. 6a, the dispatched rules are installed in theuserspace of OVS and organized in a two-stage liststructure. Originally, due to the dependency amongrules, the classifier needs to organize the priorityinformation and sort the former stage list in terms ofthe rule priorities. The main cost of the rule lookupprocess comes from two aspects: (1) the traverse ofthe two-stage list, which can hardly be pre-terminateddue to rule dependency; and (2) the degradation of“megaflow”[25] that will be inserted to the kernel ofOVS. Specifically, the megaflow (with many wildcards)is always degraded to a “microflow” (with no wildcards)when identifying the mask associated with the flow,which correspondingly degrades the packet processperformance. It is because that the sorting method ofthe rule lists is bound with the rule priority and thuscannot be customized for performance consideration.

Using HBD, since the rules dispatched to OVS areorder-independent, the traverse of the rule lists can be

Page 11: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

206 Tsinghua Science and Technology, April 2016, 21(2): 196-209

(a) Original rule classifier in OVS userspace

(b) Simplification of the classifier

Fig. 6 Improvement on OVS classifier leveraging the order-independent attribute.

directly terminated when a rule matching occurs, andthe OVS classifier no longer needs to sort the rulelists based on priorities. Furthermore, to maximum thewildcard number of megaflows, as shown in Fig. 6b,we sort the former stage list in descending order ofwildcard number. Thus, the coverage of megaflows canbe enhanced and the netlink overhead of OVS can besignificantly reduced.

6 Evaluation

The performance of HBD is evaluated throughsimulation, and three other algorithms are implementedfor comparison, which includes: (1) the naiveexact dispatching approach; (2) a dependency-basedalgorithm that works in reactive mode, copies thestrategy of Cover-set[12], and treats the hits on the coversets as cache misses (We refer to it as Dependency-Based Dispatching, DBD); and (3) the cache rulegeneration algorithm used in CAB[16].

For DBD, it should be noted that in the experiments,we use the Cover-set strategy as a benchmark on thereactive dispatching model, which is different from theuse case in CacheFlow (i.e., in the proactive dispatchingmodel)[12]. If the number of dependent rules triggered

by a packet-in exceeds the cache limit, DBD simplydispatches an exact-match rule.

6.1 Data set and test-bed

The rule sets used for the evaluation include: (1) tworeal-life rule sets described in Section 4, vAccess andvGateway; and (2) another two larger-sized syntheticrule sets in Classbench[19], ACL1-5K and FW1-5K,with 4415 and 4653 five-tuple rules, respectively.

For vAccess, the corresponding traces are capturedat the physical server where the access software switchlocates, with a duration of one hour. For vGateway, thetraces are captured at the virtual gateway with a durationof 15 minutes. For the two synthetic rule sets, we rewritethe trace generation function of ClassBench to generatesynthetic traces that better follow the distributionof real-life traffic (in terms of locality). Concretely,we keep Pareto distribution used in trace generatoralgorithm the same and replace the copy function, fordepicting the temporal locality of packets. Besides, weintroduce Zipf distribution to replace the random-cornerfunction, for better depicting the spacial locality offlows.

On an HP Z220 SFF workstation with 3.40 GHzCPU, 16 GB memory, and 64 bit Ubuntu Server 12.04LTS, we simulate a controller along with a connectedforwarding device. In the controller, the HyperSplit[13]

algorithm is adopted for finding the matching rule ofa “cache miss” packet. In the forwarding device, theLeast Recently Used (LRU) algorithm is adopted forcache rule replacement. Besides, all the source codeof the evaluated rule dispatching algorithms is writtenin C language.

6.2 Experimental results

For the optimization objective of (1) reducing Tavrg D

Thit C �miss � .Ttrans C Tgen/ and (2) satisfying the rulespace constraints in the forwarding devices, we firstdefine the evaluation metrics.� Cache miss ratio (���miss): the percentage of cache

miss packets when testing on the correspondingtrace of a specific rule set.� Average cache generation time (Tgen): the

maximum packet-in rate that guarantees no packetloss in the controller dispatcher.� Average Traffic Coverage Capability (ATCC):

ATCC Dhyper volume.[Rcache/

n(8)

where n is the maxinum rule number of aforwarding device, [Rcache is the union of all

Page 12: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 207

hyper-rectangles representing the current cacherules. Note that this metric reflects the coveragerange of cache rules but does not regard trafficlocality.� Effective ATCC (e-ATCC):

ATCC Dhyper volume.[Reffective-cache/

n(9)

where Reffective-cache is the cache rules that areactually hit by traffic.

6.2.1 Effectiveness on reducing Tavrg

The cache miss ratios �miss across a range of rule-cache size are illustrated in Fig. 7. As a benchmark,the exact dispatching method reflects the temporallocality of traffic. According to the results, though DBDbetter reduces the cache miss ratio against the exactdispatching method, the cache capability achieved byCAB and HBD are much more significant, indicatingthe feasibility and benifits of reactive rule dispatching.For example, on the rule set vAccess, cache rulesgenerated by HBD are able to cover 90% traffic withonly 8% cache size. Compared with CAB, we cansee that the smaller the cache size, the better cachecapability HBD achieves than CAB. It is due to the“one cache rule per packet-in” principle of HBD, whichconcentrates on making optimal use of every single rulespace in the forwarding devices.

For calculating Tgen, we measure the throughput ofthe rule dispatchers, i.e., the maximum packet-in ratethat guarantees no packet loss in the dispatcher. Asshown in Table 4, the throughput of the HBD dispatcheris around one million of packets per second (pps),indicating that Tgen � Ttrans. Thus, according to the

Table 4 HBD dispatcher throughput.

Set Throughput (Mpps)ACL1 5K 1.19FW1 5K 0.82vAccess 1.55

vGateway 0.98

analysis in Section 3, HBD guarantees that the benefit(i.e., the significant reduction of �miss) outweighs thecost (i.e., Tgen).

6.2.2 Efficiency on memory utilization offorwarding device

The ATCC and the e-ATCC are used for testing devicememory utilization. Unlike the ATCC calculating allthe dispatched cache rules, the e-ATCC only takes intoconsideration the effective cache rules among them (i.e.,rules that are actually hit by the traffic). For example,the cache rules dispatched by DBD have high ATCC,but only a small proportion of them are actually usefulfor handling traffic, which is indicated by e-ATCC. ForHBD, ATCC equals to e-ATCC because of the “onecache rule per packet-in” principle.

To calculate the ATCC and the e-ATCC for aspecific algorithm, we set the rule-cache size to 10%(a reasonable cache size with both space efficiencyand cache hit rate guaranteed), and take three cachesnapshots during the algorithm processing. The ATCCand the e-ATCC illustrated in Fig. 8 are the averagevalue in the three snapshots. As we can see, thoughthe ATCC of HBD is usually less than that of DBD andCAB (because HBD calculates the accurate “inscribed”

Fig. 7 Cache miss ratio.

Page 13: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

208 Tsinghua Science and Technology, April 2016, 21(2): 196-209

Fig. 8 Traffic coverage capability.

hyper-rectangle representing a cache rule, instead offinding several intact rules or generating a large-coverage bucket), the e-ATCC of HBD is usually largerthan that of DBD and CAB, especially for the rulesets with serious rule overlaps such as FW1-5K andvGateway.

6.2.3 Effectiveness of the greedy cache generationalgorithm

To test the effectiveness of the greedy cache rulegeneration algorithm, we compare the cache rulesgenerated by HBD with the cache rules generated bythe optimal solution presented in Section 3. As shownin Table 5, on the two small-sized real-life rule sets,ACL1 and FW1, ATCC of the HBD-generated rules isclose to ATCC of the optimal rules. The latter takes ourworkstation more than twenty hours to compute.

To conclude, according to the above experimentalresults, the naive exact dispatching approach suffersfrom high cache miss penalty due to the weak trafficcoverage capability of exact-match rules; and thedependency-based algorithm is not suitable for reactivedispatching mode because it usually inevitably cachenumerous dependent rules (even if the cover-sets splicethe dependency chains), most of which are ineffectivefor covering upcoming traffic. Given limited cache size,HBD outperforms CAB in terms of cache hit ratio.Addtionally, to achieve the nice performance, CABrequires the underlying forwarding devices to supporttwo-stage flow table pipeline. In contrast, HBD iscompatible with the legacy forwarding devices withany kind of packet classification approach applied.Moreover, as an additional (but not mandatory) bonusoption, the forwarding devices can leverage the order-

Table 5 ATCC: Greedy vs. optimal.

Method ATCC on ACL1 ATCC on FW1Greedy 15.25 14.85Optimal 16.40 15.50

independent attributes of cache rules dispatched byHBD to further optimize their rule lookup processes.

7 Conclusion

In this work, we take a deep-dive to the reactive ruledispatching problem in software-defined networks, andpropose the HBD approach for optimizing the rules tobe dispatched. Compared with existing approaches, thestrategies used in HBD enable the network to achievebetter performance in terms of transmission latency anddevice memory utilization.

The code of HBD and the trace generation functionwe used for evaluation will be available on ourwebsite[26] to encourage more intensive research in thisarea.

References

[1] ONF Market Education Committee, Software-definednetworking: The new norm for networks, ONF WhitePaper, 2012.

[2] M. Yu, J. Rexford, M. J. Freedman, and J. Wang, Scalableflow-based networking with DIFANE, ACM SIGCOMMComputer Communication Review, vol. 41, no. 4, pp. 351–362, 2011.

[3] M. Moshref, M. Yu, A. Sharma, and R. Govindanm, vcrib:Virtualized rule management in the cloud, in Proc. NSDI,2013.

[4] Y. Kanizo, D. Hay, and I. Keslassy, Palette: Distributingtables in software-defined networks, in Proc. INFOCOM,2013, pp. 545–549.

[5] N. Kang, Z. Liu, J. Rexford, and D. Walker, Optimizing theone big switch abstraction in software-defined networks, inProc. CoNEXT, 2013, pp. 13–24.

[6] K. C. Claffy, Internet traffic characterization, PhDdissertation, UCSD, CA, USA, 1994.

[7] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. Gude, N.McKeown, and S. Shenker, Rethinking enterprise networkcontrol, IEEE/ACM Transactions on Networking, vol. 17,no. 4, 1270–1283, 2009.

[8] T. Koponen, M. Casado, N. Gude, J. Stribling, L.Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue,T. Hama, et al., Onix: A distributed control platform forlarge-scale production networks, in Proc. OSDI, 2010, vol.10, pp. 1–6.

[9] Y. Hassas and Y. Ganjali, Kandoo: A framework forefficient and scalable offloading of control applications, inProc. HotSDN, 2012, pp. 19–24.

[10] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula,P. Sharma, and S. Banerjee, DevoFlow: Scalingflow management for high-performance networks, ACMSIGCOMM Computer Communication Review, vol. 41, no.4, pp. 254–265, 2011.

[11] K. Kogan, S. Nikolenko, O. Rottenstreich, W. Culhane,and P. Eugster, SAX-PAC (scalable and expressive packetclassification), in Proc. SIGCOMM, 2014, pp. 15–26.

Page 14: HBD: Towards Efficient Reactive Rule Dispatching in ...security.riit.tsinghua.edu.cn/share/ChenChang-Tsinghua2016.pdf · HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined

Chang Chen et al.: HBD: Towards Efficient Reactive Rule Dispatching in Software-Defined Networks 209

[12] N. Katta, O. Alipourfard, J. Rexford, and D. Walker,Infinite cacheflow in software-defined networks, in Proc.HotSDN, 2014, pp. 175–180.

[13] Y. Qi, L. Xu, B. Yang, Y. Xue, and J. Li, Packetclassification algorithms: From theory to practice, in Proc.INFOCOM, 2009, pp. 648–656.

[14] Q. Dong, S. Banerjee, J. Wang, and D. Agrawal,Wire speed packet classification without tcams: A fewmore registers (and a bit of logic) are enough, ACMSIGMETRICS Performance Evaluation Review, vol. 35,no. 1, pp. 253–264, 2007.

[15] Y. Ma, S. Banerjee, S. Lu, and C. Estan, Leveragingparallelism for multi-dimensional packet classificationon software routers, ACM SIGMETRICS PerformanceEvaluation Review, vol. 38, no. 1, 227–238, 2010.

[16] B. Yan, Y. Xu, H. Xing, K. Xi, and H. J. Chao, CAB: Areactive wildcard rule caching system for software-definednetworks, in Proc. HotSDN, 2014, pp. 163–168.

[17] P. Gupta and N. McKeown, Packet classification usinghierarchical intelligent cuttings, in Proc. Hot InterconnectsVII, 1999, pp. 34–41.

[18] S. Singh, F. Baboescu, G. Varghese, and J. Wang, Packetclassification using multidimensional cutting, in Proc.

SIGCOMM, 2003, pp. 213–224.[19] Evaluation of packet classification algorithms,

http://www.arl.wustl.edu/ hs1/PClassEval.html, 2015.[20] X. Wang, Y. Qi, Z. Liu, and J. Li, LiveCloud: A lucid

orchestrator for cloud datacenters, in Proc. CloudCom,2012, pp. 341–348.

[21] X. Wang, Z. Liu, B. Yang, Y. Qi, and J. Li, Tualatin:Towards network security service provision in clouddatacenters, in Proc. ICCCN, 2014, pp. 1–8.

[22] E. Al-Shaer, H. Hamed, R. Boutaba, and M. Hasan,Conflict classification and analysis of distributed firewallpolicies, Selected Areas in Communications, IEEE Journalon, vol. 23, no. 10, pp. 2069–2084, 2006.

[23] B. Vamanan, G. Voskuilen, and T. Vijaykumar, Efficuts:Optimizing packet classification for memory andthroughput, ACM SIGCOMM Computer CommunicationReview, vol. 41, no. 4, pp. 207–218, 2010.

[24] X. Wang, C. Chen, and J. Li, Replication free rule groupingfor packet classification, ACM SIGCOMM ComputerCommunication Review, vol. 43, no. 4, pp. 539–540, 2013.

[25] Open vSwitch 2.3.1, http://openvswitch.org/, 2015.[26] NSLab, http://security.riit.tsinghua.edu.cn/, 2015.

Chang Chen received the BEng degreefrom Tsinghua University, China, in 2015.He is now a master student in Departmentof Automation at Tsinghua University. Hisresearch interests include software-definednetworking, high-performance algorithmsin computer networking, and systemarchitectures.

Xiaohe Hu received the BEng dgree fromTsinghua University, China, in 2014. Heis now a PhD candidate in Department ofAutomation at Tsinghua University. Hisresearch interests include software-definednetworking and cloud datacenter networks.

Jun Li received the BEng amd MEngdegrees from Tsinghua University, China,in 1985 and 1988, respectively, and thePhD degree from New Jersey Institute ofTechnology, in 1997. He is currently aprofessor and the dean at the ResearchInstitute of Information Technology, andthe executive vice dean of the School

of Information Science and Technology, Tsinghua University.He was a board director of XML Global and is currently aboard member or advisor to several venture capital and startupcompanies, including GigaDevice and Agate Logic. He is alsoa deputy director of the Tsinghua National Lab for InformationScience and Technology. He is a co-author of more than 100papers, and co-inventor of 10 patents. He is also a managingdirector of an angle fund Versatile Venture Capital. His research

interests include network security, pattern recognition, and imageprocessing.

Kai Zheng received the BS degreefrom Beijing University of Posts andTelecommunications, China, in 2001,and the PhD degree from TsinghuaUniversity, China, in 2006. From then onhe joined IBM China Research Lab as aresearch staff member on computer systemarchitecture. His research interests include

cloud networking, software-defined networking, and software-defined protocol.

Xiang Wang received the BS degreefrom Xidian University, China, in 2007,and the MS degree from Universityof Science and Technology of China,China, in 2010. He is working towardthe PhD degree at the Department ofAutomation, Tsinghua University, China.His research interests include software-

defined networking, distributed system, and performance issuesin computer networking, and system architectures.

Yang Xiang received the BS degreefrom Jilin University, China, in 2008,and the PhD degree from TsinghuaUniversity, China, in 2013. He is currentlya postdoctor in the Network SecurityLab, Research Institute of InformationTechnology, Tsinghua University. Hisresearch interests include software defined

network, network architecture, inter-domain routing security, andintrusion detection.