
Panopticon: Reaping the Benefits of Partial SDN Deployment in Enterprise Networks

Dan Levin    Marco Canini    Stefan Schmid    Anja Feldmann

Technische Universität Berlin / Deutsche Telekom Laboratories

Bericht-Nummer: 2013-04    ISSN-Nummer: 1436-9915

Panopticon: Reaping the Benefits of Partial SDN Deployment in Enterprise Networks

Dan Levin    Marco Canini    Stefan Schmid    Anja Feldmann
TU Berlin / T-Labs

<first name>@net.t-labs.tu-berlin.de

ABSTRACT

The operational challenges posed in enterprise networks present an appealing opportunity for the software-defined orchestration of the network (SDN). However, the primary challenge to realizing solutions built on SDN in the enterprise is the deployment problem. Unlike in the data-center, network upgrades in the enterprise start with the existing deployment and are budget- and resource-constrained.

In this work, we investigate the prospect for partial Software Defined Network (SDN) deployment. We present Panopticon, an architecture and methodology for planning and operating networks that combine legacy and upgraded SDN switches. Panopticon exposes an abstraction of a logical SDN in a partially upgraded legacy network, where the SDN benefits extend potentially over the entire network.

We evaluate the feasibility of our approach through simulation on real enterprise campus network topologies entailing over 1500 switches and routers. Our results suggest that with only a handful of upgraded switches, it becomes possible to operate most of an enterprise network as a single SDN while meeting key resource constraints.

1. INTRODUCTION

Mid to large enterprise campus networks present complex operational requirements: The network must operate reliably and provide high-performance connectivity while enforcing organizational policy. It must also provide isolation across complex boundaries, yet remain easy to manage. All the while, operational and capital costs must be kept low.

In the face of these requirements, numerous operational challenges threaten high availability and lead to increased costs: Policy changes and resource reallocations lead to reconfigurations, often scattered across the network. This demands coordination among multiple human operators to avoid inconsistencies, and human reasoning to ensure correctness of the resulting update. In personal correspondence, a network operator remarked that “for the fear of breaking policy, human operators are reluctant to ever remove existing rules.” Troubleshooting is an expensive, time-consuming activity for network operators. A recent survey [32] shows that 56.2% of reported network tickets take over 30 minutes to resolve, and 35% of surveyed operators receive more than 100 tickets per month. Scheduled maintenance poses an additional challenge: The network provides no means for the operator to express that certain devices will be unavailable for a period of time. Therefore, he must reason about the consequences of reconfiguring the network for device removal, the correctness of the order in which he performs the actions, and manually introduce the correct configuration changes to mitigate the impact on service.

Software Defined Networking (SDN) has the potential to provide a principled solution to the orchestration of these challenging tasks. SDN is a paradigm that offers a programmatic, logically-centralized interface for specifying network behavior via direct control of the switch hardware forwarding state. Through this interface, a software program acts as a network controller by writing forwarding rules into switch tables and reacting to topology and traffic changes. Therefore, SDN represents a clear opportunity over manual, error-prone, ad-hoc approaches to handling the above tasks.

However, most of the existing work leveraging SDN (e.g., [4, 9, 13, 27]) has so far assumed a full deployment of SDN switches. Rather than a green field, network upgrade starts with the existing deployment and is typically a staged process: budgets are constrained, and only a part of the network can be upgraded at a time. SDN deployment in the enterprise is no exception.

The realities of network upgrade and the operational challenges facing existing networks lead us to question: (i) What are the benefits of upgrading to a partial SDN deployment? (ii) How do the benefits of principled network orchestration depend on the location of SDN switches? (iii) Given budget constraints, what subset of legacy switches should be SDN-upgraded to maximize benefits?

To answer these, we present Panopticon, an architecture and methodology for aiding operators in planning and operating networks that combine legacy switches and routers with SDN switches. We call such networks transitional networks.



Figure 1: Current transitional network approaches vs. Panopticon: (a) Dual-stack ignores legacy and SDN integration. (b) Full edge SDN deployment enables end-to-end control. (c) Panopticon's partially-deployed SDN yields an interface that acts like a full SDN deployment.

Panopticon overcomes the limitations of current approaches for transitional networks, which we now briefly review.

1.1 Current Transitional Networks

The first approach (Figure 1a) partitions the flow space into several disjoint slices and assigns each slice to either SDN or legacy processing [20]. Individual traffic flows of interest may be explicitly selected for SDN processing. This mode's prime limitation is that it is essentially a dual-stack approach (as with IPv6 + IPv4) rather than a means to integrate legacy hardware and expose the resulting transitional network as SDN. Further, this approach necessitates a contiguous deployment of hybrid programmable switches capable of processing packets according to both legacy and SDN mechanisms, i.e., those switches studied under the ONF Hybrid Working Group [24].

The second approach (Figure 1b) involves deploying SDN at the network access edge [6]. This mode has the benefit of enabling full control over the access policy and the introduction of new network functionality at the edge, e.g., data-center network virtualization [1]. Unlike a data-center environment where the network edge may terminate at the VM hypervisor, the campus network edge terminates at an access switch. In an enterprise network, this approach thus involves upgrading thousands of access switches and incurs a high cost. SDN deployment limited to the edge additionally impairs the ability to control forwarding decisions within the core of the network (e.g., load balancing, waypoint routing).

1.2 Panopticon

Panopticon encompasses (i) a methodology for determining the cost-aware partial deployment of SDN switches for specific operational objectives and (ii) an architecture for operating transitional networks as SDN.

Our main insight is that the key benefits of the SDN abstraction to enterprise networks can be realized for every source-destination path that includes at least one SDN switch. Thus, we do not mandate a full SDN deployment: a relatively small subset of all switches may suffice. Each path which traverses even just one SDN switch can be used to realize a programmatic, logically-centralized interface for orchestrating, e.g., the network access control policy. Moreover, traffic which traverses two or more SDN switches may be controlled at even finer levels of granularity, enabling further customized forwarding decisions, e.g., for load balancing.

Based on this insight, we first develop a cost-aware optimization framework as a tool for the network operator to determine the location of the partial SDN deployment based on their objectives (e.g., capex or forwarding efficiency). Second, we design Panopticon, which provably guarantees that traffic destined to operator-selected end-points passes through at least one SDN switch. Just as enterprise networks regularly divert traffic (e.g., traffic in one VLAN must traverse a gateway to reach another VLAN, even on the same switch), Panopticon explicitly leverages waypoints to control traffic.

As opposed to the dual-stack approach, Panopticon (Figure 1c) fundamentally integrates legacy and SDN switches, yielding an abstraction of a logical SDN to the control platform. As we reason later (§ 7), many SDN control paradigms can be achieved. Panopticon enables the expression of any end-to-end policy, as though the network were one big, virtual switch. End-to-end policies include access control and application load balancing. Routing and path-level policy, e.g., traffic engineering, can be expressed too; however, the fidelity of the global network view (and path diversity) presented to the control logic is reduced to the fidelity of the logical SDN. As more of the network is upgraded to support SDN, more fine-grained path-level policy can be expressed. In a sense, Panopticon generalizes a fabric deployment [6], which is impractical in the enterprise context where the edge is so large and embroiled in legacy equipment.

The namesake of our approach is inspired by the Panopticon prison architecture, in which prisoners are confined to cell-blocks, observable and controlled from strategic vantage points. Analogous to this prison, we isolate end-hosts in the legacy network using VLANs (in what we term Solitary Confinement Trees or SCTs) and restrict their traffic to traverse strategically upgraded SDN-programmable switches.[1] Figure 2 illustrates the forwarding in Panopticon, which we revisit in § 3.

We face two challenges: (i) the need to maintain compatibility with legacy switches and protocols, and (ii) scalability issues with VLAN and flow table state. Panopticon overcomes these challenges by (i) relying on features such as VLANs that are ubiquitously available on enterprise-grade switches, as confirmed through our operator survey (§ 2), and (ii) a rigorous design methodology that considers VLAN and flow table constraints, yielding a careful assignment of VLAN IDs that allows us to create a spanning tree for every end-point in a scalable fashion.

[1] See § 2.3 for background on OpenFlow, an SDN platform.

1.3 Research Contributions

In summary, Panopticon enables us to expose an interface for operating a transitional network as if it were a nearly fully deployed SDN, and to reap the benefits of partial SDN deployment for most of the network, not just the part that is upgraded. Through a rigorously designed SDN deployment that takes control over the underlying legacy resources, Panopticon is more than just an ad-hoc tunneled SDN overlay. The novelty of our architecture lies in the way we overcome the challenge of combining existing mechanisms readily available in legacy switches. We make the following contributions:

1. We design a network architecture for operating a partially upgraded network as an SDN (§ 3), and we study the interaction mechanisms between legacy and upgraded switches (§ 4). We also include proofs for the correctness of our approach (Appendix).

2. We formalize the problem of determining the optimal upgrade location as a mixed-integer optimization program (§ 5) and we devise efficient algorithms to solve it (§ 5.4).

3. We evaluate our approach using real enterprise network topologies (with over 1500 switches) and traffic traces (§ 6). We find that with only a handful of switches, it becomes possible to operate most of an enterprise as a single SDN while meeting all practical resource constraints.

4. We demonstrate system-level feasibility with a prototype implementation (§ 6.5).

To motivate our problem formulation we conduct a survey of network operators, which we now present.

2. ENTERPRISE NETWORK SURVEY

Commercial deployment of SDN started within data-centers. The next steps are enterprise and/or ISP networks. In this paper we focus on mid to large enterprise campus networks, i.e., networks serving hundreds to thousands of users, whose infrastructure is physically located at a campus site. We choose this environment due to its challenging complexity as well as the impact potential that SDN network orchestration promises.


Figure 2: The forwarding path between A and B goes via the frontier shared by SCT(A) and SCT(B); the path between A and C goes via an Inter-Switch Mesh path connecting SCT(A) and SCT(C).

2.1 Enterprise Network Operator Survey

To support our key assumptions on the challenges and operational objectives within enterprise campus networks, we conducted two on-site interviews with operators from both large (≥10,000 users) and medium (≥500 users) enterprise networks. We then solicited responses via e-mail to open-answer survey questions from a wider audience of 60 network operators.

The top three responses to “what is the most important technical, operational objective you must achieve in your network” were: (1) that it “just works,” implying basic, usable IP connectivity for employees to conduct their business, (2) strict traffic filtering and perimeter protection against intrusion and system exploits, and (3) minimizing operational costs and complexity, e.g., by maintaining homogeneous hardware deployments with vendor-specific management tools.

The top response to “what is the most demanding operational challenge faced in your network” was related to maintaining a consistent view of the network's physical and configuration state. When asked whether network capacity bottlenecks present problems, no operator indicated that his or her network experienced such issues.

2.2 Legacy Network Assumptions

Consequently, based on our operator conversations, and in conjunction with several design guidelines (e.g., see [7, 15]), we make the following assumptions about medium to large enterprise campus networks and hardware capabilities. Enterprise network hardware consists primarily of Ethernet bridges, namely, switches that implement standard L2 mechanisms (i.e., MAC-based learning and forwarding, and STP) and support VLANs (specifically, 802.1Q and per-VLAN STP). Routers or L3 switches are used as gateways to route between VLAN-isolated IP subnets. For our purposes, we assume an L3 switch is also capable of functioning as an L2 switch. In addition, we assume that no enterprise intentionally operates “flood-only” hub devices for general packet forwarding.


2.3 Background on OpenFlow

The current de-facto standard SDN platform is OpenFlow [20]: it defines a switch model and an API to its forwarding tables, as well as a protocol that exposes the API to a controller program. OpenFlow provides us a reference model to reason about the types of forwarding behaviors that an upgraded switch may practically implement. Henceforth, we assume that SDN switches adhere to the OpenFlow programmable switch model, which we now briefly review.

OpenFlow specifies that switches have flow tables which store a list of rules for processing packets. Each rule consists of a pattern (matching on packet header fields), actions (such as forwarding, dropping, sending to the controller, etc.), a priority (to distinguish between overlapping patterns), counters (to measure bytes and packets processed so far), and a timeout (indicating if/when the rule expires). An OpenFlow switch then matches every incoming packet against the flow table rules. Whenever there is a match, the switch selects the highest-priority matching rule, updates the counters, and performs the specified action(s). If there is no matching rule, the switch sends the packet (in full or just its header) to the controller and awaits a response on what actions to take.
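To make these matching semantics concrete, the following Python sketch (our own illustration, not part of any OpenFlow library) models a flow table lookup with patterns, priorities, and counters as described above; timeouts are omitted for brevity.

# Illustrative model of OpenFlow-style flow table matching: the
# highest-priority matching rule wins and its counters are updated;
# a table miss is punted to the controller.
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: dict      # header field -> required value; absent fields are wildcards
    actions: list      # e.g., ["output:2"], ["drop"], ["controller"]
    priority: int = 0  # disambiguates overlapping patterns
    packets: int = 0   # match counters
    bytes: int = 0

    def matches(self, pkt: dict) -> bool:
        return all(pkt.get(f) == v for f, v in self.pattern.items())

def lookup(table, pkt, pkt_len):
    candidates = [r for r in table if r.matches(pkt)]
    if not candidates:
        return ["controller"]  # miss: send packet (or header) to the controller
    best = max(candidates, key=lambda r: r.priority)
    best.packets += 1
    best.bytes += pkt_len
    return best.actions

table = [Rule({"eth_dst": "00:00:00:00:00:02"}, ["output:2"], priority=10)]
print(lookup(table, {"eth_dst": "00:00:00:00:00:02"}, 64))  # ['output:2']
print(lookup(table, {"eth_dst": "ff:ff:ff:ff:ff:ff"}, 64))  # ['controller']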

3. PANOPTICON SDN ARCHITECTURE

This section presents our architecture for partially-deployed software-defined networks. We extend the SDN capabilities to legacy switchports by ensuring that every pair of SDN-controlled legacy switchports is restricted to communicate over an end-to-end path that traverses at least one SDN switch. We call this property the Waypoint Enforcement policy. The Waypoint Enforcement policy can, however, be violated if legacy devices are allowed to make standard forwarding decisions (i.e., based on destination MAC address).

To guarantee Waypoint Enforcement, we must choose a forwarding set that constrains the space of possible forwarding decisions in such a way that the traffic always follows safe end-to-end paths. In addition, we must do so using only existing mechanisms and features readily available on legacy switches, since these switches are not being upgraded. We start by formally introducing the concepts of Waypoint Enforcement, forwarding set, and safe path.

3.1 The Transitional Network Model

We define a transitional network as G = (N, E). G is connected and consists of a set of nodes N, partitioned into end-points Π, legacy switches L, and SDN switches S, i.e., N = Π ⊔ L ⊔ S, and a set of undirected links E between two elements of N.

We represent an end-to-end path traversed by packets from source end-point s ∈ Π to destination end-point t ∈ Π as a list of traversed links p(s, t) = (e₁ = {s, u₁}, e₂ = {u₁, u₂}, . . . , eₖ = {uₖ₋₁, t}), where u₁, . . . , uₖ₋₁ ∈ L ⊔ S. To express paths between switches rather than between end-points, we simply overload the definition of path p(s, t) with s, t ∈ L ⊔ S.

We define the reachability matrix R, where Rₛ,ₜ = 1 iff end-point pair (s, t) is allowed to communicate, and 0 otherwise. From all possible paths between end-points (s, t) in G, we denote with FS(s, t) the subset of utilized paths subject to Rₛ,ₜ, which we call the forwarding set.

The set of end-points Π is further subdivided into SDN-policed end-points Π• and non-SDN-policed end-points Π◦. We call an SDN-policed end-point an SDN port. In Panopticon, we want to ensure that traffic from or to an SDN port can only reach another end-point (regardless of its type) via an SDN switch. To this end, we introduce the concept of Waypoint Enforcement.

Definition 1 (Waypoint Enforcement). Waypoint Enforcement requires that every path in the forwarding set FS(s, t) includes at least one SDN switch. Formally: ∀ s, t ∈ Π•, ∀ p ∈ FS(s, t), ∃ u ∈ S : u ∈ p.

Henceforth, we call an end-to-end path safe iff it satisfies Waypoint Enforcement.
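Definition 1 translates directly into a mechanical check over the forwarding set. The following sketch (our own illustration; paths are given as node sequences) verifies that a forwarding set satisfies Waypoint Enforcement:

# Sketch: check Waypoint Enforcement (Definition 1). FS maps an SDN-port
# pair (s, t) to the paths in use; a path is safe iff it contains at
# least one switch from S.
def waypoint_enforced(FS: dict, S: set) -> bool:
    return all(any(u in S for u in path)
               for paths in FS.values()
               for path in paths)

# Toy example: the path A -> 1 -> 2 -> B is safe iff 1 or 2 is an SDN switch.
FS = {("A", "B"): [["A", 1, 2, "B"]]}
print(waypoint_enforced(FS, S={2}))  # True
print(waypoint_enforced(FS, S={7}))  # False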

3.2 Selecting the Forwarding Set

We next show how to construct the forwarding set using VLANs, i.e., 802.1Q, to isolate and constrain traffic in the legacy network to safe paths, ultimately achieving Waypoint Enforcement. To conceptually illustrate how VLANs restrict forwarding to safe paths in a transitional network, we first consider a straightforward, yet impractical scheme: For every pair of SDN ports (i.e., end-points subject to waypoint enforcement), choose one SDN switch as a waypoint, and compute the (shortest) end-to-end path that includes the waypoint. Next, assign a unique VLAN ID to every end-to-end path and configure the legacy switches accordingly. This ensures that all forwarding decisions made by every legacy switch only send packets along safe paths.

Due to practical constraints, this simple solution is infeasible, as the VLAN ID space is limited to 4096 values. Indeed, the VLAN ID space may be smaller yet, due to switch hardware limitations. Thus, the simple solution supports at most 64 hosts (in a full mesh) before depleting all available VLAN IDs. Moreover, end-point ports must operate in “access mode” with a single VLAN ID, as end-host interfaces may not support or be trusted to correctly use 802.1Q trunking.

A naive workaround is to assign a single designated SDN switch per SDN port.



Figure 3: An example transitional network of 7 switches (SDN switches are shaded). (a) shows the SCTs (Solitary Confinement Trees) of every SDN port overlaid on the physical topology. (b) presents the corresponding logical view in which all SDN ports are virtually connected to SDN switches via pseudo-wires.

However, this rigid solution limits the opportunity to use the best path to the destination according to distance or load. Moreover, it creates serious reliability concerns: the designated switch becomes a single point of failure.

3.3 Solitary Confinement Trees (SCTs)

To construct the forwarding set in Panopticon, we introduce the concept of the Solitary Confinement Tree (SCT) to provide end-to-end path diversity while ensuring isolation and a parsimonious use of VLAN IDs. We first describe the terms Cell Blocks and Frontier and then formally define the SCT, using an example to illustrate.

Definition 2 (Cell Blocks). Given a transitional network G, the Cell Blocks CB(G) = {c₁, . . . , cₖ} are defined as the set of connected components of the network G′ obtained after removing from G the SDN switches S and their incident links. Formally, G′ = (Π ⊔ L, E′), where E′ = E \ {e = {u₁, u₂} | ∃ u ∈ e : u ∈ S}.

Definition 3 (Frontier). Given a cell block c ∈ CB(G), we define the Frontier F(c) as the subset of SDN switches that are adjacent in G to a switch in c.

Intuitively, the solitary confinement tree is a spanning tree within a cell block, plus its frontier. The principal role of each SCT is to provide a safe path from each SDN port π to every SDN switch in its frontier. We can then assign a single VLAN ID to each SCT, which ensures traffic isolation and overcomes the limitation of binding each SDN port to a single SDN switch. We introduce an algorithm for choosing VLAN IDs in § 4.2. Formally, we define the SCT as:

Definition 4 (Solitary Confinement Tree). Let c(π) be the cell block to which an SDN port π ∈ Π• belongs, and let STP(c(π)) denote the STP-computed spanning tree on c(π). Then, the Solitary Confinement Tree SCT(π) is the network obtained by augmenting STP(c(π)) with the upgraded frontier F(c(π)), together with all links in G connecting a switch u ∈ F(c(π)) with a switch in STP(c(π)).

Example. Let us consider the example transitional network of seven switches in Figure 3a. In this example, SCT(A) is the tree that consists of the paths 5 → 1 → 2 and 5 → 1 → 4. Note instead that SCT(B), which corresponds to the path 6 → 2, includes a single SDN switch, because switch 2 is the only SDN switch adjacent to cell block c(B). Figure 3b shows the corresponding logical view of the physical transitional network enabled by having SCTs. In this logical view, every SDN port is connected to at least one SDN switch via a pseudo-wire.
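Cell blocks and frontiers are cheap to derive from a graph model of the topology. The sketch below is our own illustration using the networkx package, on a hypothetical 7-switch topology in the spirit of Figure 3a (the exact link set is assumed for illustration); the SCT then follows by spanning each cell block and attaching its frontier.

# Sketch: compute Cell Blocks (Definition 2) and Frontiers (Definition 3).
import networkx as nx

def cell_blocks(G: nx.Graph, sdn: set):
    # Connected components after removing SDN switches and incident links.
    legacy = G.subgraph(n for n in G if n not in sdn)
    return [set(c) for c in nx.connected_components(legacy)]

def frontier(G: nx.Graph, block: set, sdn: set):
    # SDN switches adjacent in G to some switch in the cell block.
    return {u for u in sdn if any(v in block for v in G.neighbors(u))}

# Hypothetical topology loosely after Figure 3a; switches 2 and 4 are SDN.
G = nx.Graph([(1, 2), (1, 4), (1, 5), (2, 3), (2, 6), (3, 4), (4, 7)])
sdn = {2, 4}
for block in cell_blocks(G, sdn):
    print(sorted(block), "frontier:", sorted(frontier(G, block, sdn)))
# e.g., the block containing switch 5 has frontier {2, 4}, matching SCT(A).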

3.4 Packet Forwarding in Panopticon

We now illustrate Panopticon's basic forwarding behavior (Figure 2). Let us first consider traffic between a pair of SDN ports s and t. When a packet from s enters SCT(s)'s VLAN, the legacy switches forward the packet to the frontier based on MAC learning, which establishes a symmetric path. Note that a packet from s may use a different path within SCT(s) to the frontier for each distinct destination. Once traffic toward t reaches its designated SDN switch u ∈ F(c(s)), one of two cases arises:

SDN switches act as VLAN gateways: This is the case when the destination SDN port t belongs to a cell block whose frontier F(c(t)) shares at least one switch u with F(c(s)). Switch u acts as the designated gateway between SCT(s) and SCT(t): That is, u rewrites the VLAN tag and places the traffic within SCT(t). For instance, in the example of Figure 3a, switch 2 acts as the gateway between ports A, B and C.

Inter-Switch Mesh (ISM) and path diversity: When no SDN switch is shared, we use an Inter-Switch Mesh (ISM) path: point-to-point, VLAN-based tunnels that realize a full mesh between SDN switches. In this case, the switch u chooses one of the available paths to forward the packet to an SDN switch w ∈ F(c(t)), where w is the designated switch for the end-to-end path p(s, t). In our example of Figure 3a, ISM paths are shown in gray and are used, e.g., for traffic from B or C to E or F, and vice versa.

As in any SDN, the control platform is responsible for installing the necessary forwarding state according to the reachability matrix R and for reacting to topology changes (fault tolerance is discussed in § 4.4).

We next turn to the forwarding behavior of non-SDN ports. Again, we distinguish two cases. First, when there exists a path in the legacy network between two non-SDN ports, forwarding is performed as usual and is unaffected by the partial SDN deployment. Policy enforcement and other operational objectives must be implemented through traditional means, e.g., ACLs.

In the second case, anytime a path between two non-SDN ports necessarily encounters an SDN switch, the SDN mechanism can be leveraged to police the traffic. This is also the case for all traffic between any pair of SDN and non-SDN ports. In other words, Panopticon always guarantees safe paths for packets from or to every SDN port. We formally prove this property in the Appendix.

3.5 Architecture Discussion

Having described all components of the architecture, we now highlight the key properties of SCTs and the ISM.

A VLAN ID per SCT is scalable. Using SCTs is inherently more scalable than using one VLAN ID on each end-to-end path, since we parsimoniously use just one VLAN ID per SDN port. In addition, our scheme does not preclude using a different path within the SCT for every destination ingress port, granting more flexibility over using a single designated switch per SDN port.

VLAN IDs are reused across Cell Blocks. In addition, we observe that SCTs allow us to reuse VLAN IDs, because any VLAN ID can be used once in each cell block independently. This is because SDN switches effectively act as gateways between cell blocks and modify VLAN IDs accordingly.

SCTs can be statically precomputed. We observe that there is potentially only a one-time cost to compute all SCTs (of course, re-computation is needed whenever switches are added or removed). Also, it is possible to automatically produce configuration settings to correctly set up each legacy switch without requiring any manual, tedious and error-prone effort.

ISM path diversity trade-offs. Within the ISM, there may be multiple paths between any given pair of SDN switches. We expect that some applications may require a minimum number of paths. For example, a minimum of two disjoint paths is necessary to tolerate single link failures. On the other hand, each path consumes a VLAN ID from the ID space of every traversed cell block. Henceforth in this paper, we limit ourselves to a single shortest path per switch pair (unless we specify otherwise).[2] Path control over the mesh is exercised by the logically centralized SDN controller, which we describe next.

[2] Readers familiar with the proposal in [6] for a fabric abstraction for SDN will notice that our ISM is an instance of such a fabric, one that spans over legacy devices.

3.6 Realizing SDN Benefits

By now, we have described how Panopticon shifts the active network management burden away from the legacy devices and onto the upgraded SDN deployment. This conceptually reduces the network to a logical SDN, as presented in Figure 3b. The transitional network, as an SDN, stands to benefit in all the ways that previous work has demonstrated: dynamic policy enforcement [4], consistent policy updates [27], network behavior customization and debugging [13, 14, 31], etc.

Putting it all together, Panopticon is the first architecture that realizes an approach for operating a transitional network as though it were a fully deployed SDN, yielding the benefits of partial SDN deployment for the entire network, not just the part that is upgraded.

4. LEGACY SWITCH INTERACTION

Next, we consider the interaction of upgraded switches with legacy switches. Accordingly, we discuss how the interaction with STP works, as well as how to choose VLAN IDs, cope with broadcast traffic, and tolerate failures.

4.1 Interacting with STP

The Spanning Tree Protocol (STP), or a variant such as Rapid STP, is commonly used to achieve loop freedom within L2 domains, and we interact with STP in two ways. First, to ensure network-wide loop freedom for traffic from non-SDN ports, SDN switches behave as ordinary STP participants. That is, the SDN controller implements STP and coordinates the computation and distribution of STP messages on every SDN switch.

Second, within each SCT, we run a per-VLAN spanning tree protocol (e.g., Multiple STP) rooted at the SCT ingress port's switch. For this STP instance, each SDN switch passively listens to learn the least-cost path to the SCT's ingress port, but does not reply with any STP messages. Collectively, this behavior guarantees that each SCT is loop-free and fault-tolerant via the existing STP failover mechanisms.

4.2 Deploying VLANs

The configuration of VLANs and the assignment of VLAN IDs can be computed efficiently in Panopticon: For each cell block c ∈ CB(G), we compute all SCT(π) (for each SDN port π ∈ Π•), and use one VLAN ID per SCT. As noted before (§ 3.3), VLAN IDs can be reused across different cell blocks. Subsequently, we add the VLANs for the Inter-Switch Mesh (ISM). For each switch pair connected in the ISM by a path p, we use one VLAN whose ID is the smallest available identifier in the cell blocks traversed by p.
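The following sketch (our own illustration; cell blocks, SCT memberships, and ISM paths are given as precomputed inputs) captures this assignment logic: each SCT consumes one ID from its own cell block's pool, while an ISM path takes the smallest ID still free in every cell block it traverses.

# Sketch: VLAN ID assignment per § 4.2. IDs are allocated independently
# per cell block; an ISM path consumes one ID in every block it crosses.
def assign_vlans(sct_ports_by_block: dict, ism_paths: list, tmax: int = 4096):
    used = {b: set() for b in sct_ports_by_block}   # IDs in use, per cell block
    sct_vlan, ism_vlan = {}, {}
    for block, ports in sct_ports_by_block.items():
        for vid, port in enumerate(ports, start=1): # one VLAN ID per SCT
            sct_vlan[port] = vid
            used[block].add(vid)
    for path_id, blocks in ism_paths:               # blocks traversed by the path
        vid = next(v for v in range(1, tmax + 1)
                   if all(v not in used[b] for b in blocks))
        ism_vlan[path_id] = vid                     # smallest ID free everywhere
        for b in blocks:
            used[b].add(vid)
    return sct_vlan, ism_vlan

# Two cell blocks with SDN ports A (cb1) and B, C (cb2); one ISM path
# between two SDN switches that traverses both cell blocks.
sct_vlan, ism_vlan = assign_vlans(
    {"cb1": ["A"], "cb2": ["B", "C"]},
    ism_paths=[(("sw2", "sw4"), ["cb1", "cb2"])],
)
print(sct_vlan, ism_vlan)  # {'A': 1, 'B': 1, 'C': 2} {('sw2', 'sw4'): 3}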

4.3 Coping with Broadcast Traffic

Broadcast traffic can be a scalability concern. We take advantage of the fact that each SCT limits the broadcast domain size, and we rely on SDN capabilities to enable in-network ARP and DHCP proxies, as shown in [17]. We focus on these important bootstrapping protocols as it was empirically observed that broadcast traffic in enterprise networks is primarily contributed by ARP and DHCP [17, 25].
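As a framework-agnostic sketch of the proxy idea (the handler and the binding table are our own; a real controller would learn bindings from DHCP and observed traffic), an ARP request punted to the controller can be answered directly instead of being flooded through the SCT:

# Sketch: controller-side ARP proxy. Requests whose target is known are
# answered with a unicast ARP reply; only unknown targets fall back to
# (scoped) flooding within the requester's SCT.
ip_to_mac = {"10.0.0.2": "00:00:00:00:00:02"}  # learned bindings

def handle_arp_request(src_mac, src_ip, target_ip):
    mac = ip_to_mac.get(target_ip)
    if mac is None:
        return ("flood_within_sct",)  # unknown binding: scoped fallback
    # Answer on behalf of the target: no broadcast leaves the ingress port.
    return ("arp_reply", {"eth_dst": src_mac, "sender_mac": mac,
                          "sender_ip": target_ip, "target_ip": src_ip})

print(handle_arp_request("00:00:00:00:00:01", "10.0.0.1", "10.0.0.2"))
print(handle_arp_request("00:00:00:00:00:01", "10.0.0.1", "10.0.0.9"))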

Last, we note that in the general case, if broadcast traffic must be supported, the overhead that Panopticon produces is proportional to the number of SCTs in a cell block, which, at worst, grows linearly with the number of SDN ports of a cell block.

4.4 Tolerating Failures

With Panopticon, we decompose fault tolerance into three orthogonal aspects.

Reusing existing STP mechanisms. Within an SCT, Panopticon relies on standard STP mechanisms to survive link failures, although to do so, there must exist sufficient physical link redundancy in the SCT. The greater the physical connectivity underlying the SCT, the higher the fault-tolerance. Additionally, the coordination between the SDN controller and legacy STP mechanisms allows for more flexible fail-over behavior than STP alone.

When an SDN switch at the frontier F of an SCT notices an STP re-convergence, the SDN controller adapts the forwarding decisions at F's SDN switches to restore per-end-point connectivity as necessary. This may involve assigning a given end-point to a different frontier switch to restore its connectivity, or to re-balance traffic. A similar scheme can address link failures within the Inter-Switch Mesh (ISM).

SDN switch and link failure. When SDN switches and/or their incident links fail, the SDN controller recomputes the forwarding state and installs the necessary flow table entries. Furthermore, precomputed failover behavior can be leveraged as of OpenFlow version 1.1. Finding an efficient and lightweight solution is part of our ongoing work.

SDN controller robustness. Third, the SDN control platform must be robust and available. In this respect, previous work [18] demonstrates that well-known distributed systems techniques effectively achieve this goal, although the implications of inconsistent network views remain under-explored [19].

5. COST-AWARE SDN DEPLOYMENT

In principle, the Waypoint Enforcement policy can be obtained with a single SDN switch by forwarding all traffic through this switch. However, this solution is untenable for several reasons, including: (i) many end-to-end paths may become excessively long, (ii) bottleneck links may experience severe congestion, (iii) the forwarding state, bandwidth, and performance requirements of the SDN switch may be stretched beyond feasible limits, and (iv) it introduces a single point of failure into the network.

Therefore, in this section we show how to upgrade to an SDN network in stages in a cost-aware and efficient manner. We present an upgrade planning tool that is given an enterprise campus network topology G₀ = (Π ⊔ L₀, E), where L₀ denotes the set of legacy switches. The goal of our planning tool is to select a subset of switches S ⊆ L₀ for SDN deployment and return the upgraded network G⁺ = (Π ⊔ S ⊔ L, E), where L = L₀ \ S.

We next describe the parameters, including a simplified switch price model, and the properties of desirable solutions. We then develop an optimal upgrade algorithm Opt and devise two heuristic algorithms, Deg and Vol, to solve it efficiently.

5.1 Tunable Parameters

To accommodate different designers' needs, our tool exposes several parameters: (i) priorities for upgrading end-points to SDN ports, (ii) switch prices, and (iii) several thresholds to encompass resource constraints of VLAN IDs, link utilizations, and flow-table entries.

SDN Port Priorities. Due to practical constraints (e.g., monetary budget or capacity), it may not be feasible to find a partial SDN deployment where all end-points are SDN ports Π• (i.e., policed by an SDN switch). Also, a network administrator may not even wish to police certain end-points Π◦, as others take precedence based on specific considerations including budget constraints, security, and reachability (Π = Π• ⊔ Π◦). Therefore, we allow a designer to specify (i) the end-points that must be SDN ports (Π•₁ ⊆ Π•), and (ii) the end-points that, on a best-effort basis, should be made SDN ports (Π•₂ ⊆ Π•). Note that Π• = Π•₁ ⊔ Π•₂. End-points Π◦ are ports that may violate the waypoint enforcement policy.

Switch Price Model. The price γ(u) of an SDN switch u ∈ S depends on features and capabilities such as the number of switch ports Π(u), port speeds Ψ, and switch fabric latency and capacity. Also, the price depends on the number k of distinct traffic flows (e.g., source-destination MAC pairs or IP 5-tuples) it must handle: A larger k implies higher TCAM, memory and CPU requirements. Hence, we model the cost of a switch as the function γ(u) = f(Π(u), Ψ, k). To this end, we assume that the network designer has a price table specifying switch prices together with their features and capabilities. Our operator survey (cf. Section 2) confirmed the validity of the price assumptions.

Link Utilizations. When upgrading the network anyway, the designer may choose to ensure a certain degree of link capacity over-provisioning w.r.t. the estimated traffic matrix. A tunable safety margin ε can be used to influence the individual switch upgrade and path choices to ensure that the upgraded network maintains at least an ε percentage of the capacity of all links (or per link) as slack.

VLAN IDs and Flow Table Entries. Within each cell block, one may want to specify a certain upper bound on the percentage of used VLAN IDs. That is, given a maximal number of usable VLAN IDs t_max and a percentage ν, the number of VLAN IDs t used in a given cell block should not exceed ν · t_max. Similarly, given a maximal flow table capacity ft_max and a threshold µ, switches may be chosen for upgrade such that the table at a newly upgraded switch will not exceed µ · ft_max.

5.2 Properties of Desirable Solutions

We want the following properties:

1. Waypoint enforcement: Any end-to-end path between two communicating end-points π₁, π₂ ∈ Π• must be SDN-policed if at least one end-point belongs to Π•₁. Paths including an end-point π ∈ Π•₂ should be policed too, if capacity and budget allow.

2. Cost within budget: The total upgrade cost does not exceed the given budget β.

3. Feasible: SDN switches have sufficient capacity to support all end-to-end paths assigned to them, and expected link utilizations are within tolerable thresholds.

4. Path stretch within limit: As a metric to capture the impact of Waypoint Enforcement, we define the stretch as ρ(s, t) = d⁺(s, t)/d₀(s, t), where d⁺(s, t) is the length of the path p⁺(s, t) that satisfies Waypoint Enforcement in G⁺ and d₀(s, t) denotes the length of the path in the original network G₀. We require that the stretch remain below a tolerable threshold; a sketch of this computation follows below.
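For example, a shortest path of 2 hops that grows to 4 hops under waypoint enforcement has stretch ρ = 2. The computation can be sketched as follows (our own illustration, assuming networkx; the safe distance is the shortest s → u → t detour over the best waypoint u ∈ S):

# Sketch: path stretch rho(s, t) = d+(s, t) / d0(s, t) under Waypoint
# Enforcement, taking the best SDN switch u in S as the waypoint.
import networkx as nx

def stretch(G: nx.Graph, s, t, S: set) -> float:
    d0 = nx.shortest_path_length(G, s, t)
    d_plus = min(nx.shortest_path_length(G, s, u) +
                 nx.shortest_path_length(G, u, t) for u in S)
    return d_plus / d0

# Ring of six switches: a waypoint on the shortest route keeps the stretch
# at 1.0, while a waypoint on the far side doubles the path length.
G = nx.cycle_graph(6)
print(stretch(G, 0, 2, S={1}))  # 1.0
print(stretch(G, 0, 2, S={5}))  # 2.0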

5.3 Optimal Upgrade Algorithm

We now formalize an optimal cost-aware upgrade algorithm Opt based on mathematical programming. For presentation's sake, we first introduce a path-based formulation that ensures Waypoint Enforcement, assuming every end-point must be an SDN port (i.e., Π•₁ ≡ Π•), but does not take into account traffic matrix volumes, nor best-effort SDN ports Π•₂ and remaining ports Π◦. We later extend our formulation to include these.

Opt is a Mixed Integer Program (MIP) based on the following constants and variables. Let the binary constant l_{i,j} be 1 iff there exists a link {i, j} ∈ E in the legacy network G₀. The integer constants d₀(s, t) denote the path lengths (e.g., w.r.t. hops) in G₀. We define the binary variable x^{s,t}_{i,j} that becomes 1 iff the link from i to j lies on the path p(s, t) in the transitional network G⁺. We also define y_i, a binary decision variable that is 1 iff switch i is to be upgraded.

One difficulty in the MIP formulation is to describe paths: in contrast to classical programs for shortest path or multi-commodity flow computations, we require that a path from s ∈ Π• to t ∈ Π• goes via an SDN switch u ∈ S, where u is a variable itself. To avoid sacrificing the linearity of the program, our approach is to introduce a binary variable u^{s,t}_i to determine whether path p(s, t) goes through SDN switch i.

Opt can be used with different strategies, depending on which properties of the upgrade are strictly required and which are subject to optimization. In the following, we seek to minimize the total path stretch, subject to the upgrade budget β (Constraint (2)).

\min_{x^{s,t}_{i,j},\, u^{s,t}_i} \sum_{i,j \in L_0;\; s,t \in \Pi^\bullet} \frac{x^{s,t}_{i,j}}{d_0(s,t)}    (1)

such that

\sum_{i \in L_0} \gamma(i) \cdot y_i \le \beta    (2)

and, for all s, t ∈ Π•:

x^{s,t}_{i,j} \le l_{i,j}, \quad \forall i, j \in L_0    (3)

\sum_{j \in L_0} (x^{s,t}_{i,j} - x^{s,t}_{j,i}) = \begin{cases} 1 & \text{if } i = s \\ -1 & \text{if } i = t \\ 0 & \text{otherwise} \end{cases} \quad (\forall i \in L_0)    (4)

\sum_{i \in L_0} u^{s,t}_i = 1    (5)

u^{s,t}_i \le y_i, \quad \forall i \in L_0    (6)

\sum_{j \in L_0} (x^{s,t}_{i,j} + x^{s,t}_{j,i}) \ge u^{s,t}_i, \quad \forall i \in L_0    (7)

\sum_{j \in L_0} x^{s,t}_{i,j} \le 1; \quad \sum_{j \in L_0} x^{s,t}_{j,i} \le 1, \quad \forall i \in L_0    (8)

u^{s,t}_i = u^{t,s}_i, \quad \forall i \in L_0    (9)

\sum_{s,t \in \Pi^\bullet} \sum_{j \in L_0} (x^{s,t}_{i,j} + x^{s,t}_{j,i}) \le \mu_i \cdot ft^i_{max}, \quad \forall i \in L_0    (10)

\sum_{\pi \in \Pi^\bullet \wedge\, \pi \in c} 1 \le \nu \cdot t_{max}, \quad \forall c \in CB(G)    (11)

Constraint (3) ensures that a path can only use existing links, and Constraint (4) represents the flow conservation constraints along the switch paths. Constraint (5) states that there must be one SDN switch assigned to each path, and Constraint (6) requires that the assigned switch is chosen for upgrade. Constraint (7) ensures that the assigned switch is included in the path; it takes effect when u^{s,t}_i = 1. Constraint (8) guarantees that a switch is only used once per path (loop-freedom), and Constraint (9) specifies path symmetry (this constraint can safely be omitted if not required). Constraint (10) guarantees that the number of used flow table entries (or MAC table entries for legacy switches) is within the allowed limit. Finally, Constraint (11) ensures that the number of VLAN IDs used in every given cell block is within a tolerable limit.

Finally, note that our formulation assumes the initial network consists only of legacy switches. In practice, after the first SDN partial-deployment phase has taken place, one can reuse our planning tool by fixing the values of y_i to 1 for the previously upgraded switches.
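To illustrate the structure of Opt, the following compressed sketch uses the open-source PuLP solver library (assumed available). It implements the objective (1) and Constraints (2) and (4)–(7) for end-point pairs attached to distinct switches; Constraint (3) is implicit in only creating variables for existing links, and (8)–(11) are omitted. All names and the simplified path model are our own, so this is an illustrative reduction rather than the full formulation.

# Sketch: a reduced version of Opt in PuLP. Paths run between the
# switches hosting each end-point pair.
import itertools
import pulp

def plan_upgrade(switches, links, attach, pairs, d0, gamma, beta):
    arcs = [(i, j) for i, j in itertools.permutations(switches, 2)
            if frozenset((i, j)) in links]
    prob = pulp.LpProblem("Opt", pulp.LpMinimize)
    y = {i: pulp.LpVariable(f"y_{i}", cat="Binary") for i in switches}
    x = {(p, a): pulp.LpVariable(f"x_{p[0]}_{p[1]}_{a[0]}_{a[1]}", cat="Binary")
         for p in pairs for a in arcs}
    u = {(p, i): pulp.LpVariable(f"u_{p[0]}_{p[1]}_{i}", cat="Binary")
         for p in pairs for i in switches}
    # Objective (1): total path stretch (hop count normalized by d0).
    prob += pulp.lpSum((1.0 / d0[p]) * x[p, a] for p in pairs for a in arcs)
    # (2): upgrade cost within budget.
    prob += pulp.lpSum(gamma[i] * y[i] for i in switches) <= beta
    for p in pairs:
        s, t = attach[p[0]], attach[p[1]]  # distinct switches assumed
        for i in switches:
            out_flow = pulp.lpSum(x[p, a] for a in arcs if a[0] == i)
            in_flow = pulp.lpSum(x[p, a] for a in arcs if a[1] == i)
            # (4): flow conservation from s to t.
            prob += out_flow - in_flow == (1 if i == s else -1 if i == t else 0)
            # (6): the waypoint must be upgraded; (7): it lies on the path.
            prob += u[p, i] <= y[i]
            prob += pulp.lpSum(x[p, a] for a in arcs if i in a) >= u[p, i]
        # (5): exactly one SDN waypoint per path.
        prob += pulp.lpSum(u[p, i] for i in switches) == 1
    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    return {i for i in switches if y[i].value() == 1}

# Toy instance: 4 switches in a ring; end-points A, B sit on switches 1, 3.
links = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (1, 4)]}
upgraded = plan_upgrade([1, 2, 3, 4], links, {"A": 1, "B": 3},
                        pairs=[("A", "B")], d0={("A", "B"): 2},
                        gamma={i: 1 for i in range(1, 5)}, beta=1)
print(upgraded)  # one switch on a shortest 1 -> 3 path, e.g., {2}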

5.3.1 Extension to Traffic and Best-Effort SDN Ports

We now introduce traffic-awareness into Opt and then extend it to account for best-effort SDN ports.

Traffic matrix. Let κ(e) denote the bandwidth of the link e = {i, j} between switches i, j. We refer to the traffic matrix as M_{s,t}, which denotes the estimated demand between two end-points s, t ∈ Π. The following constraint enforces that every link utilization is at most the capacity minus the safety margin ε:

\sum_{s,t \in \Pi} x^{s,t}_{i,j} \cdot M_{s,t} \le (1 - \varepsilon) \cdot \kappa(\{i, j\}), \quad \forall i, j \in L \sqcup S

As with any network planning approach, for better results with capacity constraint satisfaction, we suggest using a long-term traffic matrix, which we assume to be sufficiently stable. In an enterprise network, we expect it to be the norm that the utilization of every link is monitored (e.g., via SNMP counters) at a fine level of granularity over a time scale spanning several months. With these measurements one can leverage existing methodologies (e.g., see [21, 33]) to obtain a traffic matrix.

Best-effort SDN ports. For end-points in Π•₂, we introduce the binary variable h(π) that is 1 iff end-point π ∈ Π•₂ becomes an SDN port. This variable is used to disable Constraints (5) and (7) through an additional helper variable defined similarly to u^{s,t}_i. The objective function is extended to allow a parameter α (≥ 0) to trade off the number of best-effort ports made SDN against path stretch, as follows:

\min_{x^{s,t}_{i,j},\, u^{s,t}_i} \sum_{i,j,s,t} \frac{x^{s,t}_{i,j}}{d_0(s,t)} \;-\; \alpha \cdot \sum_{\pi \in \Pi^\bullet_2} h(\pi)

Of course, best-effort paths can be further prioritized (e.g., ordered or weighted).

5.4 Heuristics

While Opt computes optimal upgrade solutions, the MIP formulation exhibits a high runtime. Alone, the problem of finding just the minimal-stretch end-to-end paths through given waypoints is NP-hard [3]. Hence, to provide fast approximate results, we extend the planning tool with two heuristics, called Vol and Deg.

Both heuristics use a greedy strategy, in that Vol and Deg upgrade one switch after another, without backtracking. They differ only in the selection criterion for the next switch to upgrade.

Vol is based on the intuition that a switch forwarding large volumes of traffic is likely to be on many shortest paths between end-point pairs, and an upgrade of this switch allows us to keep many existing paths even under Waypoint Enforcement. Hence, detours are avoided and the stretch remains small. To also take into account switch costs, Vol chooses the next switch u to upgrade by computing the maximal volume-cost ratio vol(u)/γ(u), where vol(u) denotes the traffic on physical links incident to u before Waypoint Enforcement.

Deg tries to upgrade switches which are likely to yield many disconnected components (or cell blocks). This allows us to reuse VLAN tags and improves the system's scalability. Concretely, Deg chooses the next switch u to upgrade by computing the best degree-cost ratio ∆(u)/γ(u), where ∆(u) denotes the number of links incident to u. Indirectly, the hope is that a high-degree switch is at a location which does not change shortest paths by much.

Algorithm 1 gives the pseudo-code for both heuristics. (Since the two greedy algorithms differ only in the switch selection, we present them together and highlight the line where they differ.) Note that due to capacity constraints (e.g., the maximal flow table size at the switches or the capacity of the links), the first couple of switches upgraded by Vol and Deg may not yet yield a feasible solution. The choice of the next switch ignores these constraints: The algorithms return “infeasible” if no solution is found that meets the capacity constraints or if not all ports Π•₁ are subject to waypoint enforcement.

Note that similar techniques are used for many classical problems, such as Knapsack, Facility Location, Maximal Independent Sets, etc. We leave the formal analysis of the worst-case approximation ratios of the two heuristics for future research; however, we observe in small-scale experiments that these heuristics are within 10% of the optimal algorithm.

Data: Legacy network G₀ = (Π ⊔ L, E)
Result: Upgraded network G⁺ = (Π ⊔ S ⊔ L, E)
S := ∅;
while Σ_{u∈S} γ(u) ≤ β do
    (* choose next switch u ∈ L to upgrade *)
    (* in case of Vol *)  u := arg max_{u∈L} vol(u)/γ(u);
    (* in case of Deg *)  u := arg max_{u∈L} ∆(u)/γ(u);
    L := L \ {u};  S := S ∪ {u};
    G⁺ := (Π ⊔ S ⊔ L, E);
end
if (feasible) return G⁺; else return ⊥;

Algorithm 1: Upgrade heuristics Vol and Deg.
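For concreteness, here is a direct Python transcription of Algorithm 1 (our own sketch: the feasibility check is abstracted behind a caller-supplied predicate, since its details depend on the resource constraints of § 5.1, and the loop stops just before the budget would be exceeded):

# Sketch: the greedy loop of Algorithm 1. `score` is the selection
# criterion: vol(u)/gamma(u) for VOL, degree(u)/gamma(u) for DEG.
def greedy_upgrade(legacy, gamma, budget, score, feasible):
    legacy, upgraded, cost = set(legacy), set(), 0
    while legacy:
        u = max(legacy, key=score)        # best ratio; no backtracking
        if cost + gamma[u] > budget:
            break                         # next upgrade would exceed the budget
        legacy.remove(u)
        upgraded.add(u)
        cost += gamma[u]
    return upgraded if feasible(upgraded) else None

# VOL on a toy instance: incident traffic volumes, uniform switch prices.
vol = {"s1": 10, "s2": 40, "s3": 25}
gamma = {u: 1 for u in vol}
picked = greedy_upgrade(vol, gamma, budget=2,
                        score=lambda u: vol[u] / gamma[u],
                        feasible=lambda S: len(S) > 0)
print(sorted(picked))  # ['s2', 's3']: the two best volume-cost ratios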


Site      Access/Dist/Core    max/avg/min degree
LARGE     1293/417/3          53/2.58/1
MEDIUM    –/54/3              19/1.05/1
SMALL     –/14/2              15/3/2

Table 1: Topology characteristics.

6. EVALUATION

Our goal in this section is to evaluate different “hypothetical” partial SDN upgrade scenarios using three example campus networks. This simulation lets us (i) evaluate the scalability limits of our mechanism, namely, the available VLAN IDs and SDN flow table entries, and (ii) explore the extent to which SDN control benefits extend to the overall network. We focus on our simple heuristics Deg and Vol, and remark that more refined heuristics may result in more SDN-enabled end-points for a given budget.

6.1 Methodology

To simulate a network upgrade with Panopticon, we must first obtain network topologies, traffic matrices, resource constraints, as well as cost estimates.

Topologies: Detailed topological information, including switch and router-level configuration, link capacities, and end-host placements, is difficult to obtain: network operators are reluctant to share these details due to privacy concerns. Hence, we leverage several publicly available enterprise network topologies [30] and the topology of a private, local large-scale campus network. The topologies range from SMALL, comprising just the enterprise network backbone, to a MEDIUM network with 54 distribution switches and unknown access characteristics, to a comprehensive large-scale campus topology derived from anonymized device-level configurations of 1713 L2 and L3 switches. Summary information on the topologies is given in Table 1. All links in each topology are annotated with their respective link capacities as well as their switch port densities.

We use the SMALL network only when comparing our heuristics with the optimal solution computed by Opt (see § 6.3). We used the MEDIUM network as a “playground” during heuristic development and evaluation. However, as the results for the MEDIUM network are qualitatively equivalent to the ones for the LARGE network, for the remainder of the paper, we present only results for the LARGE network.

Focus on distribution switches: We distinguish between access switches (switches of degree one), distribution switches, and core switches. As core switches are typically very expensive and specialized, and hence are unlikely to be upgraded, we focus on distribution switches (in the following referred to as the candidate set for the upgrade). In the case of the LARGE network, this candidate set has cardinality 417. Within this network, we treat as legacy network end-points, which we wish to subject to SDN waypoint enforcement, the distribution-layer switch ports which lead to individual host-facing access-layer switches. In this topology, we identify 1160 such end-points.

Traffic matrices: We use a methodology similar to that used in SEATTLE [17] to generate a traffic matrix based on actual traces from an enterprise campus network, the Lawrence Berkeley National Laboratory (LBNL) [25]. The LBNL dataset contains more than 100 hours of anonymized packet-level traces of the activity of several thousand internal hosts. The traces were collected by sampling all internal switch ports periodically. We aggregate this data to obtain estimates of traffic matrix entries. More precisely, for all source-destination pairs in each sample we estimate the load they impose on the network over 10 and 60 minute periods, respectively. We note that the data contains sources from 22 subnets.

To project the load onto our topologies, we take advantage of the subnet information to partition each of our topologies into subnets as well. Each of these subnets contains at least one distribution switch. In addition, we pick one node as the Internet gateway. Each end-point in every cluster is associated, in random round-robin fashion, with a traffic source from the LBNL network. Then all traffic within the LBNL network is aggregated to produce the intra-network traffic matrix. All destinations outside of the LBNL network are assumed to be reachable via the Internet gateway and are thus mapped to the designated gateway node. By picking a different random assignment, we generate different traffic matrices which we use in our simulations. Before using a traffic matrix we ensure that the topology is able to support it. For this purpose we project the load on the topology using shortest-path routes.

Resource constraints: Most mid- to high-end enterprise network switches support 1024 VLAN IDs for simultaneous use. The maximum number of VLAN IDs expressible in 802.1Q is 4096. Accordingly, we use both numbers as parameters to capture VLAN ID resource constraints, one realistic and one optimistic. Most current OpenFlow-capable switches support at least 1,000 flow table entries, while many support up to 10,000 entries across TCAM and SRAM. Bleeding-edge devices support up to 100,000 flow table entries. Again, we use all three numbers as possible flow table capacity resource constraints: one conservative, one realistic, and one optimistic. Unless mentioned otherwise, we assume a setting where flow table entries are only persistently (pro-actively) populated along paths that are used (rather than for all potential src-dst end-point pairs). The results presented in this paper include both optimistic as well as conservative slack factors.



Figure 4: (a) Percentage of SDN-policed ports as a function of upgraded switches, available flow table entries (FT), and available VLAN IDs. (b) Percentage of SDN-policed ports in a scenario where flow table entries are very scarce and tables are populated along all possible paths. (c) Link utilization and average stretch trade-off as a function of the number of SDN switches.

With regard to link utilization when enforcing the waypoint enforcement policy on SDN ports, we do not consider a partial upgrade feasible whenever any projected link load exceeds 50% of the nominal link capacity.

Switch cost estimates: We use the following simplified switch cost model. A switch with more than 48 ports (up to 96) costs USD 60k. Switches with 48 or fewer ports are priced w.r.t. per-link bandwidth: a switch supporting 10 Gbps links costs USD 30k, and one supporting 1 Gbps links costs USD 3k. These numbers correspond to currently available hardware.

6.2 How much SDN can I get for my budget?

The first key question we address is: “What fraction of an enterprise campus network can Panopticon operate as software-defined, given budget and resource constraints for upgrading legacy switches?”

Scenario 1: To investigate the interactions between flow table capacity and VLAN ID space, we run 5 iterations (with different random seeds) of both heuristics. Both heuristics greedily upgrade one switch at a time subject to the resource constraints, and we track for each upgrade the number of end-points that can be waypoint-enforced. Figure 4a plots the corresponding results with 95% confidence intervals for different resource parameters. Each point in the graph corresponds to a set of upgraded SDN switches and a specific fraction of SDN waypoint-enforced end-points. The exact number of end-points that can be SDN waypoint-enforced depends on the traffic matrix; however, the spread of the intervals is small.

Observations 1: The major take-away from Figure 4a is that switch flow table capacity plays a dominant role with regards to what fraction of the enterprise legacy network can be SDN-enabled. In the optimistic case, when the switch supports 100,000 entries and 4,096 VLAN IDs, the whole network can be SDN-enabled by upgrading just a single distribution switch while ensuring reasonable link utilizations. This case, however, is unrealistic.

With more realistic resource assumptions, namely 10,000 flow entries and 1,024 VLANs, over 80% of the legacy network end-points can be SDN-enabled with just five switches. This corresponds to upgrading just 1.2% of the distribution switches of the LARGE network.

For the most conservative case (1,000 flow table entries and 1,024 VLAN IDs) we must upgrade a larger fraction of the legacy network in order to get a reasonable pay-back in terms of SDN-enabled ports: 15 switches allow us to waypoint-enforce roughly 20% of the end-points.

In this scenario, VLAN resource constraints play very little role, as the large number of poorly-endowed switches leads to more connected components and enables better VLAN ID reuse. Also, note that the difference in performance between the two heuristics for this scenario is relatively minor.

6.3 Which heuristic should one use?

Ideally, one would always solve Opt for each scenario. Due to its high runtime complexity, however, this may not be practical. For the SMALL and MEDIUM networks, the runtimes of our un-tuned Opt solver on modern hardware are on the order of multiple days, while the heuristics complete within minutes. Given our observation that the heuristic results are within 10% of Opt, we next focus on comparing the capabilities of the heuristics.

Scenario 2: Accordingly, the second experiment evaluates what fraction of the end-points can be made SDN waypoint-enforced ports under severe flow table scalability limitations. We define a “worst case” where flow-table capacity is very small (1k), and enforce an all-to-all traffic matrix model where flow table entries are required for all paths between end-points; the number of VLAN tags is set such that it does not pose any constraint. We use both Deg and Vol.



Figure 5: Panopticon SCT fail-over mechanism

Observations 2: The major take-away from Figure 4b is that Vol can significantly outperform Deg when flow table capacity is the major constraint. Vol performs better, and initially, approx. 5% upgraded switches are sufficient to police almost 20% of the ports. However, the marginal utility of additional upgrades declines; eventually, we achieve 70% SDN ports. The intuition behind this is that Deg upgrades switches in decreasing order of their degree. These high-degree switches typically serve more (but smaller) traffic flows, while the high-volume switches serve fewer (but larger) flows. Accordingly, Vol needs a smaller number of flow entries to capture a larger fraction of the traffic. Indeed, for this harsh scenario, no iterative greedy upgrade solution that achieves a fully upgraded SDN network may exist at all. Finding better heuristics is the subject of our future research.

6.4 How will Panopticon affect my traffic?

Scenario 3: Our third experiment investigates the impact of SDN waypoint enforcement on traffic. We start with optimistic resource constraints (1024 VLAN IDs and 10K flow table entries) and a conservative traffic matrix, namely one scaling the amount of traffic by a factor of two. This scenario has potentially the most severe impact on link utilization and path stretch.

Observations 3: Figure 4c plots the maximum link utilization over all links (left y-axis) as the number of upgraded switches increases. In addition, it shows the average path stretch across all waypoint-enforced end-to-end paths of the traffic matrix (right y-axis). The major take-away from Figure 4c is that Panopticon performs rather well in terms of both link utilization and stretch. While upgrading a single switch may yield a large average stretch, upgrading a dozen SDN switches (less than 3% of all distribution switches) reduces the stretch to less than 10%. Moreover, the maximum link utilization stays at reasonable levels. When only a small number of legacy switches are upgraded, the maximum link utilization may exceed 80%. However, as the number of upgraded switches increases, the maximum link utilization stabilizes to roughly 60%.

6.5 Evaluating a Panopticon prototype

To cross-check certain assumptions on which Panopticon is founded, we created a prototype OpenFlow controller which implements the key functionalities of legacy switch interaction. The primary goal of our prototype is to demonstrate the feasibility of legacy switch interaction, namely the ability to observe and respond to link failure events and other behaviors within the SCT.

We design a simple experiment run in Mininet [12]which demonstrates the ability of an SDN pro-grammable switch to react to an STP re-convergence,and adapt the network forwarding state accordingly.Figure 5 illustrates one such fail-over event within atopology modeled after Figure 3. Host A sends pingsover switch 4 to host F until 20 seconds into the ex-periment, when a link failure between switch 1 and 4 issimulated and an STP re-convergence is triggered. Theresulting BDPU updates are observed by the controllerand the connectivity is restored over switch 2.
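The experiment driver itself is short. The following is a simplified stand-in, assuming a local Mininet installation and an external Panopticon-like controller already listening; the ring topology here is illustrative and not the exact topology of Figure 3.

    import time
    from mininet.net import Mininet
    from mininet.node import RemoteController
    from mininet.topo import Topo

    class RingTopo(Topo):
        """Four switches in a ring, with a host on each side of the ring."""
        def build(self):
            s = [self.addSwitch('s%d' % i) for i in range(1, 5)]
            h_a, h_f = self.addHost('hA'), self.addHost('hF')
            for i in range(4):                  # ring; STP breaks the loop
                self.addLink(s[i], s[(i + 1) % 4])
            self.addLink(h_a, s[3])             # host A behind switch 4
            self.addLink(h_f, s[0])             # host F behind switch 1

    net = Mininet(topo=RingTopo(), controller=RemoteController)
    net.start()
    h_a, h_f = net.get('hA', 'hF')
    h_a.cmd('ping -i 0.2 %s > ping.log &' % h_f.IP())   # continuous pings
    time.sleep(20)
    # Fail the s1-s4 link; STP re-converges on the legacy side, and the
    # controller is expected to observe the BPDUs and restore forwarding.
    net.configLinkStatus('s1', 's4', 'down')
    time.sleep(20)
    net.stop()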

7. DISCUSSION

As hinted in the introduction, Panopticon exposes an SDN abstraction of the underlying partial-SDN deployment; however, the SDN fidelity of the global network view is reduced to the set of upgraded switches. In this section, we focus on what this means for the SDN programming abstraction and control applications.

Panopticon SDN vs. full SDN. As opposed to conventional links in a full SDN, links in Panopticon are pseudo-wires made up of legacy switches and links that run STP. Accordingly, the SDN controller must take into account the behavior of STP within each SCT and ISM. For example, an STP re-convergence in an SCT can create the impression that a pseudo-wire "jumps" from one SDN switch to another. Conceptually, this may correspond to a physical topology change, e.g., the behavior of a multi-chassis link aggregation group in a full SDN.

Hiding the partial deployment from the app. Each Panopticon end-point is not necessarily attached to a single SDN switch port, but rather to a group of SDN switch ports (the frontier). Consequently, the SDN control platform may present or hide this information from the application to conceal the nature of the partial deployment as needed. For example, if a control application wants to see the first packet of a flow to know from where the packet originated, the control platform can conceal the pseudo-wire nature of the legacy network, such that, to the application, the packet appears to arrive directly from an end-point rather than from the physically neighboring legacy switch.
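As a sketch of this concealment, a control platform could rewrite packet-in events before handing them to the application. All names below (PacketIn, frontier_map, present_to_app) are hypothetical illustrations, not an actual Panopticon or OpenFlow-controller API.

    from dataclasses import dataclass

    @dataclass
    class PacketIn:
        switch: str     # SDN switch that punted the packet to the controller
        in_port: int    # ingress port as reported by the switch
        payload: bytes

    def present_to_app(event, frontier_map):
        """Rewrite a packet-in so the app sees the end-point, not the pseudo-wire.

        frontier_map: (switch, physical port) -> logical end-point port, for
        ports that face a legacy pseudo-wire; other ports pass through as-is.
        """
        logical_port = frontier_map.get((event.switch, event.in_port))
        if logical_port is None:
            return event  # a true SDN-to-SDN link; nothing to conceal
        # To the application, the packet now appears to arrive directly from
        # the end-point attached at logical_port, hiding the legacy hops.
        return PacketIn(event.switch, logical_port, event.payload)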

Which SDN applications are possible? The essence of SDN is to remove the application's awareness of the underlying physical network state and to operate as a function over the global network view. As the control platform is responsible for morphing the Panopticon partial deployment into a consistent global network view, we do not foresee any restrictions on the control applications that can be supported in a Panopticon network as opposed to a full SDN. Naturally, early-generation SDN applications that attempt to interact with legacy network islands will not work in Panopticon, but we do not view this as a limitation: Panopticon subsumes this functionality and thus deprecates these applications.

Scalability. The key scaling factor of our system is the number of flow-table entries, which is problematic when many wildcard matching rules are needed. Panopticon can leverage recent work on Palette [16], which proposes an approach to decompose large SDN tables into small ones and distribute them across the network while preserving the overall SDN policy semantics. Furthermore, as demonstrated in [29], it is possible to use programmable MAC tables to off-load the TCAM and thereby improve scalability.

Why fully deploy SDN in the enterprise? Why should an enterprise ever move to a full deployment of SDN? Perhaps many enterprise networks do not need to fully deploy SDN. As our results show, it is a question of the trade-offs between budget limitations and resource constraint satisfaction. Our Panopticon evaluation suggests that partial deployment may in fact be the right long-term approach for some enterprise networks.

8. RELATED WORK

Our overarching goal is to build a scalable network architecture that fundamentally integrates legacy and SDN switches while exposing an abstract global network view for the benefit of the control logic above. As such, Panopticon differs from previous work in software-defined networking, evolvable inter-networking, scalable data-center network architectures, and enterprise network design and architecture.

SDN. In the enterprise, SANE [5] and subsequently Ethane [4] propose architectures where a centralized policing system enforces fine-grained network policy. Ethane overcomes SANE's obstacles to deployment by allowing compatibility with legacy devices. However, its integration with the existing deployment is only ad-hoc, and the behavior of legacy devices falls outside Ethane's control. Panopticon supports policy enforcement with an architecture that integrates with the existing deployment through a rigorously derived network design. Recently, Casado et al. [6] propose a fabric abstraction for SDN to decouple the network "edge" from the "core" so as to introduce the flexibility necessary to evolve network design. We view Panopticon as a generalization of this abstraction along the specific dimension of exposing a global network view as a layer of indirection over an upgrade-able network substrate.

Evolvable inter-networking. The question of how to evolve or run a transitional network has been discussed in many contexts. Plutarch [8] proposes an inter-networking approach that subsumes existing architectures such as IP. The notions of context and interstitial function are introduced to deal with (and exploit!) network heterogeneity. XIA [11] addresses the "narrow waist" design of the Internet as a barrier to evolvability; it natively supports the ability to evolve its functionality to accommodate new, unforeseen principals over time. Admittedly, Panopticon has more modest aims. Still, we believe it provides a reference demonstration for operating transitional enterprise networks.

Scalable data-center network architectures. There is a wealth of recent work on improving data-center network scalability. To name a few, Al-Fares et al. [2], VL2 [10], PortLand [23], NetLord [22], PAST [29], and Jellyfish [28] offer scalable alternatives to classic data-center architectures at lower costs. However, as clean-slate data-center architectures, these approaches are less applicable to transitional enterprise networks. Data-center network topologies are often highly regular and far less embroiled in legacy hardware. The main objective of the data-center network is to support full bisection bandwidth for low-latency any-to-any communication patterns. In contrast, enterprise network structure is less homogeneous and grows "organically" over time. Enterprise networks typically exhibit low utilization, and their main objective is to provide connectivity while ensuring isolation policies. Waypoint enforcement [5], as in Panopticon, is one way to achieve this goal.

Enterprise network design and architecture. Sung et al. [30] propose a systematic redesign of enterprise networks with the goal of making a parsimonious allocation of VLANs to ensure reachability and provide isolation; that work focuses on legacy networks only. The SEATTLE [17] network architecture uses a one-hop DHT host location lookup service to scale large enterprise Ethernet networks. However, such a clean-slate approach is not applicable to the type of transitional networks we consider. Today's scalability issues in large enterprise networks are typically dealt with by building a network out of several (V)LANs interconnected via L3 routers [7, 15]. TRILL [26] is an IETF standard for so-called RBridges that combine bridges and routers. TRILL bridges use their own link-state routing protocol, improving flexibility and adding routing. Similar to our approach, VLANs are used to support multi-path forwarding. While TRILL requires hardware and firmware changes, it can be deployed incrementally. However, we are not aware of any work discussing and rigorously evaluating the use of TRILL for efficient policy enforcement in enterprise networks, or where to optimally deploy RBridges.

To the best of our knowledge, there is no previous work on partial SDN upgrade.


9. SUMMARY

Managing and configuring enterprise networks is a non-trivial challenge given their complexity. While SDN promises to ease these challenges through principled network orchestration, it is nearly impossible to fully upgrade an existing legacy network to an SDN in a single operation. Accordingly, in this paper, we systematically tackle the problem of how to partially deploy SDN-enabled switches into existing legacy networks and reap the benefits for as much of the network as is feasible, subject to budget and resource constraints.

To this end, we have developed Panopticon, an architecture and methodology for aiding operators in planning and operating networks that combine legacy switches and routers with SDN switches. Our evaluation highlights that our approach can deeply extend SDN capabilities into existing legacy networks: by upgrading just 3% of the distribution switches in a large campus network, it is possible to realize the network as an SDN without violating reasonable resource constraints. Our results motivate the argument that partial SDN deployment may indeed be an appropriate long-term operational strategy for enterprise networks. In future work, we plan to expand our prototype implementation to serve end-users and to further refine our algorithms.

10. REFERENCES

[1] Nicira Network Virtualization Platform. http://nicira.com/en/network-virtualization-platform.

[2] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, 2008.

[3] A. Bley. Approximability of unsplittable shortest path routing problems. J. Netw., 54(1):23–46, 2009.

[4] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker. Ethane: Taking control of the enterprise. In SIGCOMM, 2007.

[5] M. Casado, T. Garfinkel, A. Akella, M. J. Freedman, D. Boneh, N. McKeown, and S. Shenker. SANE: A protection architecture for enterprise networks. In USENIX Security Symposium, 2006.

[6] M. Casado, T. Koponen, S. Shenker, and A. Tootoonchian. Fabric: A retrospective on evolving SDN. In HotSDN, 2012.

[7] Cisco. Campus Network for High Availability Design Guide, 2008. http://www.cisco.com/en/US/docs/solutions/Enterprise/Campus/HA_campus_DG/hacampusdg.html.

[8] J. Crowcroft, S. Hand, R. Mortier, T. Roscoe, and A. Warfield. Plutarch: An argument for network pluralism. SIGCOMM CCR, 33(4), 2003.

[9] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker. Frenetic: A network programming language. In ICFP, 2011.

[10] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. In SIGCOMM, 2009.

[11] D. Han, A. Anand, F. Dogar, B. Li, H. Lim, M. Machado, A. Mukundan, W. Wu, A. Akella, D. G. Andersen, J. W. Byers, S. Seshan, and P. Steenkiste. XIA: Efficient support for evolvable internetworking. In NSDI, 2012.

[12] N. Handigol, B. Heller, V. Jeyakumar, B. Lantz, and N. McKeown. Reproducible network experiments using container-based emulation. In CoNEXT, 2012.

[13] N. Handigol, B. Heller, V. Jeyakumar, D. Mazieres, and N. McKeown. Where is the debugger for my Software-Defined Network? In HotSDN, 2012.

[14] B. Heller, S. Seetharaman, P. Mahadevan, Y. Yiakoumis, P. Sharma, S. Banerjee, and N. McKeown. ElasticTree: Saving energy in data center networks. In NSDI, 2010.

[15] Juniper. Campus Networks Reference Architecture, 2010. http://www.juniper.net/us/en/local/pdf/reference-architectures/8030007-en.pdf.

[16] Y. Kanizo, D. Hay, and I. Keslassy. Palette: Distributing tables in software-defined networks. In INFOCOM, 2013.

[17] C. Kim, M. Caesar, and J. Rexford. Floodless in SEATTLE: A scalable Ethernet architecture for large enterprises. In SIGCOMM, 2008.

[18] T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, and S. Shenker. Onix: A distributed control platform for large-scale production networks. In OSDI, 2010.

[19] D. Levin, A. Wundsam, B. Heller, N. Handigol, and A. Feldmann. Logically centralized? State distribution tradeoffs in software defined networks. In HotSDN, 2012.

[20] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling innovation in campus networks. SIGCOMM CCR, 38(2), 2008.

[21] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot. Traffic matrix estimation: Existing techniques and new directions. In SIGCOMM, 2002.

[22] J. Mudigonda, P. Yalagandula, J. Mogul, B. Stiekes, and Y. Pouffary. NetLord: A scalable multi-tenant network architecture for virtualized datacenters. In SIGCOMM, 2011.

[23] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A scalable fault-tolerant layer 2 data center network fabric. In SIGCOMM, 2009.

[24] ONF. Hybrid Working Group. https://www.opennetworking.org/working-groups/hybrid.

[25] R. Pang, M. Allman, M. Bennett, J. Lee, V. Paxson, and B. Tierney. A first look at modern enterprise traffic. In ACM IMC, 2005.

[26] R. Perlman, D. Eastlake, D. G. Dutt, S. Gai, and A. Ghanwani. RBridges: Base protocol specification. Technical report, IETF, 2009.

[27] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker. Abstractions for network update. In SIGCOMM, 2012.

[28] A. Singla, C.-Y. Hong, L. Popa, and P. B. Godfrey. Jellyfish: Networking data centers randomly. In NSDI, 2012.

[29] B. Stephens, A. Cox, W. Felter, C. Dixon, and J. Carter. PAST: Scalable Ethernet for data centers. In CoNEXT, 2012.

[30] Y.-W. E. Sung, S. G. Rao, G. G. Xie, and D. A. Maltz. Towards systematic design of enterprise networks. In CoNEXT, 2008.

[31] R. Wang, D. Butnariu, and J. Rexford. OpenFlow-based server load balancing gone wild. In Hot-ICE, 2011.

[32] H. Zeng, P. Kazemian, G. Varghese, and N. McKeown. Automatic test packet generation. In CoNEXT, 2012.

[33] Y. Zhang, M. Roughan, N. Duffield, and A. Greenberg. Fast accurate computation of large-scale IP traffic matrices from link loads. In SIGMETRICS, 2003.

APPENDIX

Theorem A.1. Panopticon ensures that: (1) Forwarding sets FS(s, t) are loop-free. (2) End-to-end paths including an SDN port are subject to Waypoint Enforcement. (3) End-to-end paths are mutually isolated. (4) VLANs can be reused in different cell blocks.

Proof. We prove the three correctness properties (1), (2), and (3), and the efficiency property (4) in turn.

Property (1): An end-to-end path from s ∈ Π• to t ∈ Π• is of the following form: SCT(s) → u1 ∈ SCT(s) → ISM(u1, u2) → u2 ∈ SCT(t) → SCT(t). Therefore, loop-freeness follows directly from the loop-freeness of the VLAN STPs in SCT(s), ISM(u1, u2), and SCT(t). Moreover, even if only a small subset of the inter-SDN-switch paths are realized as VLANs, and u1 and u2 are not directly connected, the SDN controller can ensure that the sequence of VLANs used for forwarding traffic from u1 to u2 is loop-free.

Property (2): This property follows directly from the definition of the Solitary Confinement Tree (SCT). Since each s ∈ Π• is in its own VLAN, the packets sent from s can only reach another VLAN via the SDN switches in the frontier F of s.

Property (3): The isolation property is due to the fact that each SCT contains exactly one end-point π ∈ Π, and that the inter-switch VLANs (in the ISM) that connect two SDN switches u1 ∈ F, u2 ∈ F (u1, u2 ∈ S) do not traverse any end-point π ∈ Π.

Property (4): By the cell block definition, all communication between two end-points π1 ∈ c1 ∈ CB and π2 ∈ c2 ∈ CB with c1 ≠ c2 passes through the SDN switches in F(c1) and F(c2). Hence, all VLANs can be isolated from each other, and VLAN IDs can be reused in the different cell blocks.
