
B.2 Technical Results & Lessons learned

B.2.1 LASH-5G system deployment

The details of the LASH-5G orchestration system deployment on the Fed4FIRE+ platform are depicted in

Fig. 2. The setup consists of five slices instantiated on the Virtual Wall testbed, including a slice with the

Chain Optimizer, a slice with the WAN SDN domain, three slices with the SDN-based Edge cloud domains

(named DC-1, DC-2 and DC-3). The established Edge cloud slices and the WAN slice interact at the data plane level by exchanging packet data traffic and at the orchestration plane level by exchanging control messages between the Chain Optimizer, the WIM and the VIMs.

Each Edge cloud slice is connected to the WAN slice at the data plane level by means of VXLAN tunnels

established on top of the Virtual Wall management network. In particular, each VXLAN tunnel is established

between the node representing the egress router of an Edge cloud slice and one of the nodes in the WAN

slice. The VXLAN virtual tunnel endpoint (VTEP) located at the egress router appears as an IP-routable

interface of the router itself, whereas the corresponding VTEP located at the WAN slice node is bridged to

an Open vSwitch (OvS) instance running in the same node. This particular setup allows the WAN slice to act as a layer-2 infrastructure connecting the three Edge cloud slices. Because each WAN slice node is

running an OvS instance, the slice itself can be programmed by an SDN controller.
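As an illustration of this data-plane glue, the following Python sketch shows how a VXLAN VTEP of this kind could be configured with standard iproute2 and Open vSwitch commands; the interface name, VNI and IP addresses are placeholders and are not taken from the actual slice configuration.

#!/usr/bin/env python3
"""Sketch of the data-plane glue between an Edge cloud slice and the WAN slice.
Interface names, addresses and the VNI are illustrative only; commands require root."""
import shlex
import subprocess

def run(cmd: str) -> None:
    """Run a shell command and fail loudly if it returns non-zero."""
    subprocess.run(shlex.split(cmd), check=True)

def egress_router_vtep(wan_node_mgmt_ip: str) -> None:
    # VTEP on the Edge cloud egress router: a plain, IP-routable interface.
    run(f"ip link add vxlan100 type vxlan id 100 remote {wan_node_mgmt_ip} dstport 4789")
    run("ip addr add 172.16.1.1/24 dev vxlan100")   # illustrative addressing
    run("ip link set vxlan100 up")

def wan_node_vtep(edge_router_mgmt_ip: str) -> None:
    # VTEP on the WAN slice node: bridged into the local Open vSwitch instance,
    # so the WAN slice behaves as a layer-2 fabric between the Edge clouds.
    run("ovs-vsctl --may-exist add-br br-wan")
    run("ovs-vsctl add-port br-wan vxlan100 -- set interface vxlan100 "
        f"type=vxlan options:remote_ip={edge_router_mgmt_ip} options:key=100")

if __name__ == "__main__":
    # Run the first function on the egress router, the second on the WAN node.
    egress_router_vtep("10.2.0.10")
    wan_node_vtep("10.2.0.20")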

The Chain Optimizer slice exchanges messages with the other slices at the orchestration plane level via the

Virtual Wall management network. In particular, the Chain Optimizer sends service function chaining CRUD

(Create, Read, Update, Delete) requests to the relevant VIMs and WIM through their intent-based

northbound interface. Interactions at the network control plane level do not take place between different

slices. This is in line with the LASH-5G architecture, where each domain is supposed to adopt its own SDN

control plane solution independently of the choice made by other domains.

Fig. 2: Deployment of the LASH-5G experiment on the Fed4FIRE+ testbed

The following subsections provide further details of each slice/domain deployed on the Fed4FIRE+ testbed and its operation.


B.2.1.1 WAN SDN Slice

The WAN SDN slice, shown in Fig. 3, consists of three main components: the SDN network, the WAN SDN

controller and the WIM. All these components have been deployed within a unique experiment slice where

one physical node has been allocated for the SDN controller and the WIM, and 5 other physical nodes were

allocated for the SDN topology. All the nodes run a Ubuntu 16.04 distribution. Specifically, we have

installed the Open vSwitch (OvS) software on the physical nodes composing the topology in order to

emulate OpenFlow switches. OvS is a multilayer virtual switch designed to enable massive network

automation through programmatic extension, while still supporting standard management interfaces and

protocols. Three of the OpenFlow switches are connected, through VXLAN tunnels, to the Edge cloud

domains gateways. The OvS instances connect to the SDN controller, which is an instance of the ONOS

controller that has been downloaded and installed on the 6th physical node of the slice. The ONOS version

is Junco (1.9.0).

Fig. 3: Deployment of the WAN SDN island

The WIM is an orchestration software component, written in Java, that allows for the programmable provisioning of data delivery paths connecting the DCs where VFs are deployed [9]. The set-up and the teardown of such delivery paths can be triggered through an intent-based API exposed at the northbound, which allows upper-layer components (i.e., the Chain Optimizer) to use application-oriented semantics rather than dealing with technology-specific low-level network details.

Hence, the Chain Optimizer triggers the set-up of delivery paths in the WAN by sending the WIM the source and destination DCs that host the respective source and destination VF/node in the chain. The WIM then derives the Edge cloud domain gateways to be connected, performs the mapping operation by identifying the network path and, accordingly, enforces the forwarding rules on the switches along the path.

Moreover, the WIM offers adaptation capabilities for the established paths in order to recover from congestion events (e.g., service outages or degradation events), which may occur when concurrent resource usage takes place. In line with the IETF SFC guidelines, which expect load status control [31], if a (risk of) degradation is detected, the WIM redirects the delivery path, or a segment of it, with an overall beneficial load-balancing effect [24]. More specifically, the WIM periodically monitors the switches' load status: it retrieves from the SDN controller the number of bytes received/transmitted at each switch and then calculates its throughput by dividing this amount by the duration of the polling interval. Finally, the WIM is also responsible for collecting network latency information in order to retrieve the inter-DC delays. Those delays are then made available to the Chain Optimizer to enhance its resource orchestration capabilities by computing a minimum-latency service graph.
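The following Python sketch illustrates the polling logic just described. It assumes the ONOS port-statistics REST resource (/onos/v1/statistics/ports) and its JSON field names as documented for recent ONOS releases; URLs, credentials and the threshold are placeholders and may need adjusting for the Junco release and for the actual WIM implementation.

"""Illustrative sketch of the WIM load-monitoring loop: poll the ONOS port
statistics, sum bytes per switch, and derive a throughput estimate by dividing
the byte delta by the polling interval.  URL and credentials are placeholders."""
import time
import requests

ONOS = "http://wan-controller:8181/onos/v1"
AUTH = ("onos", "rocks")          # default ONOS credentials, replace as needed
POLL_INTERVAL = 10                # seconds

def switch_byte_counters() -> dict:
    """Return total bytes (rx + tx) seen so far on each switch."""
    stats = requests.get(f"{ONOS}/statistics/ports", auth=AUTH).json()
    totals = {}
    for entry in stats.get("statistics", []):
        totals[entry["device"]] = sum(p.get("bytesReceived", 0) + p.get("bytesSent", 0)
                                      for p in entry.get("ports", []))
    return totals

def monitor(threshold_bps: float) -> None:
    previous = switch_byte_counters()
    while True:
        time.sleep(POLL_INTERVAL)
        current = switch_byte_counters()
        for device, total in current.items():
            delta = total - previous.get(device, 0)
            throughput_bps = delta * 8 / POLL_INTERVAL
            if throughput_bps > threshold_bps:
                print(f"{device}: {throughput_bps/1e6:.1f} Mbps -> candidate for redirection")
        previous = current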

B.2.1.2 Edge Cloud Slices

To deploy the Edge cloud domains, three different slices were set up on Virtual Wall. This choice allowed us

to run, deploy, and test each Edge cloud domain incrementally and independently of the other ones, thus

saving configuration and setup time with respect to the alternative option of deploying a single slice

including all three domains.

Fig. 4: Deployment of one of the Edge cloud slices

As shown in Fig. 4 for the case of DC-1, each Edge cloud slice includes an independent OpenStack cluster (Pike version) and consists of the following nodes, running a Linux Ubuntu 16.04 operating system:

· an OpenStack controller node, where all required OpenStack services are executed and where these services expose their REST API endpoints;
· two or three OpenStack compute nodes, where virtual machine instances are running over a QEMU-KVM hypervisor; the controller node acts as one of the compute nodes;
· an OpenStack network node, providing external network connectivity to the virtual instances; the controller node also acts as the network node;
· a node connected to the aforementioned OpenStack nodes and running an instance of OvS, representing the SDN infrastructure of the Edge cloud data centre;
· a node acting as the egress router of the Edge cloud slice, connected to the SDN WAN slice through VXLAN tunnels;
· a virtual node used for testing purposes.

In addition, three network segments are present within each slice, implementing:
· the OpenStack management network, used by compute nodes to communicate with the controller;
· the OpenStack data network, for traffic exchanged by the virtual instances;
· an external network, connecting the virtual routers instantiated in the OpenStack network node to the slice egress router.


The enhanced VIM runs on one of the nodes, typically on the OvS node interconnecting the OpenStack

cluster, although this choice is not mandatory. An instance of the ONOS controller, acting as the SDN

controller of the Edge cloud data center network infrastructure, is also executed on the OvS node. Finally,

each Edge cloud slice is provided with public IP addresses in order to have easy access to the OpenStack

and ONOS dashboards.

The OpenStack cluster running in each slice exposes the essential services and related APIs, including:

compute and placement (Nova), identity (Keystone), image (Glance), and network (Neutron). In addition,

the metric service (Gnocchi) and APIs are enabled in order to collect processing latency measurements

periodically reported by the VF instances. The collected data are then queried by the Chain Optimizer.

The Neutron service running in each compute node was configured with the OvS plugin, in order to add

SDN functionality to the virtual bridges internal to OpenStack nodes. This approach is also facilitated by an

innovation introduced in the most recent versions of Neutron, which takes advantage of an instance of the

Ryu SDN controller running in each compute node to control the internal forwarding mechanisms [32], as

shown in Fig. 5.

Fig. 5: The presence of Ryu controller inside OpenStack nodes enables native SDN capabilities [32]

In order to achieve the traffic steering capabilities required by the LASH-5G architecture inside each Edge cloud slice, the enhanced VIM has to install appropriate OpenFlow rules on both the OvS node representing the network infrastructure of the Edge cloud data center and the virtual switches used internally by the OpenStack compute nodes, as shown by the dashed lines in Fig. 4. For this purpose, we took advantage of the ONOS intent-based REST API and the Ryu OpenFlow REST API, after enabling the latter in each Neutron instance. The OpenFlow rules to be installed on the OvS node depend on the specific virtual instance network architecture configured in Neutron. We decided to adopt a “flat network” architecture, so that the OvS node is able to natively see (and properly steer) the traffic exchanged by the instances running in the compute nodes. Although this solution actually breaks the tenant traffic isolation feature offered by OpenStack, we were forced to make this choice because (i) we wanted to avoid tunneling solutions such as GRE and VXLAN in order to achieve better traffic control, and (ii) using VLANs for that purpose does not work on the Virtual Wall testbed, probably because the underlying physical network infrastructure uses VLANs for slice isolation and does not allow nested VLAN tagging (Q-in-Q).
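The sketch below illustrates the two kinds of rule injection just described, using the standard ONOS intent REST API and the Ryu ofctl_rest API; the intent type, device ids, ports and MAC addresses are illustrative and do not reproduce the actual VIM logic.

"""Sketch of the two forwarding-rule injections performed by the enhanced VIM.
Endpoints follow the ONOS intent API and the Ryu ofctl_rest API; all identifiers
(device ids, ports, MAC addresses, app id) are placeholders."""
import requests

ONOS = "http://dc1-onos:8181/onos/v1"
RYU = "http://dc1-compute1:8080"       # Ryu ofctl_rest enabled alongside Neutron
AUTH = ("onos", "rocks")

def steer_between_compute_nodes(src_mac: str, dst_mac: str) -> None:
    """Inter-node steering: ask ONOS to connect the two endpoints with an intent."""
    intent = {
        "type": "HostToHostIntent",
        "appId": "org.onosproject.cli",     # any application id registered in ONOS
        "one": f"{src_mac}/-1",
        "two": f"{dst_mac}/-1",
    }
    requests.post(f"{ONOS}/intents", json=intent, auth=AUTH).raise_for_status()

def steer_inside_compute_node(dpid: int, in_port: int, out_port: int, dst_mac: str) -> None:
    """Intra-node steering: push an OpenFlow rule into the node's OvS bridge via Ryu."""
    flow = {
        "dpid": dpid,
        "priority": 100,
        "match": {"in_port": in_port, "eth_dst": dst_mac},
        "actions": [{"type": "OUTPUT", "port": out_port}],
    }
    requests.post(f"{RYU}/stats/flowentry/add", json=flow).raise_for_status()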

The enhanced VIM is an orchestration software component, written in Python, that exposes an intent-based northbound REST interface, which allows a service chain to be specified by means of a high-level descriptive syntax, agnostic to the specific SDN technology adopted. The details of the VIM design and intent-based

northbound interface specification can be found in [30].

B.2.1.3 Chain Optimizer

The Chain Optimizer (CO) has been deployed in an isolated experiment slice on a physical node on Virtual Wall. The node runs an Ubuntu 16.04 distribution and communicates with the other slices (WIM and VIMs) via the management network. The Chain Optimizer handles service function chaining requests and orchestrates the infrastructure managers (WIM/VIMs) to enforce the steering of traffic flows along the selected VF instances. An example of interworking between the Chain Optimizer, the VIMs and the WIM is shown in Fig. 7.

Fig. 6: Example of Service chain request

More specifically, with reference to the example of VFs chaining shown in Fig.6, the CO is expected to

receive requests specified as follows:

{"serviceType" : "Type1 ",

"maxLatency " : 100,

"source" : "Node-A",

"destination" : "Node-B",

"vfChain" : "VF-1,VF-2"}

In this example, the request means that a traffic flow identified by the given serviceType, source and destination should be processed by a chain of VFs of types VF-1 and VF-2. The CO handles this request by first selecting the instances of VF-1 and VF-2 available in a distributed multi-DC environment (e.g., a VF-1 instance in Domain 1 and a VF-2 instance in Domain 2) and by then sending appropriate forwarding instructions to the concerned VIMs and WIM, so that the flow is actually steered through these instances (see Fig. 7).


Fig. 7: Example of interworking between Chain Optimizer and VIM/WIM
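For illustration, the request shown above could be submitted to the CO as in the following sketch; the CO base URL and resource path are not specified in this document and are therefore placeholders.

"""Minimal client-side sketch: submit a service chain creation request to the
Chain Optimizer northbound REST API.  The endpoint URI is an assumption; the
request body follows the format shown above."""
import requests

CO_ENDPOINT = "http://chain-optimizer:8081/chains"   # illustrative URI

request_body = {
    "serviceType": "Type1",
    "maxLatency": 100,          # ms, end-to-end upper bound
    "source": "Node-A",
    "destination": "Node-B",
    "vfChain": "VF-1,VF-2",
}

response = requests.post(CO_ENDPOINT, json=request_body)
if response.ok:
    # On success the CO returns the selected DCs, the estimated end-to-end
    # latency and the algorithm computation time.
    print(response.json())
else:
    print("request rejected:", response.status_code, response.text)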

The CO architecture is depicted in Fig. 8. It contains the following components:

· REST API: CRUD operations on service chains are exposed through REST APIs.

· Controller: It is in charge of handling received requests for creating, retrieving, updating or deleting a service chain, orchestrating the interaction with the Wrapper, Monitoring, Forwarding Instruction Dispatcher and Storage blocks.

· Wrapper: It invokes the VNF selection algorithm which, leveraging the algorithm presented in [22], selects the VF instances available in the different clouds along the path that minimizes the end-to-end latency, considering both processing and network delays. The optimization problem has been formulated as a Resource Constrained Shortest Path problem on a properly defined auxiliary layered graph; the layered structure of the graph ensures that the order of the VFs specified in the request is preserved. Additional constraints (e.g., maximum allowed network latency on the whole path or between two specific VFs) can be taken into account and enforced during the graph construction phase. The algorithm receives as input the service chaining request as well as an up-to-date view of the underlying infrastructure topology, and returns either a solution or an infeasibility reply. If a solution is found, an estimated end-to-end latency is also provided; if this value exceeds the maximum latency specified in the request, the request is not accepted and an error message is returned to the client. A simplified sketch of the layered-graph selection is given after this list.


Fig. 8: Chain Optimizer architecture

· Monitoring: this component periodically interacts with the VIM and WIM monitoring APIs to collect the measurements needed to maintain an up-to-date view of the underlying infrastructure topology. Collected measurements include inter-DC latencies, the types and instances of the VFs deployed at each DC, and the related processing latencies.

· Dispatcher: This component handles the interactions with the VIMs and the WIM in order to enforce operations on a target service chain. For instance, creating a service chain implies sending forwarding instructions to the affected VIMs and the WIM, while deletion implies sending delete operations. The CO leverages the northbound REST APIs provided by the VIMs and the WIM to send such instructions.

· Storage: Relevant data concerning service chains are persisted for online operation (e.g., monitoring and update operations) as well as for collecting statistics (e.g., acceptance ratio, performance metrics, etc.).
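The following simplified sketch (referenced in the Wrapper description above) illustrates the layered-graph idea behind the VF selection: one layer per VF of the requested chain, with edges weighted by the inter-DC network latency plus the processing latency of the next VF, and a dynamic program over the layers. The real algorithm in [22] is a Resource Constrained Shortest Path with additional constraints; all figures below are dummy values.

"""Toy layered-graph VF selection: only total latency is minimised here."""
INF = float("inf")

# Illustrative monitoring data (ms), not measured values.
net_latency = {("DC-1", "DC-1"): 0, ("DC-1", "DC-2"): 20, ("DC-1", "DC-3"): 12,
               ("DC-2", "DC-1"): 20, ("DC-2", "DC-2"): 0, ("DC-2", "DC-3"): 15,
               ("DC-3", "DC-1"): 12, ("DC-3", "DC-2"): 15, ("DC-3", "DC-3"): 0}
proc_latency = {("VF-1", "DC-1"): 5, ("VF-6", "DC-1"): 9, ("VF-6", "DC-3"): 4,
                ("VF-9", "DC-2"): 6}

def select_chain(src_dc, dst_dc, vf_chain):
    """Return (total latency, list of DCs hosting each VF) or (inf, None)."""
    # best[dc] = minimal latency of a partial chain whose current VF runs in dc
    best, choice = {src_dc: 0.0}, {src_dc: []}
    for vf in vf_chain:                       # one layer per VF, order preserved
        new_best, new_choice = {}, {}
        for dc in ("DC-1", "DC-2", "DC-3"):
            if (vf, dc) not in proc_latency:
                continue                      # this DC does not offer the VF
            for prev, cost in best.items():
                total = cost + net_latency[(prev, dc)] + proc_latency[(vf, dc)]
                if total < new_best.get(dc, INF):
                    new_best[dc], new_choice[dc] = total, choice[prev] + [dc]
        best, choice = new_best, new_choice
        if not best:
            return INF, None                  # no feasible placement
    # Close the chain towards the destination DC.
    return min(((cost + net_latency[(dc, dst_dc)], choice[dc]) for dc, cost in best.items()),
               default=(INF, None))

print(select_chain("DC-1", "DC-2", ["VF-1", "VF-6", "VF-9"]))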

Fig. 9 shows the sequence diagram of the interactions leading to the computation of the latency-minimized VF chain and to its establishment through the VIMs/WIM. A request generator (acting as a client and named Actor) sends a chaining request to the CO endpoint (POST operation on the URI). The CO parses the request, checking the validity of the provided information, and invokes the optimization algorithm. If the algorithm does not find any solution, an error is returned. If a solution is found, a check is performed to verify whether it fulfils the maximum latency constraint provided in the request; if the constraint is not satisfied, an error is returned to the client. If the computed latency is lower than the maximum value, the CO processes the solution to generate appropriate instructions to be sent to the affected VIMs and WIM through their intent-based interfaces. If all the VIMs and the WIM provide an acknowledgement reply, the CO sends a response to the client.


Fig. 9: Service chain request reception and handling
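The control flow of Fig. 9 can be condensed into the following runnable sketch; all helpers are stubs standing in for the real CO components, and the status codes and error payloads are illustrative.

"""Condensed sketch of the request handling flow of Fig. 9 (all helpers are stubs)."""
def validate(req):                  return {"serviceType", "source", "destination", "vfChain"} <= req.keys()
def run_selection_algorithm(req):   return {"dcs": ["DC-1", "DC-2"], "estimatedLatency": 42}   # stub
def dispatch_to_vims(sol):          return True    # forwarding instructions, sent in parallel
def dispatch_to_wim(sol):           return True    # WAN path, configured after the VIMs reply
def rollback(sol):                  pass           # undo partial configuration (see B.2.5)
def persist(sol):                   pass           # Storage component

def handle_chain_request(request: dict):
    if not validate(request):                                   # parse + sanity checks
        return 400, {"error": "malformed request"}
    solution = run_selection_algorithm(request)                 # Wrapper / VNF selection
    if solution is None:
        return 409, {"error": "no feasible chain"}
    if solution["estimatedLatency"] > request.get("maxLatency", float("inf")):
        return 409, {"error": "latency constraint not met"}
    if not (dispatch_to_vims(solution) and dispatch_to_wim(solution)):
        rollback(solution)
        return 502, {"error": "enforcement failed"}
    persist(solution)
    return 201, solution

print(handle_chain_request({"serviceType": "Type1", "maxLatency": 100,
                            "source": "Node-A", "destination": "Node-B",
                            "vfChain": "VF-1,VF-2"}))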

B.2.2 Experiment deployment and preliminary system validation

After having deployed and individually debugged all the components described in section B.2.1, we

proceeded with the instantiation of VFs and service endpoints to be used for the final experiment. This

phase required the placement of VFs across the three Edge cloud data centers. Since the placement

problem is out of the scope of the LASH-5G experiment, we used the predefined placement shown in Table

1, which assumes that some VFs (VF-1 to VF-4 and VF-8 to VF-10) are available only in DC-1 and DC-2,

respectively, whereas VF-5 to VF-7 are available on each data center. The table also shows how many

instances of each VF type have been activated in each data center.


Table. 1: Placement of the VF instances on the three Edge cloud data centers.

The correct placement of service endpoints and VF instances as per Table 1 is confirmed by the following screenshots taken from the OpenStack dashboards of each data center (see Fig. 10 – Fig. 12).

Fig. 10: DC-1 OpenStack dashboard


Fig. 11: DC-2 OpenStack dashboard

Fig. 12: DC-3 OpenStack dashboard

In order to carry out the experiment activities, besides the jFed software needed to define and deploy slices

on the Fed4FIRE+ testbeds, we used the following tools:

- Postman: a widely adopted tool for REST API development and testing;
- CO GUI: a web-based GUI implemented for the LASH-5G CO to visualize the inter-DC topology at the level of abstraction handled by the CO; it also shows the last created chain and collects some statistics (e.g., number of accepted/failed requests, computation time, etc.);
- CO Client GUI: a web-based GUI implemented for LASH-5G to ease the generation and delivery of service chain creation requests and their basic management (e.g., deletion);
- iperf3: a widely used packet generator for testing the actual traffic steering across the chains;
- a number of customized bash and python scripts developed to automate experiment operations.


Moreover, in order to emulate VF instances whose processing latency varies with traffic load, we

implemented a Java application that, according to a configurable processing capacity value, computes a

processing latency value as a function of the processing capacity and the traffic input rate measured at the

network interface. This measurement is periodically posted to an ad-hoc metric maintained by the Gnocchi database in the OpenStack deployment.
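The sketch below illustrates this periodic metric push in Python (the application used in the experiment is written in Java); the Gnocchi measures endpoint is the standard one, whereas the metric id, the Keystone token and the latency model are placeholders.

"""Sketch of the periodic processing-latency push performed by an emulated VF."""
import datetime
import time
import requests

GNOCCHI = "http://dc1-controller:8041/v1"
METRIC_ID = "3a7c9b1e-0000-0000-0000-000000000000"   # placeholder metric uuid
TOKEN = "<keystone-token>"                           # obtained via Keystone
CAPACITY_BPS = 5_000_000                             # configurable processing capacity

def processing_latency(input_rate_bps: float) -> float:
    """Toy model: latency grows as the input rate approaches the capacity (ms)."""
    utilisation = min(input_rate_bps / CAPACITY_BPS, 0.99)
    return 1.0 / (1.0 - utilisation)

def post_measure(value_ms: float) -> None:
    measure = [{"timestamp": datetime.datetime.utcnow().isoformat(), "value": value_ms}]
    requests.post(f"{GNOCCHI}/metric/{METRIC_ID}/measures",
                  json=measure, headers={"X-Auth-Token": TOKEN}).raise_for_status()

def measure_rate_on_interface() -> float:
    """Stub: in the experiment the rate is read from the VM's network interface."""
    return 1_000_000.0

while True:
    post_measure(processing_latency(measure_rate_on_interface()))
    time.sleep(30)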

Fig. 13 provides a graphical representation (as shown by the CO GUI) of the topology at the level of abstraction handled by the CO. The VF placement is correctly discovered by the Chain Optimizer, which also periodically receives monitoring information to build and maintain an up-to-date view of the multi-DC NFVI topology. For each VF type, the processing latency offered by the different DCs is displayed, as well as the minimum latency between each pair of DCs. Note that the topology maintained by the CO is a logical view on top of the topologies handled by the WIM and VIMs, and some details (VF instances and WAN links) are not represented.

Fig. 13: Topology view maintained by the Chain Optimizer, shown through the Chain Optimizer GUI

The CO has been tested against its capability to find the optimal chaining, given the established VF instance placement and the offered values of network and processing latency. Consider the abstract chain reported below, which is also graphically represented in Fig. 14:

{"serviceType" : "Type5" ,

"maxLatency" : 300 ,

" source " : "Node-B.dc1" ,

" destination " : "Node-E.dc2" ,

" vfChain " : "VF-1,VF-2,VF-6,VF-9,VF-10"}


Fig. 14: Service chain request (i.e., abstract chain) processed by the Chain Optimizer

Fig. 15 depicts the solution computed by the CO (as visualized by the CO GUI) for such a service chain request, i.e., VF-1 and VF-2 in DC-1, VF-6 in DC-3, and VF-9 and VF-10 in DC-2.

Fig. 15: The Chain Optimizer GUI showing the solution computed for the considered service chain request.

Finally, Fig. 16 shows the configuration messages sent to the VIMs/WIM to actually deploy the service chain

(according to the computed solution) across the DC-1, DC-2, and DC-3 interconnected through the WAN.


Fig. 16: The solution computed by the Chain Optimizer and the configuration messages sent to the VIMs/WIM

B.2.3 Experiment Workflow

The following two experiment steps have been performed: latency-optimized VF chains establishment and

service chain adaptation.

1. Latency-optimized VF chains establishment.

This experiment step aims at evaluating the capability of the Chain Optimizer to process service chain requests, to compute latency-optimized VF chains by correctly exploiting the monitoring data on processing and network latency, and to set up VF chains across network and cloud domains by properly interacting with the underlying enhanced VIMs and WIM.

For this purpose, we used the CO Client GUI, which allows generating service chain creation and deletion requests and delivering them to the CO. In particular, five different service chain requests with their respective requirements have been generated and sent to the CO through the CO Client GUI, according to the following sequence:

SC1: NODE-A.dc1, VF1, VF2, VF8, NODE-D.dc2 1Mbps 500ms

SC2: NODE-B.dc1, VF3, VF6, VF9, VF10, NODE-E.dc2 1Mbps 500ms

SC3: NODE-A.dc1, VF4, VF7, VF10, NODE-E.dc2 1Mbps 500ms

SC4: NODE-B.dc1, VF4, VF5, VF9, VF10, NODE-D.dc2 1Mbps

SC5: NODE-C.dc1, VF1, VF2, VF7, VF8, NODE-F.dc2 1Mbps

Fig. 17 shows an example of a manually triggered chain request at the time of SC4 creation. After the service chain deployment, the CO Client GUI also allows deleting the chain by clicking on the ‘remove’ button.


Fig. 17: Creation and delivery of a service chaining request

As described above, the CO handles the request, computes a latency-optimized solution and sends the corresponding forwarding instructions to the affected VIMs and the WIM. If this procedure is successful, a response is returned to the CO Client GUI with the solution (the set of DCs where the chain has been deployed), the computed end-to-end latency, and the computation time needed by the algorithm to solve the optimization problem (see Fig. 17 - light green box). Fig. 18 shows the CO GUI visualizing the new service chain (i.e., SC4).


Fig. 18: The Chain Optimizer GUI shows the deployment of a service chain specified in the previous figure across DC-1,

DC-2, and DC-3 interconnected through the WAN.

The CO persists all the established service chains in the DB together with related measurements (e.g., response time, computation time, etc.). The deletion of a service chain is handled by the CO by sending delete requests to the concerned VIMs and WIM and by updating the status of the request in the DB.

On the VIMs/WIM side, the service chain set-up requests result in a proper set of configuration actions enforced on the internal nodes according to the configuration messages with the forwarding instructions sent by the CO.

More specifically, as soon as the VIMs involved in the service chain deployment receive the request, it is their responsibility to discover where the specified VF instances are located in order to properly interact with the relevant SDN controller(s). In particular, traffic steering within a given OpenStack node is controlled by Ryu, whereas flows crossing different OpenStack nodes are managed by ONOS. VF discovery also allows gathering all the information needed to compose either Ryu flow messages or ONOS intent messages; these messages are essential in order to install the proper Ryu flow rule(s) and/or ONOS intent(s) via the controllers’ REST APIs. Once the service chains have been established, traffic starts flowing along the whole path from source to destination, traversing VF instances of the types specified in the request.

On the WAN side, when a request arrives at the WIM, it computes the network path connecting the specified DCs across the WAN and installs the forwarding rules in the involved switches accordingly.


Fig. 19: SDN network before service request deployment

Fig. 19 shows the status of the SDN network before the SC1-SC5 requests are sent. All the switches are unloaded since no traffic is traversing the network. Once a service chain request arrives at the WIM, it selects the best path in terms of network latency and switch availability and then interacts with the SDN controller in order to set up the relevant flow entries. As soon as the switches are configured and the chain is established in the Edge cloud domains, the traffic starts flowing across the network and we notice an increase in the overall throughput of the involved switches, as plotted in Fig. 20.

Fig. 20: SDN network after service request deployment

The experiment demonstrated the correct deployment of the service chains across all the involved domains

(Edge cloud and SDN WAN through VIMs and WIM respectively). As a proof of the correct chaining, Fig. 21

shows the sequential time diagram of the throughput measured at some of the VF instances involved in the

aforementioned chains, after some UDP-based iperf3 flows were generated.


Fig. 21: Sequential time diagram of the throughput measured at some VF instances while the service chains are deployed
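The UDP flows can be generated, for instance, as in the following sketch, which drives iperf3 from Python and parses its JSON report; the endpoint addresses and rates are illustrative, and iperf3 must run in server mode on the destination nodes.

"""Sketch of the traffic generation used to verify the chains: iperf3 UDP flows
between the service endpoints, driven from Python.  Addresses are placeholders."""
import json
import subprocess

def run_udp_flow(server_ip: str, rate: str = "1M", duration: int = 60) -> float:
    """Run one iperf3 UDP flow and return the measured throughput in Mbps."""
    out = subprocess.run(
        ["iperf3", "-c", server_ip, "-u", "-b", rate, "-t", str(duration), "--json"],
        capture_output=True, text=True, check=True).stdout
    report = json.loads(out)
    return report["end"]["sum"]["bits_per_second"] / 1e6

# One flow per deployed chain, sent from the chain source towards the endpoint
# in DC-2 (e.g. NODE-D.dc2, NODE-E.dc2); the IPs below are placeholders.
for endpoint in ["10.10.2.11", "10.10.2.12", "10.10.2.13"]:
    print(endpoint, run_udp_flow(endpoint), "Mbps")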

2. Service chain adaptation.

This step aims at evaluating the adaptive capability of the orchestration system in dynamically adjusting

established service (i.e., VF) chains with respect to the current service and network contexts (e.g.,

occurrence of SLA violations, switch/link congestions), based on the processing of a selective set of

monitoring data (e.g., data throughput at switches/links). Adaptations to the service context may involve

re-optimization operations to be performed by the Chain Optimizer. Hereafter we describe adaptation in

terms of: a) Service chain update and b) Service chain path redirection.

Service chain update

Adaptation of VF chains with respect to the service context may include updating the chain to cope with SLA violations or demand changes, and consequent re-optimization operations to be performed by the Chain Optimizer. More specifically, re-optimization may consist of updating the established service chain by adding one or more new VFs to the chain, while keeping the rest of the chain unchanged. This implies changing only a part of the service function path to include the newly specified VF.

In order to trigger a chain update, a CO client sends a request for updating an already existing chain by

specifying the source and destination nodes and serviceType (identifying an existing chain), the new VF

type and its ordered position in the chain. This request is sent by invoking a PATCH operation on the URI

identifying the chain resource. The CO handles the request by invoking the optimization algorithm to find

the VF instance to be added to the chain, taking into account how the pre-existing chain has been

deployed. The CO then processes the algorithm output to provide appropriate instructions to VIMs and

WIM for updating the chain accordingly.
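As an illustration, such an update request could be issued as in the following sketch; the CO base URL, resource path and the field names carrying the new VF type and its position are assumptions, since the exact update syntax is not reported here.

"""Client-side sketch of a chain update via PATCH; endpoint and field names are assumptions."""
import requests

CO_ENDPOINT = "http://chain-optimizer:8081/chains"   # illustrative URI

update = {
    "serviceType": "Type4",        # identifies the existing chain together with
    "source": "NODE-B.dc1",        # source and destination (SC4 in the example below)
    "destination": "NODE-D.dc2",
    "newVf": "VF-1",               # VF type to insert...
    "position": 1,                 # ...as the first VF of the chain (SC4bis)
}

resp = requests.patch(CO_ENDPOINT, json=update)
print(resp.status_code, resp.json() if resp.ok else resp.text)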

As part of the experiment, two service chain updates have been triggered, according to the following

pattern:

SC4bis: NODE-B.dc1, VF1, VF4, VF5, VF9, VF10, NODE-D.dc2 1Mbps

SC5bis: NODE-C.dc1, VF1, VF2, VF3, VF7, VF8, NODE-F.dc2 1Mbps

In both cases, a new VF was added to an existing chain. The corresponding service chain update requests have been sent directly to the CO endpoint using Postman.

Fig. 22 shows the CO GUI after the update of an existing service chain. More precisely, in this example the preexisting chain is the one depicted in Fig. 18 (SC4 mentioned above) and a request has been generated to add VF-1 as the first VF in the chain (i.e., SC4bis has been triggered as an update to the existing SC4).


Fig. 22: Update of an existing service chain.

In order to implement such an update, the CO sends appropriate forwarding instructions to the affected VIMs and the WIM. In general, the WIM is updated whenever the updated chain needs to traverse an Edge cloud data center that was not involved in the original chain deployment. As far as the affected VIMs are concerned, they process update requests by re-evaluating the chain and by comparing the new chain against the old one: parts of the chain that remain unchanged are kept as is, parts of the chain that are no longer needed are removed, while new parts of the chain are added. The update operations consist of the removal and creation of the relevant Ryu flow rules and/or ONOS intents, performed via their respective REST APIs. Following this approach, the VIMs are always kept up to date with the requests received by the CO.
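The diff-based logic just described can be illustrated with the following simplified sketch, in which a chain is modelled as an ordered list of hop-by-hop steering segments; the representation, the helper names and the instance names are invented for this example.

"""Simplified illustration of the diff-based update performed by a VIM."""

def chain_segments(chain):
    """Turn an ordered list of VF instances/endpoints into hop-by-hop segments."""
    return list(zip(chain, chain[1:]))

def update_chain(old_chain, new_chain, add_rule, del_rule):
    old_segs, new_segs = set(chain_segments(old_chain)), set(chain_segments(new_chain))
    for seg in old_segs - new_segs:
        del_rule(seg)          # remove Ryu flow rules / ONOS intents no longer needed
    for seg in new_segs - old_segs:
        add_rule(seg)          # install rules for the newly introduced parts
    # segments present in both chains are left untouched

# Example in the spirit of SC4 -> SC4bis: a VF-1 instance inserted as the first VF.
old = ["NODE-B", "VF-4a", "VF-5a"]
new = ["NODE-B", "VF-1a", "VF-4a", "VF-5a"]
update_chain(old, new, add_rule=lambda s: print("add", s), del_rule=lambda s: print("del", s))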

In the example of Figs. 18 and 22, only the traffic steering inside DC-1 is updated by adding proper flow

rules and intents based on the actual location of the VF-1 instance that minimizes the latency. The actual

traffic flowing through the VFs after the update of the chains is visible in Fig. 21 (e.g., the throughput at VF-1 increases from 2 Mbps to 3 Mbps as a consequence of the chain update).

Service chain path redirection

The adaptation feature offered by the orchestration system with respect to the network status in the SDN

WAN has been tested. In this case, the WIM comes into play by adapting the network paths connecting

Edge cloud domains and underpinning the VF chain path segments with respect to the load status

information of switches/links derived from a selective set of monitoring data (e.g., data throughput).

In the following, we show an example in which, after the deployment of a service chain request, a subset of the switches in the WAN SDN domain becomes overloaded, which triggers the dynamic adaptation capability, thus redirecting the traffic through other available switches.


Fig. 23 plots the status of the OvS switches in the WAN SDN domain before the deployment of any service

chain requests. We notice that the overall throughput of every switch is equal to zero since no traffic is

traversing the network.

Fig. 23: SDN domain before service deployment

Fig. 24: SDN domain after service deployment: before and after adaptation

Once the service chain path setup is performed, traffic starts flowing from the source to the destination by traversing the VFs and the transit domain connecting the various Edge cloud domains. The Statistics Collector engine of the WIM periodically collects OF statistics and processes them to obtain the throughput information on a per-switch basis. Such a mechanism allows adapting the active service chain paths to recover from service degradations due to switch congestion. In fact, an excessive throughput causes the switch buffers to overflow, resulting in packet loss not only for the affected flow but also for the other flows traversing the switch. In Fig. 24, we show that the path was initially set up through switches 1, 2 and 5. Switches 2 and 5 are traversed by other flows, which makes them exceed the throughput threshold and be marked as overloaded. At this point, the WIM deletes every active service chain path traversing those switches and redirects them through other switches, if available. In the figure, we can see that some of the flows have been redirected through switch 3, which confirms the load-balancing behaviour of the WIM.
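The redirection decision can be illustrated with the following simplified sketch: switches whose measured throughput exceeds a threshold are marked as overloaded, and paths are recomputed over the remaining switches. The topology, threshold and load figures are illustrative and unrelated to the actual WIM implementation, which also accounts for network latency.

"""Toy overload detection and path recomputation (hop-count metric only)."""
from collections import deque

TOPOLOGY = {"s1": ["s2", "s3", "s4"], "s2": ["s1", "s5"], "s3": ["s1", "s5"],
            "s4": ["s1", "s5"], "s5": ["s2", "s3", "s4"]}
THRESHOLD_MBPS = 100.0

def overloaded(throughput_mbps: dict) -> set:
    return {sw for sw, t in throughput_mbps.items() if t > THRESHOLD_MBPS}

def shortest_path_avoiding(src, dst, banned):
    """Plain BFS over the switches that are not marked as overloaded."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in TOPOLOGY[path[-1]]:
            if nxt not in seen and nxt not in banned:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Example in the spirit of Fig. 24: the path s1-s2-s5 is recomputed once s2 and
# s5 become overloaded; the end-point switches themselves cannot be avoided.
load = {"s1": 40.0, "s2": 150.0, "s3": 10.0, "s4": 20.0, "s5": 160.0}
banned = overloaded(load) - {"s1", "s5"}
print(shortest_path_avoiding("s1", "s5", banned))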

B.2.4 Measurements

Some preliminary measurements have been carried out while the experiment was being executed.

Table 2 reports the average value and standard deviation of the response time of the SDN controllers used

within the Edge cloud domains. These values have been collected from all ONOS and Ryu instances present

in the three data centers, when the sample service function chains discussed above were instantiated

(ADD) or removed (DELETE) through their respective NBIs. In the Ryu case, the response time measures the

time required by the controller to install OpenFlow matching rules in the OpenStack internal SDN switches.

In the ONOS case, the response time measures the time required by the controller to install relevant

intents in its core modules (an operation that is decoupled from the actual installation of OpenFlow

matching rules in the inter-DC infrastructure switch). While Ryu’s NBI response time is equally fast when

adding or deleting flows, ONOS’ NBI shows a smaller (larger) response time when intents are added

(deleted).

Table 2 - Response time of the Edge cloud SDN controllers NBI (average and stdev)

         ADD                  DELETE
ONOS     2.17 (+/- 1.55) ms   14.41 (+/- 2.45) ms
Ryu      5.09 (+/- 0.84) ms   5.04 (+/- 0.46) ms

Table 3 reports the average value and standard deviation of the time required by the WAN SDN controller to set up a path for a service chain, and the time required to perform redirection in case of switch overload. It is worth noting that the time required to set up a path upon request reception by the WIM is (generally) influenced by the length of the chain being established: e.g., the setup time is around 10 seconds when the chain spans 2 DCs and around 25-30 seconds for chains spanning 3 DCs.

Table 3 - Setup and redirection time of the WAN SDN controller (average and stdev)

ADD             REDIRECTION
17 (+/- 3) s    6 s (+/- 120 ms)

Hereafter we report the performance metrics measured for the CO operation. We considered the following metrics:
- Request handling time: the time measured at the CO side between the reception of a request and the computation of a solution (right before sending the forwarding instructions to the VIMs and the WIM);
- Overall response time: the time measured at the CO side between the reception of a request and the delivery of a response to the client; in case of a successful request, this time includes the time needed for sending the forwarding instructions to the VIMs and the WIM and receiving their replies;
- Computation time: the time needed by the VNF Selection algorithm to solve the ILP optimization problem.

Table 4 reports the average value and standard deviation of the CO performance metrics. The computation time accounts for service chain creation as well as chain update. It is evident that the delay introduced by the CO is low compared with the delay required for the delivery of the forwarding instructions and the chain installation; the latter is mainly due to the additional latency introduced by the multiple REST interfaces (VIMs, WIM, controllers) that must be called during the overall service chain setup process, both at the orchestration and at the SDN control plane level. It is worth noticing that the forwarding instructions delivered to the VIMs are sent in parallel, while the message to the WIM is sent only after the VIMs have replied.

Table 4. CO performance for service chaining creation and update requests

Metric                   Mean (ms)    Std deviation (ms)
Request handling time    17.28        12.16
Overall response time    17213.78     7937.89
Computation time         3.27         0.43

Additional experiments are planned in the coming weeks to consolidate the experiment results, in order to finalize the submission of scientific papers for dissemination in the international scientific community.

As part of a side-activity assessment, Fig. 25 shows the capacity of the Virtual Wall testbed used to perform

the LASH-5G experiment. More specifically, the histogram shows the average TCP throughput measured

with iperf3 when the generated traffic traverses service chains with an increasing number of VF instances.

Chains no. 1 to 4 include 1 to 4 VF instances, all running in DC-1: the average throughput is very close to the

maximum value achievable with the 1 Gbps physical interfaces installed in the Virtual Wall servers. Chains

no. 5 to 7 include an additional VF each, running in DC-3, whereas chains no. 8 to 10 add incremental VFs

running in DC-2, where the destination endpoint is located: in this case the throughput is more than halved,

meaning that traversing three slices introduces some kind of bottleneck effect that requires further

investigation. Finally, chain no. 11 consists of ten VF instances, as chain no. 10, but none of them runs in

DC-3: the throughput shows values comparable to the case of chains running in a single slice, confirming that

the bottleneck is somehow related to the presence of DC-3.


Fig. 25: Average TCP throughput measured in the LASH-5G experimental setup on the Virtual Wall testbed for chains with an incremental number of VF instances

The measurements reported above show that the Virtual Wall facility is able to provide full capacity to a

typical NFV/SDN infrastructure based on OpenStack and Open vSwitch components. This is mainly due to

the possibility offered by Virtual Wall to deploy slices using bare-metal servers, thus avoiding the overhead

of an infrastructure emulated through, e.g., nested virtualization. Therefore, the Fed4FIRE+ facilities (and

Virtual Wall in particular) can be considered a good candidate to perform realistic experiments on non-

trivial NFV/SDN infrastructures based on production-level software tools.

B.2.5 Lessons Learned

The lessons learned from LASH-5G can be summarized as follows.

Firstly, LASH-5G allowed us to get hands-on practice with the OpenStack cloud platform and different SDN controllers (i.e., ONOS, Ryu) through a substantial multi-domain SDN/NFV deployment (i.e., 28 virtual machines spread across 3 cloud domains interconnected through a 5-node WAN). Indeed, we deployed 3 OpenStack clusters, each running several virtual machines, and each with SDN-based networking technology to connect compute nodes (i.e., the underlying network) as well as virtual machines (i.e., the tenant network), controlled by ONOS and Ryu controllers, respectively. This composite cloud deployment allowed us to develop and fine-tune the VIM software components, especially in terms of handling the underlying heterogeneous network controllers. In addition, we could finely tune the VIM operation to handle all the configuration cases needed while enforcing dynamic service chaining rules when multiple cloud domains are involved (e.g., correctly deploying and updating service chains while handling different combinations of VF instances and service endpoints located in different OpenStack nodes and clusters).

As regards the orchestration layer handled by the CO, we had the chance of validating the interworking of the CO with a composite cloud deployment including an inter-DC WAN controller. This allowed us to measure and experience communication latency in realistic network environments, and it also required the design and implementation of a roll-back mechanism to cope with possible communication faults (typical in distributed transaction schemes). For instance, a rollback mechanism is needed when, upon a service chain request, all the involved VIMs but one return a response. We handle this situation as follows: if after a timeout the response from the missing VIM has not been received, a delete message is sent to the other VIMs and a fault response is returned to the client (of course more complex recovery strategies can

be put in place). We also had the chance of testing the VNF Selection algorithm included in the CO implementation on top of a dynamic view of the underlying virtual resources and network topology. This topology view is periodically fed by polling the VIM and WIM monitoring services to retrieve the available end nodes, the VF instances and their processing latencies, and the inter-DC latency measurements. The VF instance processing latency has been added as a custom metric estimated by the VF instance implementations and posted to the Gnocchi database in the OpenStack deployment. The inter-DC latency measurements have been provided by the WAN controller through an ad-hoc API. According to the performance metrics measured in the experiments, we learnt that the computation time of the algorithm is very low with respect to the time spent delivering forwarding instructions and installing the chain. This is also due to the fact that the algorithm has been conceived to work on an abstract topology. Considering the scale of the deployed experiment, it would be feasible to centralize additional orchestration features at the level of the CO, if needed (e.g., intra-DC policies), without compromising latency performance.

Moreover, as for the WAN domain interconnecting the above-mentioned clouds, we evaluated the actual performance of the WAN orchestrator when deployed in real networks (although virtualized, as the one offered by Fed4FIRE) in contrast with emulated environments. We carried out most of the experiments with a 5-node WAN. Nevertheless, we also prepared a WAN set-up with an incremental number of nodes, up to 12. By deploying an ONOS controller in the 12-node network environment we experienced a controller overload after a certain amount of time (around 45 minutes). From this experience, we learned about the significant memory consumption required to control such a network, and we could also derive some helpful feedback for the Fed4FIRE administrators. Indeed, based on our experience, the Fed4FIRE platform should provide “physical nodes” with higher capacity, especially in terms of RAM, to host SDN controllers and applications and to cope with the dynamicity and the large amount of data to process in relatively large networks.

Another lesson we learned comes from the tests on VF chain path redirections. Prior to LASH-5G we carried out tests mainly in emulated SDN network environments (i.e., Mininet), where path redirections could be carried out in much less time (down to one third of the time) and without packet loss. Instead, using real network environments (although virtual), we realized that much more time is needed to apply all the configuration rule changes in the real switches. In addition, we realized that packets are lost because the old configuration rules have to be deleted before the new ones are set, so there is a time frame in which the traffic is discarded by the switches.

In general, we learned how to use virtual testbeds and their federation to carry out experiments at scales that are generally larger than those of experiments carried out in a university laboratory. Thanks to this opportunity we could test the feasibility of our approach and, in particular, the performance of our orchestration system on a large scale, also leveraging real latency measurements. We also performed tests on the data throughput performance along the VF chains, and learned that the throughput can reach optimal levels even when traffic crosses up to ten VFs, unless multiple slices are traversed. This is clearly due to some sort of bottleneck effect present in the testbed infrastructure (to be further investigated), which we would not have been aware of had we used either emulated environments (e.g., Mininet) or ordinary validation tests.