B.2 Technical Results & Lessons learned
B.2.1 LASH-5G system deployment
The details of the LASH-5G orchestration system deployment on the Fed4FIRE+ platform are depicted in
Fig. 2. The setup consists of five slices instantiated on the Virtual Wall testbed, including a slice with the
Chain Optimizer, a slice with the WAN SDN domain, three slices with the SDN-based Edge cloud domains
(named DC-1, DC-2 and DC-3). The established Edge cloud slices and the WAN slice interact at the data
plane level by exchanging packet data traffic and at the orchestration plane level by exchanging control
messages between Chain Optimizer, WIM and VIMs.
Each Edge cloud slice is connected to the WAN slice at the data plane level by means of VXLAN tunnels
established on top of the Virtual Wall management network. In particular, each VXLAN tunnel is established
between the node representing the egress router of an Edge cloud slice and one of the nodes in the WAN
slice. The VXLAN virtual tunnel endpoint (VTEP) located at the egress router appears as an IP-routable
interface of the router itself, whereas the corresponding VTEP located at the WAN slice node is bridged to
an Open vSwitch (OvS) instance running in the same node. This particular setup allows making the WAN
slice act as a layer-2 infrastructure connecting the three Edge cloud slices. Because each WAN slice node is
running an OvS instance, the slice itself can be programmed by an SDN controller.
The Chain Optimizer slice exchanges messages with the other slices at the orchestration plane level via the
Virtual Wall management network. In particular, the Chain Optimizer sends service function chaining CRUD
(Create, Read, Update, Delete) requests to the relevant VIMs and WIM through their intent-based
northbound interface. Interactions at the network control plane level do not take place between different
slices. This is in line with the LASH-5G architecture, where each domain is supposed to adopt its own SDN
control plane solution independently of the choice made by other domains.
Fig. 2: Deployment of the LASH-5G experiment on the Fed4FIRE+ testbed
The following subsections provide further details of each slice/domain deployed on the Fed4FIRE+ testbed and its operation.
B.2.1.1 WAN SDN Slice
The WAN SDN slice, shown in Fig. 3, consists of three main components: the SDN network, the WAN SDN
controller and the WIM. All these components have been deployed within a unique experiment slice where
one physical node has been allocated for the SDN controller and the WIM, and five other physical nodes were
allocated for the SDN topology. All the nodes run a Ubuntu 16.04 distribution. Specifically, we have
installed the Open vSwitch (OvS) software on the physical nodes composing the topology in order to
emulate OpenFlow switches. OvS is a multilayer virtual switch designed to enable massive network
automation through programmatic extension, while still supporting standard management interfaces and
protocols. Three of the OpenFlow switches are connected, through VXLAN tunnels, to the Edge cloud
domains gateways. The OvS instances connect to the SDN controller, which is an instance of the ONOS
controller that has been downloaded and installed on the 6th physical node of the slice. The ONOS version
is Junco (1.9.0).
Fig. 3: Deployment of the WAN SDN island
The WIM is an orchestration software component, written in Java, that allows for the programmable
provision of data delivery paths connecting DCs where VFs are deployed [9]. The set-up and the teardown
of such delivery paths can be triggered through an intent-based API exposed at the northbound, which allows
upper-layer components (i.e., the Chain Optimizer) to use application-oriented semantics rather than dealing
with technology-specific low-level network details.
Hence, the set-up of a delivery path in the WAN to connect VFs is requested by the Chain Optimizer to the
WIM by specifying the source and destination DCs that host the respective source and destination VF/node
in the chain. The WIM then derives the Edge cloud domain gateways to be connected, performs the mapping
operations by identifying the network path and, accordingly, enforces the forwarding rules on the switches
along the path.
Moreover, the WIM offers adaptation capabilities for the established paths to recover from congestion
events (e.g., service outages or degradation events) that may occur under concurrent resource usage.
In line with the IETF SFC guidelines, where load status control is expected [31], if a (risk of)
degradation is detected the WIM redirects the delivery path, or a segment of it, with an overall
load-balancing benefit [24]. More specifically, the WIM periodically monitors the switches' load status:
it retrieves from the SDN controller the number of bytes received/transmitted at the switches and then
calculates their throughput by dividing this amount by the duration of the polling interval. Finally, the
WIM is also responsible for the collection of network latency information in order to retrieve the
inter-DC delays. Those delays are then made available to the Chain Optimizer to enhance its
resource orchestration capabilities by computing a minimum-latency service graph.
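The throughput computation performed by the WIM at every polling interval can be sketched as follows (a minimal version: OpenFlow counter wrap-around and per-port aggregation are ignored here):

```python
def throughput_bps(prev_bytes, curr_bytes, poll_interval_s):
    """Estimate a switch's throughput from two successive OpenFlow
    byte counters, sampled poll_interval_s seconds apart."""
    return 8 * (curr_bytes - prev_bytes) / poll_interval_s

# 1.25 MB transferred over a 10 s polling interval -> 1 Mbps
rate = throughput_bps(0, 1250000, 10)
```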
B.2.1.2 Edge Cloud Slices
To deploy the Edge cloud domains, three different slices were set up on Virtual Wall. This choice allowed us
to run, deploy, and test each Edge cloud domain incrementally and independently of the other ones, thus
saving configuration and setup time with respect to the alternative option of deploying a single slice
including all three domains.
Fig. 4: Deployment of one of the Edge cloud slices
As shown in Fig. 4 for the case of DC-1, each Edge cloud slice includes an independent OpenStack cluster (Pike version) and consists of the following nodes, running a Linux Ubuntu 16.04 operating system:
- an OpenStack controller node, where all required OpenStack services are executed and where the
  latter expose their REST API endpoints;
- two or three OpenStack compute nodes, where virtual machine instances are running over a
  QEMU-KVM hypervisor; the controller node acts as one of the compute nodes;
- an OpenStack network node, providing external network connectivity to the virtual instances; the
  controller node also acts as the network node;
- a node connected to the aforementioned OpenStack nodes and running an instance of OvS,
  representing the SDN infrastructure of the Edge cloud data centre;
- a node acting as the egress router of the Edge cloud slice, connected to the SDN WAN slice through
  VXLAN tunnels;
- a virtual node used for testing purposes.
In addition, three network segments are present within each slice implementing:
- the OpenStack management network, used by compute nodes to communicate with the controller;
- the OpenStack data network, for traffic exchanged by the virtual instances;
- an external network, connecting the virtual routers instantiated in the OpenStack network node to
  the slice egress router.
The enhanced VIM runs on one of the nodes, typically on the OvS node interconnecting the OpenStack
cluster, although this choice is not mandatory. An instance of the ONOS controller, acting as the SDN
controller of the Edge cloud data center network infrastructure, is also executed on the OvS node. Finally,
each Edge cloud slice is provided with public IP addresses in order to have easy access to the OpenStack
and ONOS dashboards.
The OpenStack cluster running in each slice exposes the essential services and related APIs, including:
compute and placement (Nova), identity (Keystone), image (Glance), and network (Neutron). In addition,
the metric service (Gnocchi) and APIs are enabled in order to collect processing latency measurements
periodically reported by the VF instances. The collected data are then queried by the Chain Optimizer.
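The Chain Optimizer reads these measurements through Gnocchi's REST API, which returns measures as (timestamp, granularity, value) triples. A sketch of how the latest reported processing latency could be extracted from such a response (the metric name and query endpoint are deployment-specific and not shown):

```python
def latest_processing_latency(measures):
    """Return the most recent value from a Gnocchi measures response,
    where each measure is a [timestamp, granularity, value] triple.
    Returns None if no measures have been reported yet."""
    if not measures:
        return None
    # ISO-8601 timestamps sort lexicographically, so max() by timestamp works
    return max(measures, key=lambda m: m[0])[2]

sample = [["2019-01-01T00:00:00", 60.0, 5.0],
          ["2019-01-01T00:01:00", 60.0, 7.5]]
latest = latest_processing_latency(sample)  # 7.5 ms
```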
The Neutron service running in each compute node was configured with the OvS plugin, in order to add
SDN functionality to the virtual bridges internal to OpenStack nodes. This approach is also facilitated by an
innovation introduced in the most recent versions of Neutron, which takes advantage of an instance of the
Ryu SDN controller running in each compute node to control the internal forwarding mechanisms [32], as
shown in Fig. 5.
Fig. 5: The presence of Ryu controller inside OpenStack nodes enables native SDN capabilities [32]
In order to achieve the traffic steering capabilities inside each Edge cloud slice, as required by the LASH-5G
architecture, the enhanced VIM has to install appropriate OpenFlow rules on both the OvS node
representing the data center network infrastructure of the Edge cloud domain and the virtual switches
used internally by the OpenStack compute nodes, as shown by the dashed lines in Fig. 4. For this purpose,
we took advantage of the ONOS intent-based REST API and the Ryu OpenFlow REST API, after enabling the
latter in each Neutron instance. The OpenFlow rules to be installed on the OvS node depend on the specific
virtual instance network architecture configured in Neutron. We decided to adopt a “flat network”
architecture, so that the OvS node is able to natively see (and properly steer) the traffic exchanged by the
instances running in the compute nodes. Although this solution actually breaks the tenant traffic isolation
feature offered by OpenStack, we were forced to make this choice because: (i) to achieve better traffic
control, we wanted to avoid tunneling solutions such as GRE and VXLAN; (ii) using VLANs for that purpose
does not work on the Virtual Wall testbed, probably because the underlying physical network
infrastructure uses VLANs for slice isolation and does not allow nested VLAN tagging (Q-in-Q).
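As an illustration, a steering rule pushed through Ryu's REST interface (the ofctl_rest application, POST /stats/flowentry/add) can be built as below; the datapath id, ports and address are illustrative, and the match key names follow the OpenFlow 1.3 convention:

```python
def ryu_steering_rule(dpid, in_port, dst_ip, out_port, priority=100):
    """Build the JSON body for Ryu's ofctl_rest flow-entry endpoint.
    Matches IPv4 traffic to dst_ip arriving on in_port and forwards
    it to out_port (a minimal sketch of a steering rule)."""
    return {
        "dpid": dpid,
        "priority": priority,
        "match": {"in_port": in_port, "eth_type": 0x0800, "ipv4_dst": dst_ip},
        "actions": [{"type": "OUTPUT", "port": out_port}],
    }

rule = ryu_steering_rule(dpid=1, in_port=1, dst_ip="10.0.0.5", out_port=2)
# e.g. requests.post("http://<compute-node>:8080/stats/flowentry/add", json=rule)
```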
The enhanced VIM is an orchestration software component, written in Python, that exposes an intent-based
northbound REST interface allowing a service chain to be specified through a high-level descriptive
syntax, agnostic to the specific SDN technology adopted. The details of the VIM design and intent-based
northbound interface specification can be found in [30].
B.2.1.3 Chain Optimizer
The Chain Optimizer (CO) has been deployed in an isolated experiment slice on a physical node on Virtual
Wall. The node runs a Ubuntu 16.04 distribution. It communicates with the other slices (WIM and VIMs) via
the management network. The Chain Optimizer handles service function chaining requests and
orchestrates the virtual infrastructure managers (WIM/VIMs) to enforce the steering of traffic flows along
the selected VF instances. An example of interworking between the Chain Optimizer and the VIMs and WIM is
shown in Fig. 7.
Fig. 6: Example of Service chain request
More specifically, with reference to the example of VFs chaining shown in Fig.6, the CO is expected to
receive requests specified as follows:
{"serviceType": "Type1",
 "maxLatency": 100,
 "source": "Node-A",
 "destination": "Node-B",
 "vfChain": "VF-1,VF-2"}
In this example, the request means that a traffic flow identified by the given serviceType, source and
destination should be processed by a chain of VFs of types VF-1 and VF-2. The CO handles this request by first
selecting the VF-1 and VF-2 instances available in the distributed multi-DC environment (e.g., a VF-1 instance
in Domain 1 and a VF-2 instance in Domain 2) and then sending appropriate forwarding instructions to the
concerned VIMs and the WIM, so that the flow is actually steered through these instances (see Fig. 7).
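Programmatically, such a request is a plain JSON body posted to the CO endpoint. A sketch of a client-side helper using the field names from the example above (the endpoint URL is illustrative):

```python
def chain_request(service_type, max_latency_ms, source, destination, vfs):
    """Build a service chain creation request body with the field
    names shown in the example above."""
    return {"serviceType": service_type,
            "maxLatency": max_latency_ms,
            "source": source,
            "destination": destination,
            "vfChain": ",".join(vfs)}

body = chain_request("Type1", 100, "Node-A", "Node-B", ["VF-1", "VF-2"])
# e.g. requests.post("http://co-host:8080/chains", json=body)  # URL illustrative
```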
Fig. 7: Example of interworking between Chain Optimizer and VIM/WIM
The CO architecture is depicted in Fig. 8. It contains the following components:
· REST API: CRUD operations on service chains are exposed through REST APIs.
· Controller: it is in charge of handling received requests for creating, retrieving, updating or deleting
a service chain, orchestrating the interaction with the Wrapper, Monitoring, Forwarding
Instruction Dispatcher and Storage blocks.
· Wrapper: it invokes a VNF selection algorithm that, leveraging the approach presented in [22],
selects the VF instances available in the different clouds along the path that minimizes the end-to-end
latency, considering both processing and network delays. The optimization problem has been
formulated as a Resource Constrained Shortest Path problem on a properly defined auxiliary layered
graph, whose layered structure ensures that the order of VFs specified in the request is preserved.
Additional constraints (e.g., the maximum allowed network latency on the whole path or between two
specific VFs) can be taken into account and enforced during the graph construction phase. The
algorithm receives as input the service chaining request as well as an up-to-date view of the
underlying infrastructure topology, and returns either a solution or an infeasibility reply. If a solution
is found, an estimated end-to-end latency is also provided; if this value exceeds the maximum latency
specified in the request, the request is not accepted and an error message is returned to the client.
Fig. 8: Chain Optimizer architecture
· Monitoring: this component periodically interacts with the monitoring APIs of the VIMs and WIM to
collect the measurements needed to maintain an up-to-date view of the underlying infrastructure
topology. Collected measurements include inter-DC latencies, the types and instances of the VFs
deployed at each DC, and the related processing latencies.
· Dispatcher: this component handles the interactions with the VIMs and WIM in order to enforce
operations on a target service chain. For instance, creating a service chain implies sending
forwarding instructions to the affected VIMs and WIM, while deletion implies sending delete
operations. The CO leverages the northbound REST APIs provided by the VIMs and WIM to send such
instructions.
· Storage: relevant data concerning service chains are persisted, both for online operation (e.g.,
monitoring and update operations) and for collecting statistics (e.g., acceptance ratio,
performance metrics, etc.).
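The layered-graph search invoked by the Wrapper can be sketched as follows. This is a simplified version of the algorithm in [22], assuming the only constraint is the total end-to-end latency; the DC names, delay matrices and state encoding are illustrative:

```python
import heapq

def min_latency_chain(dcs, net_delay, proc_delay, vf_chain,
                      src_dc, dst_dc, max_latency=float("inf")):
    """Resource-constrained shortest path on an implicit layered graph:
    layer k (1..n) selects the DC hosting the k-th VF of the chain,
    layer n+1 is the destination DC. Edge weights are network delay
    plus the processing delay of the VF at the target DC.
    Returns (total latency, list of DCs hosting VF-1..VF-n) or None."""
    n = len(vf_chain)
    # A state is (accumulated latency, layer, current DC, DCs chosen so far).
    heap = [(0.0, 0, src_dc, [])]
    best = {}
    while heap:
        cost, layer, dc, chosen = heapq.heappop(heap)
        if cost > max_latency or best.get((layer, dc), float("inf")) <= cost:
            continue
        best[(layer, dc)] = cost
        if layer == n + 1:
            return cost, chosen          # feasible minimum-latency placement
        if layer == n:                   # final network hop to the destination
            heapq.heappush(heap, (cost + net_delay[dc][dst_dc],
                                  n + 1, dst_dc, chosen))
            continue
        vf = vf_chain[layer]
        for nxt in dcs:
            if vf not in proc_delay.get(nxt, {}):
                continue                 # DC offers no instance of this VF type
            step = net_delay[dc][nxt] + proc_delay[nxt][vf]
            heapq.heappush(heap, (cost + step, layer + 1, nxt, chosen + [nxt]))
    return None                          # no chain satisfies max_latency
```

The layer index in the state encodes the position in the chain, which is what preserves the VF ordering required by the request.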
Fig. 9 shows the sequence diagram of the interactions leading to the computation of the latency-minimized
VF chain and its establishment through the VIMs/WIM. A request generator (acting as a client and named
Actor) sends a chaining request to the CO endpoint (POST operation on the chain URI). The CO parses the
request, checking the validity of the provided information, and invokes the optimization algorithm. If the
algorithm does not find any solution, an error is returned. If a solution is found, a check is performed to
verify that the maximum latency constraint provided in the request is fulfilled by the solution. If the
constraint is not satisfied, an error is returned to the client. If the computed latency is lower than the
maximum value, the CO processes the solution to generate the appropriate instructions to be sent to the
affected VIMs and WIM through their intent-based interfaces. Once all VIMs and the WIM have
acknowledged, the CO sends a response to the client.
Fig. 9: Service chain request reception and handling
B.2.2 Experiment deployment and preliminary system validation
After having deployed and individually debugged all the components described in section B.2.1, we
proceeded with the instantiation of VFs and service endpoints to be used for the final experiment. This
phase required the placement of VFs across the three Edge cloud data centers. Since the placement
problem is out of the scope of the LASH-5G experiment, we used the predefined placement shown in Table
1, which assumes that VF-1 to VF-4 are available only in DC-1 and VF-8 to VF-10 only in DC-2, whereas
VF-5 to VF-7 are available in every data center. The table also shows how many instances of each VF type
have been activated in each data center.
Table. 1: Placement of the VF instances on the three Edge cloud data centers.
The correct placement of service endpoints and VF instances as per Table 1 is confirmed by the following
screenshots taken from the OpenStack dashboards of each data center (see Fig. 10 – Fig. 12).
Fig. 10: DC-1 OpenStack dashboard
Fig. 11: DC-2 OpenStack dashboard
Fig. 12: DC-3 OpenStack dashboard
In order to carry out the experiment activities, besides the jFed software needed to define and deploy slices
on the Fed4FIRE+ testbeds, we used the following tools:
- Postman: a widely adopted tool for REST API development and testing;
- CO GUI: a web-based GUI implemented for the LASH-5G CO, visualizing the inter-DC topology at the
  level of abstraction handled by the CO; it shows the last created chain and collects some statistics
  (e.g., number of accepted/failed requests, computation time, etc.);
- CO Client GUI: a web-based GUI implemented for LASH-5G to ease the generation and delivery of
  service chain creation requests and their basic management (e.g., deletion);
- iperf3: a widely used packet generator for testing the actual traffic steering across the chains;
- a number of customized bash and python scripts developed to automate experiment operations.
Moreover, in order to emulate VF instances whose processing latency varies with traffic load, we
implemented a Java application that, according to a configurable processing capacity value, computes a
processing latency value as a function of the processing capacity and the traffic input rate measured at the
network interface. This measurement is periodically posted to an ad-hoc metric maintained by the
Gnocchi database in the OpenStack deployment.
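The load-dependent latency of the emulated VFs can be sketched with a simple queueing-style model. This is only an illustrative assumption of how latency may grow with utilization, not the exact formula used by the Java application:

```python
def processing_latency_ms(capacity_mbps, input_rate_mbps, base_latency_ms=1.0):
    """Toy load-dependent latency model: latency grows hyperbolically as
    the measured input rate approaches the configured processing capacity,
    starting from base_latency_ms at zero load (M/M/1-like behaviour)."""
    if input_rate_mbps >= capacity_mbps:
        return float("inf")              # saturated VF instance
    utilization = input_rate_mbps / capacity_mbps
    return base_latency_ms / (1.0 - utilization)

# At 50% load the emulated latency doubles with respect to the base value.
half_load = processing_latency_ms(capacity_mbps=10, input_rate_mbps=5)
```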
Fig. 13 provides a graphical representation (as provided by the CO GUI) of the topology at the level of
abstraction handled by the CO. The VF placement is correctly discovered by the Chain Optimizer, which also
periodically receives monitoring information to build and maintain an updated view of the multi-DC NFVI
topology. For each VF type, the processing latency offered by the different DCs is displayed, as well as the
minimal latency between each pair of DCs. Note that the topology maintained by the CO is a logical view on
top of the topologies handled by the WIM and VIMs, so some details (instances of VFs and WAN links) are
not represented.
Fig. 13: Topology view maintained by the Chain Optimizer, shown through the Chain Optimizer GUI
The CO has been tested for its capability to find the optimal chaining given the established VF instance
placement and the offered values of network and processing latency. Consider the abstract chain reported
below, also graphically represented in Fig. 14:
{"serviceType": "Type5",
 "maxLatency": 300,
 "source": "Node-B.dc1",
 "destination": "Node-E.dc2",
 "vfChain": "VF-1,VF-2,VF-6,VF-9,VF-10"}
Fig. 14: Service chain request (i.e., abstract chain) processed by the Chain Optimizer
Fig. 15 depicts the solution computed by the CO (as visualized by the CO GUI) for such a service chain
request, i.e., VF-1 and VF-2 in DC-1, VF-6 in DC-3, and VF-9 and VF-10 in DC-2.
Fig. 15: The Chain Optimizer GUI showing the solution computed for the considered service chain request.
Finally, Fig. 16 shows the configuration messages sent to the VIMs/WIM to actually deploy the service chain
(according to the computed solution) across the DC-1, DC-2, and DC-3 interconnected through the WAN.
Fig. 16: The solution computed by the Chain Optimizer and the configuration messages sent to the VIMs/WIM
B.2.3 Experiment Workflow
The following two experiment steps have been performed: latency-optimized VF chains establishment and
service chain adaptation.
1. Latency-optimized VF chains establishment.
This experiment step aims at evaluating the capability of the Chain Optimizer to process service chain
requests, to compute the latency-optimized VF chains by correctly elaborating monitoring data on
processing and network latency, and to set-up VF chains across network and cloud domains by properly
interacting with the underlying enhanced VIM and WIM.
For this purpose, we used the CO Client GUI, which allows for the generation of service chain creation and
deletion requests as well as their delivery to the CO. In particular, five different service chain requests with their
respective requirements have been generated and sent to the CO through the CO Client GUI, according to
the following sequence:
SC1: NODE-A.dc1, VF1, VF2, VF8, NODE-D.dc2 1Mbps 500ms
SC2: NODE-B.dc1, VF3, VF6, VF9, VF10, NODE-E.dc2 1Mbps 500ms
SC3: NODE-A.dc1, VF4, VF7, VF10, NODE-E.dc2 1Mbps 500ms
SC4: NODE-B.dc1, VF4, VF5, VF9, VF10, NODE-D.dc2 1Mbps
SC5: NODE-C.dc1, VF1, VF2, VF7, VF8, NODE-F.dc2 1Mbps
Fig. 17 shows an example of a manually triggered chain request, at the time of the SC4 creation. After the
service chain deployment, the CO Client GUI also allows deleting a chain by clicking on the ‘remove’
button.
Fig. 17: Creation and delivery of a service chaining request
As described above, the CO handles the request, computes a latency-optimized solution and sends the
corresponding forwarding instructions to the affected VIMs and WIM. If this procedure is successful, a
response is returned to the CO Client GUI with the solution (the set of DCs where the chain has been
deployed), the computed end-to-end latency and the computation time needed by the algorithm to solve
the optimization problem (see Fig. 17, light green box). Fig. 18 shows the CO GUI visualizing the new service
chain (i.e., SC4).
Fig. 18: The Chain Optimizer GUI shows the deployment of a service chain specified in the previous figure across DC-1,
DC-2, and DC-3 interconnected through the WAN.
The CO persists all the established service chains in the DB together with the related measurements (e.g.,
response time, computation time, etc.). The deletion operation on a service chain is handled by the CO by
sending delete requests to the concerned VIMs and WIM and updating the status of the request in the DB.
At the VIMs/WIM side, the service chain set-up requests result in a proper set of configuration actions to be
enforced into the internal nodes according to the configuration messages with the forwarding instructions
sent by the CO.
More specifically, as soon as the VIMs involved in the service chain deployment receive the request, it is
their responsibility to discover where the specified VF instances are located in order to properly interact
with the relevant SDN controller(s). In particular, traffic steering within a given OpenStack node is
controlled by Ryu, whereas flows crossing different OpenStack nodes are managed by ONOS. VF discovery
also allows gathering all the information needed to compose either Ryu flow messages or ONOS intent
messages; these messages are essential in order to install the proper Ryu flow rule(s) and/or ONOS intent(s)
via the controllers' REST APIs. Once service chains have been established, traffic starts flowing along
the whole path from source to destination, traversing VF instances of the type specified in the request.
On the other hand, when a request arrives at the WIM, the network path connecting the specified DCs
across the WAN is computed and the forwarding rules are established in the involved switches accordingly.
Fig. 19 SDN Network before service request deployment
Fig. 19 shows the status of the SDN network before SC1-SC5 requests are sent. All the switches are
unloaded since no traffic is traversing the network. Once a service chain request arrives at the WIM, it
selects the best path in terms of network latency and switch availability and then interacts with the SDN
controller in order to set up the relevant flow entries. As soon as the switches are configured and the chain
is established in the Edge cloud domains, the traffic starts flowing across the network and we notice an
increase of the overall throughput of the involved switches, as plotted in Fig. 20.
Fig. 20 SDN Network after service request deployment
The experiment demonstrated the correct deployment of the service chains across all the involved domains
(Edge cloud and SDN WAN through VIMs and WIM respectively). As a proof of the correct chaining, Fig. 21
shows the sequential time diagram of the throughput measured at some of the VF instances involved in the
aforementioned chains, after some UDP-based iperf3 flows were generated.
Fig. 21 Sequential time diagram of the throughput measured at some VF instances while the service chains are
deployed
2. Service chain adaptation.
This step aims at evaluating the adaptive capability of the orchestration system in dynamically adjusting
established service (i.e., VF) chains with respect to the current service and network contexts (e.g.,
occurrence of SLA violations, switch/link congestions), based on the processing of a selective set of
monitoring data (e.g., data throughput at switches/links). Adaptations to the service context may involve
re-optimization operations to be performed by the Chain Optimizer. Hereafter we describe adaptation in
terms of: a) Service chain update and b) Service chain path redirection.
Service chain update
As for the adaptation of VF chains with respect to the service context, this may include updating the chain
to cope with SLA violations or demand changes, with consequent re-optimization operations performed by
the Chain Optimizer. More specifically, re-optimization may consist in updating the established service
chain by adding one or more new VFs, while keeping the rest of the chain unchanged. This implies
changing only the part of the service function path needed to include the newly specified VF.
In order to trigger a chain update, a CO client sends a request for updating an already existing chain by
specifying the source and destination nodes and serviceType (identifying an existing chain), the new VF
type and its ordered position in the chain. This request is sent by invoking a PATCH operation on the URI
identifying the chain resource. The CO handles the request by invoking the optimization algorithm to find
the VF instance to be added to the chain, taking into account how the pre-existing chain has been
deployed. The CO then processes the algorithm output to provide the appropriate instructions to the VIMs
and WIM for updating the chain accordingly.
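A sketch of the body of such a PATCH request follows. The identifying triple mirrors the creation request; the field names for the new VF and its position are illustrative assumptions, as the exact update schema is not given above:

```python
def chain_update_request(service_type, source, destination, new_vf, position):
    """Build a chain-update request body. serviceType/source/destination
    identify the existing chain; newVf and position (0-based index in the
    chain) are assumed field names for the VF to insert."""
    return {"serviceType": service_type,
            "source": source,
            "destination": destination,
            "newVf": new_vf,
            "position": position}

body = chain_update_request("Type1", "Node-B", "Node-D", "VF-1", 0)
# e.g. requests.patch("http://co-host:8080/chains/<chain-id>", json=body)  # URI illustrative
```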
As part of the experiment, two service chain updates have been triggered, according to the following
pattern:
SC4bis: NODE-B.dc1, VF1, VF4, VF5, VF9, VF10, NODE-D.dc2 1Mbps
SC5bis: NODE-C.dc1, VF1, VF2, VF3, VF7, VF8, NODE-F.dc2 1Mbps
In both cases, a new VF was added to an existing chain. The corresponding service chain update requests
have been sent directly to the CO endpoint using POSTman.
Fig. 22 shows the CO GUI after the update of an existing service chain. More precisely, in this example the
preexisting chain is the one depicted in Fig. 18 (SC4 above), and a request has been generated to add VF-1
as the first VF in the chain (i.e., SC4bis has been triggered as an update of the existing SC4).
Fig. 22: Update of an existing service chain.
In order to implement such an update, the CO sends appropriate forwarding instructions to the affected
VIMs and WIM. In general, the WIM is updated whenever the updated chain needs to traverse an Edge
cloud data center that was not involved in the original chain deployment. The affected VIMs, in turn,
process update requests by re-evaluating the chain and comparing the new chain against the old one: the
parts of the chain that remain unchanged are kept as is, the parts that are no longer needed are removed,
and the new parts are added. The update operations consist in the removal and creation of the relevant
Ryu flow rules and/or ONOS intents, performed via their respective REST APIs. Following this approach,
the VIMs are always kept up to date with the requests received from the CO.
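The keep/remove/add comparison performed by the VIMs can be sketched as a simple set difference over the forwarding state of the old and new chains (a minimal version; here rules are represented by hashable identifiers):

```python
def diff_flow_rules(old_rules, new_rules):
    """Classify the forwarding rules of a re-evaluated chain against the
    existing one: rules in both chains are kept, rules only in the old
    chain are removed, rules only in the new chain are added."""
    old_set, new_set = set(old_rules), set(new_rules)
    return {"keep": sorted(old_set & new_set),
            "remove": sorted(old_set - new_set),
            "add": sorted(new_set - old_set)}

plan = diff_flow_rules(["r1", "r2"], ["r2", "r3"])
```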
In the example of Figs. 18 and 22, only the traffic steering inside DC-1 is updated by adding proper flow
rules and intents based on the actual location of the VF-1 instance that minimizes the latency. The actual
traffic flowing through the VFs after the update of the chains is visible in Fig. 21 (e.g., the throughput at
VF-1 increases from 2 Mbps to 3 Mbps as a consequence of the chain update).
Service chain path redirection
The adaptation feature offered by the orchestration system with respect to the network status in the SDN
WAN has been tested. In this case, the WIM comes into play by adapting the network paths connecting
Edge cloud domains and underpinning the VF chain path segments with respect to the load status
information of switches/links derived from a selective set of monitoring data (e.g., data throughput).
In the following, we show an example in which, after the deployment of a service chain request, a subset of
the switches in the WAN SDN domain becomes overloaded, which triggers the dynamic adaptation
capability and redirects the traffic through other available switches.
Fig. 23 plots the status of the OvS switches in the WAN SDN domain before the deployment of any service
chain requests. We notice that the overall throughput of every switch is equal to zero since no traffic is
traversing the network.
Fig. 23: SDN domain before service deployment
Fig. 24: SDN domain after service deployment: before and after adaptation
Once the service-chain path setup is performed, traffic starts flowing from the source to the destination by
traversing the VFs and the transit domain connecting the various Edge cloud domains. The Statistics
Collector engine of the WIM periodically collects OpenFlow statistics and processes them to obtain the
throughput information on a per-switch basis. This mechanism allows adapting the active service chain
paths to recover from service degradations due to switch congestion. In fact, a high throughput will cause
the switch buffer to overflow, causing packet losses not only for the affected flow but also for the other
flows traversing the switch. In Fig. 24, we show that the path was initially set up through switches 1, 2 and
5. Switches 2 and 5 are traversed by other flows, which makes them exceed the throughput threshold and
be marked as overloaded. At this point, the WIM deletes every active service chain path traversing those
switches and redirects them through other switches, if available. In the figure, we can see that some of the
flows have been redirected through switch 3, which confirms the load-balancing behaviour of the WIM.
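The overload check that triggers redirection can be sketched as a simple threshold test over the per-switch throughput computed by the Statistics Collector (switch names and the threshold value are illustrative):

```python
def overloaded_switches(throughput_mbps, threshold_mbps):
    """Return the switches whose measured throughput exceeds the
    threshold; the paths traversing them are candidates for redirection."""
    return sorted(sw for sw, t in throughput_mbps.items() if t > threshold_mbps)

# Switches 2 and 5 exceed an (illustrative) 8 Mbps threshold.
candidates = overloaded_switches({"sw1": 2.0, "sw2": 9.5, "sw5": 11.0}, 8.0)
```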
B.2.4 Measurements
Some preliminary measurements have been carried out while the experiment was being executed.
Table 2 reports the average value and standard deviation of the response time of the SDN controllers used
within the Edge cloud domains. These values have been collected from all ONOS and Ryu instances present
in the three data centers, when the sample service function chains discussed above were instantiated
(ADD) or removed (DELETE) through their respective NBIs. In the Ryu case, the response time measures the
time required by the controller to install OpenFlow matching rules in the OpenStack internal SDN switches.
In the ONOS case, the response time measures the time required by the controller to install relevant
intents in its core modules (an operation that is decoupled from the actual installation of OpenFlow
matching rules in the inter-DC infrastructure switch). While Ryu’s NBI response time is equally fast when
adding or deleting flows, ONOS’ NBI shows a smaller (larger) response time when intents are added
(deleted).
Table 2 - Response time of the Edge cloud SDN controllers NBI (average and stdev)
        ADD                  DELETE
ONOS    2.17 (+/- 1.55) ms   14.41 (+/- 2.45) ms
Ryu     5.09 (+/- 0.84) ms   5.04 (+/- 0.46) ms
Table 3 reports the average value and standard deviation of the time required by the WAN SDN controller
to set up a path for a service chain, and of the time required to perform a redirection in case of switch
overload. It is worth noting that the time required to set up a path upon request reception by the WIM is
(generally) influenced by the length of the chain being established: setup time is around 10 seconds
when the chain is spread across 2 DCs, and around 25-30 s for chains spreading across 3 DCs.
Table 3 - Setup and redirection time of the WAN SDN controller (average and stdev)

ADD             REDIRECTION
17 (+/- 3) s    6 s (+/- 120 ms)
Hereafter we provide performance metrics measured for the CO operation. We considered the
following metrics:
Request handling time: the time, measured at the CO side, elapsing between the reception of a request and
the computation of a solution (right before sending the forwarding instructions to the VIMs and WIM).
Overall response time: the time, measured at the CO side, elapsing between the reception of a request and
the delivery of a response to the client. For a successful request, this includes the time needed to
send the forwarding instructions to the VIMs and WIM and to receive their replies.
Computation time: the time needed by the VNF Selection algorithm to solve the ILP optimization
problem.
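The three metrics above reduce to differences between timestamps taken at the CO. The following sketch makes the timeline explicit; the class and attribute names are illustrative assumptions, not the actual CO implementation.

```python
class CoTimestamps:
    """Timestamps (in seconds) taken at the CO for one service chain request."""

    def __init__(self):
        self.t_request = None   # request received from the client
        self.t_solution = None  # VNF Selection solution computed
        self.t_response = None  # response delivered back to the client

    def request_handling_ms(self):
        # Reception of the request -> solution computed,
        # right before the forwarding instructions leave the CO.
        return (self.t_solution - self.t_request) * 1000.0

    def overall_response_ms(self):
        # Reception of the request -> response to the client, including the
        # delivery of forwarding instructions to VIMs/WIM and their replies.
        return (self.t_response - self.t_request) * 1000.0
```

With the Table 4 averages, t_solution - t_request is about 17 ms while t_response - t_request is about 17 s, showing that almost all of the response time is spent outside the CO itself.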
Table 4 reports the average value and standard deviation of the CO performance metrics. The computation
time accounts for service chain creation as well as chain update. The delay introduced by
the CO is evidently low compared with the delay required to deliver the forwarding instructions and install
the chain. The latter is mainly due to the additional latency introduced by the multiple REST interfaces
(VIMs, WIM, controllers) that must be called during the overall process of service chain setup, at both the
orchestration and SDN control plane levels. It is worth noticing that the forwarding instructions are delivered
to the VIMs in parallel, while the message to the WIM is sent only after all VIMs have replied.
Table 4 - CO performance for service chaining creation and update requests

Metrics                 Mean (ms)    Std deviation (ms)
Request handling time   17.28        12.16
Overall response time   17213.78     7937.89
Computation time        3.27         0.43
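The dispatch order noted above (forwarding instructions to all VIMs in parallel, WIM only after every VIM has replied) can be sketched as follows. The helper callables stand in for the actual REST calls and are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_instructions(vim_requests, wim_request, send_vim, send_wim):
    """Send VIM forwarding instructions concurrently; instruct the WIM only
    once all VIMs have replied successfully."""
    with ThreadPoolExecutor(max_workers=len(vim_requests)) as pool:
        # Blocks until every VIM call has returned.
        vim_replies = list(pool.map(send_vim, vim_requests))
    if not all(vim_replies):
        return None  # a VIM failed: this is where a rollback would start
    return send_wim(wim_request)
```

Waiting for all VIM replies before contacting the WIM keeps the data-plane path from being programmed before the chain endpoints exist, at the cost of serializing the WIM call after the slowest VIM.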
Additional experiments are planned in the coming weeks to consolidate the experimental results and
finalize the submission of scientific papers for dissemination in the international scientific community.
As part of a side assessment, Fig. 25 shows the capacity of the Virtual Wall testbed used to perform
the LASH-5G experiment. More specifically, the histogram shows the average TCP throughput measured
with iperf3 when the generated traffic traverses service chains with an increasing number of VF instances.
Chains no. 1 to 4 include 1 to 4 VF instances, all running in DC-1: the average throughput is very close to the
maximum value achievable with the 1 Gbps physical interfaces installed in the Virtual Wall servers. Chains
no. 5 to 7 each add a VF running in DC-3, whereas chains no. 8 to 10 add further VFs
running in DC-2, where the destination endpoint is located: in this case the throughput is more than halved,
meaning that traversing three slices introduces some kind of bottleneck effect that requires further
investigation. Finally, chain no. 11 consists of ten VF instances, as chain no. 10, but none of them runs in
DC-3: the throughput is comparable to the case of chains running in a single slice, confirming that
the bottleneck is somehow related to the presence of DC-3.
Fig. 25. Average TCP throughput measured in the LASH-5G experimental setup on the Virtual Wall testbed for chains
with an incremental number of VF instances
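The per-chain throughput values behind Fig. 25 can be collected with iperf3 in JSON mode and the average receive rate extracted from its report. The sketch below shows one way to do this; the host name is a placeholder and the helper is not the script actually used in the experiment.

```python
import json
import subprocess

def parse_iperf3_mbps(json_text):
    """Extract the average received TCP throughput (Mbit/s) from an iperf3
    --json report."""
    report = json.loads(json_text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

def tcp_throughput_mbps(server, duration_s=10):
    """Run one iperf3 test towards 'server' and return the average Mbit/s."""
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(duration_s), "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_iperf3_mbps(out)

# Usage idea (placeholder endpoint at the far end of a service chain):
#   mbps = tcp_throughput_mbps("chain-10-endpoint")
```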
The measurements reported above prove that the Virtual Wall facility is able to provide full capacity to a
typical NFV/SDN infrastructure based on OpenStack and Open vSwitch components. This is mainly due to
the possibility offered by Virtual Wall to deploy slices using bare-metal servers, thus avoiding the overhead
of an infrastructure emulated through, e.g., nested virtualization. Therefore, the Fed4FIRE+ facilities (and
Virtual Wall in particular) can be considered a good candidate to perform realistic experiments on non-
trivial NFV/SDN infrastructures based on production-level software tools.
B.2.5 Lessons Learned
The lessons learned from LASH-5G can be summarized as follows.
Firstly, LASH-5G allowed us to gain hands-on practice with the OpenStack cloud platform and different SDN
controllers (i.e., ONOS and Ryu) through a substantial multi-domain SDN/NFV deployment (i.e., 28 virtual
machines spread across 3 cloud domains interconnected through a 5-node WAN). Indeed, we deployed 3
OpenStack clusters, each running several virtual machines, and each with SDN-based networking
technology connecting the compute nodes (i.e., the underlying network) as well as the virtual machines
(i.e., the tenant network), controlled by ONOS and Ryu controllers, respectively. This composite cloud
deployment allowed us to develop and fine-tune the VIM software components, especially in terms of
handling heterogeneous underlying network controllers. In addition, we could finely tune the VIM
operation to handle all the configuration cases that arise when enforcing dynamic service chaining
rules across multiple cloud domains (e.g., correctly deploying and updating service chains while
handling different combinations of VF instances and service endpoints located in different OpenStack
nodes and clusters).
As regards the orchestration layer handled by the CO, we had the chance to validate the interworking of
the CO with a composite cloud deployment including an inter-DC WAN controller. This allowed us to
measure and experience communication latency in realistic network environments, and it also required the
design and implementation of a roll-back mechanism to cope with possible communication faults (typical in
distributed transaction schemes). For instance, a rollback mechanism is needed when, in response to a
service chain request, all VIMs except one return a response. We handle this situation as follows: if
the response from the missing VIM has not been received after a timeout, a delete message is sent to the
other VIMs and a fault response is returned to the client (of course, more complex recovery strategies can
be put in place). We also had the chance to test the VNF Selection algorithm included in the CO
implementation on top of a dynamic view of the underlying virtual resources and network topology. This
topology view is periodically updated by polling the VIM and WIM monitoring services to retrieve the
available end nodes, the VF instances with their processing latency, and the inter-DC latency measurements.
The VF instance processing latency is a custom metric estimated by the VF instance implementations and
posted to the Gnocchi database in the OpenStack deployment. The inter-DC latency measurements are
provided by the WAN controller through an ad-hoc API. According to the performance metrics measured
in the experiments, we learnt that the computation time of the algorithm is very low with respect to the
time spent delivering the forwarding instructions and installing the chain. This is also because the
algorithm has been conceived to work on an abstract topology. Considering the scale of the deployed
experiment, it would be feasible to centralize additional orchestration features at the CO level, if needed
(e.g., intra-DC policies), without compromising latency performance.
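The timeout-driven rollback described above can be sketched as follows. The function and parameter names are our own illustrative assumptions; the actual CO may collect replies asynchronously rather than in a loop.

```python
def await_vims_with_rollback(pending_vims, wait_reply, send_delete, timeout_s=30):
    """Collect VIM replies for a chain request; if one VIM does not reply
    within the timeout, roll back the VIMs that already replied and report a
    fault to the client."""
    replied = []
    for vim in pending_vims:
        reply = wait_reply(vim, timeout_s)  # returns None on timeout
        if reply is None:
            # Undo the partial chain installation on the VIMs that succeeded.
            for ok_vim in replied:
                send_delete(ok_vim)
            return {"status": "fault", "missing": vim}
        replied.append(vim)
    return {"status": "ok", "replied": replied}
```

More elaborate strategies (e.g., retrying the missing VIM before rolling back) fit the same skeleton, as the text notes.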
Moreover, as for the WAN domain interconnecting the above-mentioned clouds, we evaluated the
actual performance of the WAN orchestrator when deployed in real networks (although virtualized, as the
one offered by Fed4FIRE) in contrast with emulated environments. We carried out most of the experiments
with a 5-node WAN. Nevertheless, we also prepared WAN setups with an incremental number of nodes,
up to 12. By deploying an ONOS controller in the 12-node network environment, we experienced a
controller overload after a certain amount of time (around 45 minutes). From this experience, we learned
about the significant memory consumption required to control such a network, and we could also derive
some helpful feedback for the Fed4FIRE administrators. Indeed, based on our experience, the Fed4FIRE
platform should provide "physical nodes" with higher capacity, especially in terms of RAM, to host SDN
controllers and applications and to cope with the dynamicity and the large amount of data to process in
relatively large networks.
Another lesson we learned comes from the tests on VF chain path redirections. Prior to LASH-5G, we carried
out tests mainly in emulated SDN network environments (i.e., Mininet), where path redirections could be
carried out in much less time (even one third of the time) and without packet loss. Using real network
environments (although virtual), we instead realized that much more time is needed to apply all the
configuration rule changes to the real switches. In addition, we observed packet loss due to the need to
delete the old configuration rules before setting the new ones, so there is a time frame in which the traffic
is discarded by the switches.
In general, we learned how to use virtual testbeds and their federation to carry out experiments at scales
that are generally larger than those achievable in a university laboratory. Thanks to this opportunity, we
could test the feasibility of our approach and, in particular, the performance of our orchestration system at
large scale, also leveraging real latency measurements. We also tested the data throughput performance
along the VF chains, and learned that the throughput can reach optimal levels even when traffic crosses up
to ten VFs, unless multiple slices are traversed. This is clearly due to some sort of bottleneck effect in the
testbed infrastructure (to be further investigated), which we could not have detected using either emulated
environments (e.g., Mininet) or ordinary validation tests.