Distributed Monitoring and Measurement System for Heterogeneous Networks
Authors/Affiliation:
R. Serral-Gracià1, L. Jakab1, M. Yannuzzi1, X. Masip-Bruin1, J. Domingo-Pascual1, J.
Sliwinski2, A. Beben2, P. Owezarski3, M. Callejo4, J. Enríquez-Gabeiras 4
1Technical University of Catalunya (UPC), Barcelona, Spain
2Warsaw University of Technology (WUT), Warsaw, Poland
3Analysis and Architecture of Systems Lab (LAAS), Toulouse, France
4Telefónica I+D (TID), Madrid, Spain
Keywords: Heterogeneous networks, QoS, monitoring, measurement, network
parameters, inter-domain
Abstract
Service providers’ portfolios are continuously expanding with emerging real-time
services, which are offered to current and prospective clients for an associated fee.
In order to properly charge such fees, service providers should guarantee that the
offered services are correctly delivered, which is done by monitoring certain traffic
parameters. Many end-to-end connections require monitoring over several domains
operated by different companies, possibly involving different underlying network
technologies, which makes matters very complex.
This paper presents two contributions. On the one hand, we present an approach to
both monitor and measure the QoS offered by the network in an inter-domain
environment; the novelty of this approach is the use of monitoring tools based on
inexpensive general-purpose equipment. On the other hand, we present the methodology
used to validate this approach against specific-purpose networking equipment designed
for such tasks, with tests performed in a real scenario.
The real scenario used in this paper to validate the proposals is the heterogeneous
network developed within the IST EuQoS project, which provides a Europe-wide
testbed.
1. Introduction
When dealing with full traffic monitoring infrastructures (such as the one provided
in (CoMo)), powerful specific-purpose equipment is usually required. In other
environments the network load is not as heavy, or the required accuracy may be less
demanding. This paper focuses on the specification and deployment of a generic passive
network analysis system named the Network Parameter Acquisition System (NPAS). While
its use can be related to any network monitoring scenario, this work aims at the
applicability and deployment of a system for on-line Quality of Service (QoS)
reporting. Specifically, NPAS is designed to be applied to a heterogeneous multi-domain
scenario supporting end-to-end QoS capabilities. Its deployment and testing are carried
out on the testbed of the (IST-EuQoS) project, which is spread all over Europe.
For its proper deployment, some special requirements need to be met, ranging from high
efficiency to low resource consumption, in order to keep up with the high demands of
the network. To validate this solution, a solid verification procedure is performed
using specific-purpose network monitoring equipment.
In this paper, general-purpose equipment means commodity workstations with standard
networking capabilities, whereas specific-purpose equipment refers to high-end
equipment using dedicated network monitoring cards.
In traffic monitoring platforms such as (Barlet et al. 2006), the goal is to capture
and evaluate the whole bulk of bandwidth used on a link; in such scenarios, specific-
purpose equipment is mandatory. This work differs in that, in QoS environments, it is
not mandatory to capture and analyze all packets on the link. In these environments,
computing QoS parameters in a non-intrusive way only involves the flows with
guaranteed service, which does not require capturing all the traffic. Moreover, when
reporting QoS information the goal is to indicate the upper bounds permitted for the
offered services, not the specific network parameter values. Hence this work is
centred on the architecture description, its deployment in a real scenario, and the
evaluation and feasibility analysis of the solution. Due to space limitations, issues
such as stress testing and the capturing limits of the provided solution are out of
the scope of this paper.
The rest of the paper is structured as follows. The next section discusses the proposed
Network Parameter Acquisition System. After this discussion, the paper addresses the
deployment model of the whole structure. Section 4 introduces the system where the
proposal is applied, i.e., the EuQoS System (IST-EuQoS). After such description the
paper focuses on the validation of the solution. The paper finishes with the conclusions
and further studies related to this work.
2. Network Parameter Acquisition System (NPAS)
This section defines the NPAS as well as its internal structure. NPAS is an
infrastructure that provides structured network parameter information about the
network traffic. It is worth highlighting that the proposed NPAS is independent of
the underlying network technology, as its infrastructure enables the system to:
i) extract the QoS parameters and ii) centralise and store the gathered information.
NPAS also supports a hierarchical structure facilitating the inter-domain extraction
and analysis of QoS parameters over heterogeneous networks.
System Architecture
NPAS is divided into two main entities: Monitoring Entities and Processing Entities.
Monitoring Entity (ME): the ME is a traffic acquisition interface located in the
network across the different domains. It reports traffic parameter values using
general-purpose equipment, independently of the underlying network technology. The
ME’s goal is to identify the flows that traverse the monitored networks, and to
extract on-line QoS parameters on a per-packet basis. Such parameters are timestamps
and general packet information available in the packet’s header, as will be discussed
later.
Processing Entity (PE): the PE extracts the minimal required flow information reported
by the ME. Such information is used for analysing the QoS parameters of the flows at
the selected points of the network.
An ME only sends the captured traffic information to the registered PEs. The PEs do
not have to register with all available MEs, but only with those having traffic
information relevant for the needed analysis.
Figure 1 shows the main building blocks of the NPAS. The MEs have two different
parts, namely the Parameter Acquisition Module (PAM) and the Filtering Module (FM).
The former is in charge of taking the needed information from the network; the PAM
depends on the underlying network technology and abstracts such information into a
common generic format.
The acquired information comes from the selection function (F) provided by the FM. F
is flow-based and allows selecting single flows, or aggregates through the Type of
Service field. Such selection might be complemented with traffic sampling techniques
for performance improvements. This work focuses on basic filtering, leaving
improvements such as traffic sampling for further study.
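As an illustration, the selection function F can be sketched as a wildcard match over the flow identifier and the Type of Service field. This is a minimal sketch under our own assumptions; the data layout and names below are not part of NPAS:

```python
# Illustrative sketch of the FM's selection function F: a filter selects either
# a single flow (full 5-tuple) or an aggregate (e.g. by ToS/DSCP only).
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    tos: int

@dataclass(frozen=True)
class Filter:
    # None fields act as wildcards, so a ToS-only filter selects an aggregate.
    src: Optional[str] = None
    dst: Optional[str] = None
    sport: Optional[int] = None
    dport: Optional[int] = None
    proto: Optional[int] = None
    tos: Optional[int] = None

def selects(f: Filter, p: Packet) -> bool:
    """Return True if packet p belongs to the flow or aggregate chosen by f."""
    return all(want is None or want == got
               for want, got in ((f.src, p.src), (f.dst, p.dst),
                                 (f.sport, p.sport), (f.dport, p.dport),
                                 (f.proto, p.proto), (f.tos, p.tos)))
```

A filter with only `tos` set thus captures a whole class of service, while a fully specified filter captures one flow.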
The MEs are spread out along the monitored network, and their number depends on the
desired level of detail. Two is the strict lower bound on the number of MEs, the
minimum for computing one-way delays, packet losses and jitter of the traffic between
the different pairs of MEs that lie on the path of the selected flows. The PEs in
this architecture are introduced to provide flexibility to the system. The scenario
in Figure 1 shows a simple hierarchy of PEs, which share information from common MEs.
This sharing can be performed at different aggregation levels and facilitates
different reporting granularities. Even in the case of different destination domains,
PEs are able to detect the flows, as long as there is a suitable ME (or PE) to
connect to. This is accomplished by smart identification of packets and flows.
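A minimal sketch of how a PE could derive these metrics from the per-packet records of two MEs on a flow's path follows. The record layout and the jitter definition (mean absolute variation of consecutive delays) are our assumptions, and the two MEs' clocks are assumed synchronized, as in the GPS-based setup of Section 5:

```python
# Hypothetical PE-side computation of one-way delay, loss and jitter from the
# (packet id -> capture timestamp) records reported by two MEs.
def qos_from_records(upstream, downstream):
    """upstream/downstream: dicts mapping packet id -> capture timestamp (s)."""
    delays = [downstream[pid] - upstream[pid]
              for pid in upstream if pid in downstream]
    lost = sum(1 for pid in upstream if pid not in downstream)
    # Jitter sketched as the mean absolute delay variation between
    # consecutive matched packets.
    jitter = (sum(abs(a - b) for a, b in zip(delays, delays[1:])) /
              max(len(delays) - 1, 1))
    return {"delay": sum(delays) / max(len(delays), 1),
            "loss": lost / max(len(upstream), 1),
            "jitter": jitter}
```

Packets seen upstream but never downstream count as losses and are excluded from the delay and jitter statistics, matching the behaviour described later for lost packets.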
In the proposed NPAS structure of Figure 1, packet detection and matching is an issue
because of the distributed nature of the MEs. The monitoring stations must be able to
report the traffic parameters to the PE, which must match each piece of information
with that of all the other MEs. For reporting matched packets efficiently, processing
the whole payload is not an option. The solution to this issue is to use the
technique proposed by (Zseby et al. 2001), which is based on a fast CRC computation.
This CRC is computed over strictly the minimum set of data that guarantees proper
packet and flow identification. Packets matching this CRC at various MEs are
identified as the same packet. Flows are detected by an identifier taken from the
proper fields of the packet’s header.
With this approach, only a small portion of the packet needs to be processed.
Moreover, it is robust against packet losses, even when packets are dropped along the
path between two MEs: in such a case, the lost packet is reported as lost, but
ignored in the rest of the statistics.
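The identification scheme can be sketched as follows, assuming plain IPv4 headers without options. The exact field set used by (Zseby et al. 2001) may differ, so the offsets below are illustrative; the point is that mutable fields (TTL, header checksum) are excluded so the same packet hashes identically at every ME:

```python
# Illustrative packet identifier: a CRC over header fields that stay invariant
# along the path. Offsets assume a 20-byte IPv4 header with no options.
import zlib

def packet_id(ip_packet: bytes, payload_bytes: int = 8) -> int:
    invariant = (ip_packet[2:4]       # total length
                 + ip_packet[4:6]     # IP identification
                 + ip_packet[12:20]   # source and destination addresses
                 + ip_packet[20:20 + payload_bytes])  # start of transport header
    return zlib.crc32(invariant)      # TTL (byte 8) and checksum are skipped
```

Two captures of the same packet at different MEs, differing only in TTL, yield the same identifier, while distinct packets almost surely do not collide.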
[Figure omitted: n MEs, each composed of a Parameter Acquisition (PA) module and a
Filtering module, report to PEs PE1 ... PEn, with PE2 placed higher in the hierarchy.]
Figure 1.- NPAS structure
Reporting granularity
Centralizing the information in the PEs is a design issue. Depending on the operator
requirements, it can be performed in different ways: i) on a per-domain basis;
ii) among peer domains; iii) on an end-to-end basis.
Each approach delivers different specific information. In the case of intra-domain
reporting, each operator obtains status information about the internal network
conditions. This can be useful for detecting link failures and bottlenecks, and for
reporting network efficiency.
When dealing with peer domains, the information in heterogeneous QoS systems is
related to compliance with Service Level Agreements (SLAs) among peers, and serves as
a way to detect deviations from the delivered quality assessed in the contractual
agreement.
Finally, on an end-to-end basis, the information is mostly useful for end users,
since they can verify the fulfilment of the contracted services. It is also useful
for service providers, allowing them to determine whether the required service is
properly delivered.
Each granularity has associated constraints which are tightly related to the number
and location of the MEs. Moreover, their interconnection with the different PEs is a
design decision which may limit such reporting.
Control protocol
Until now the discussion has focused on the behaviour of the different parts of the
system. This subsection deals with the protocols used for QoS reporting. There are
two protocols, according to the following requirements:
- Reporting Protocol: used from the ME to the PE for reporting network parameters;
as will be seen later, this protocol is also used among PEs.
- Control Protocol: used from the PE to the ME for setting up the monitoring
options. It can also be used between PEs for their hierarchical connection.
The goal of each protocol is different: PEs send the filtering options to the MEs,
and MEs are in charge of reporting the network parameters. In order to keep the
system as simple as possible, all commands are sent solely from the PE to the ME. The
control protocol uses the following messages:
- connect: Connection establishment between PE and ME.
- disconnect: Stops the connection between PE and ME.
- filter: Sets the capture filter; it needs a flow or aggregate selection function.
- start: Starts the capture using the specified filter. If no filter is set, all
packets will be captured.
- stop: Stops the capture.
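The ME-side handling of these messages can be sketched as a small state machine. Transport and framing are deliberately left out; this is our own illustration of the message semantics, not the actual protocol implementation:

```python
# Illustrative ME-side dispatch for the five control-protocol messages.
class MonitoringEntity:
    def __init__(self):
        self.connected = False
        self.filter = None
        self.capturing = False

    def handle(self, msg: str, arg=None):
        if msg == "connect":
            self.connected = True
        elif msg == "disconnect":
            self.connected, self.capturing = False, False
        elif msg == "filter":
            self.filter = arg          # flow or aggregate selection function
        elif msg == "start":
            # With no filter set, every packet is captured.
            self.capturing = True
        elif msg == "stop":
            self.capturing = False
        else:
            raise ValueError(f"unknown message: {msg}")
```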
In the other direction, the ME encapsulates the required data on a per-packet basis
and sends it to all the connected PEs. More details about the transmitted data are
provided in the coming sections.
The other required protocol runs between PEs; it enables their hierarchical
connectivity. The protocol presented here assumes that the monitored network topology
is known, which also makes available the list of associated PEs for each needed
network segment. This issue has already been treated in recent literature (Huffaker
et al. 2002).
The control protocol is very similar to the one used between the PE and the ME. The
main difference is that the network information reaches a PE after being processed by
another PE, which allows reporting, when needed, only aggregated results per class of
service. The protocol with the above constraints is the same as that used for
communicating between PEs and MEs, but with a new message:
- aggregation: specifies the aggregation level; values can be per flow, per packet or
per aggregate. To avoid excessive inter-domain management traffic, it is possible to
specify a refresh rate controlling the reporting interval.
In the specific case of inter-domain measurements, a control and reporting protocol
has been designed. The objective of the Inter-domain Measurement, Control and
Reporting (IMCR) protocol is to collect information about the QoS level offered on
end-to-end paths, based on local measurements performed by each transited domain. For
that purpose, in a given domain the PE periodically advertises to its neighbours the
measured values of QoS parameters, such as delay, jitter or packet losses,
corresponding to particular destinations reachable from this domain.
For destinations located inside the domain, each PE advertises the locally measured
QoS values, while for destinations located outside the domain, the PE advertises
aggregated QoS values. The aggregated values are assembled using the QoS parameters
received from the neighbouring PEs as well as the local QoS contributions introduced
between the corresponding pairs of border routers.
Figure 2 shows an example illustrating how the IMCR protocol collects information
about the QoS offered on end-to-end paths across a three-domain scenario, from domain
AS1 towards domain AS3. We assume that each domain deploys one PE that periodically
performs measurements representing the QoS offered inside the domain (Q1, Q2 and Q3)
as well as on the relevant inter-domain links (Q1->2, Q2->1, Q2->3, Q3->2).
[Figure omitted: PE1, PE2 and PE3, located in domains AS1, AS2 and AS3, exchange
IMCR messages; the aggregated QoS value collected by the IMCR protocol equals
Q3 ⊕ Q2->3 ⊕ Q2 ⊕ Q1->2 ⊕ Q1.]
Figure 2. Example of IMCR operation.
Then, PE3 propagates information about its computed QoS parameters, denoted Q3, to
PE2. PE2 checks whether domain AS3 is a peer (which means that it is available in its
BGP routing table) and, in such a case, advertises to PE1 aggregated information
about the QoS offered toward the destinations located in domain AS3. This value takes
into account the QoS contributions of domains AS3 and AS2 (Q3, Q2) as well as the
inter-domain link between AS2 and AS3 (Q2->3), so it equals Q3 ⊕ Q2->3 ⊕ Q2. The
operator ⊕ denotes an aggregation function specific to each particular QoS parameter
(e.g. a sum for additive QoS constraints such as one-way delay, a product for
multiplicative QoS constraints such as losses, or the minimum for QoS constraints
such as bandwidth). Finally, following the same scheme in domain AS2, PE1 receives
information about the QoS offered on the path toward domain AS3.
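Following the text, the ⊕ operator can be sketched per metric. The loss formula below combines losses through success probabilities, which assumes independent losses per segment; the paper itself only names the operation as multiplicative:

```python
# Illustrative per-metric aggregation functions for the ⊕ operator.
def aggregate_delay(*delays):
    """One-way delay is additive along the path."""
    return sum(delays)

def aggregate_loss(*losses):
    """Losses combine multiplicatively via the per-segment success probabilities."""
    success = 1.0
    for p in losses:
        success *= (1.0 - p)
    return 1.0 - success

def aggregate_bandwidth(*bws):
    """Bandwidth is limited by the narrowest segment."""
    return min(bws)
```

For example, two segments each losing 10% of packets yield an end-to-end loss of 19%, not 20%, under the independence assumption.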
Assuming that each PE periodically advertises information about the QoS offered
toward the available destinations, each PE builds a QoS map with updated QoS values
for any reachable destination. Therefore, each PE has to maintain a table (similar to
a BGP routing table) that contains the available destinations jointly with the
corresponding QoS parameters and the associated PE.
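Such a table can be sketched as follows; this is a minimal illustration, and the stored fields and their layout are our own assumptions:

```python
# Illustrative per-PE QoS map: destination -> (QoS values, advertising PE, time).
import time

class QoSMap:
    def __init__(self):
        self.table = {}

    def update(self, destination, qos, from_pe):
        """Record the latest advertisement received for a destination."""
        self.table[destination] = (qos, from_pe, time.time())

    def lookup(self, destination):
        """Return the latest QoS values for a destination, or None if unknown."""
        entry = self.table.get(destination)
        return entry[0] if entry else None
```

Each periodic advertisement simply overwrites the previous entry, so the map always reflects the most recent values received from the neighbouring PEs.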
The accuracy of the gathered end-to-end QoS values depends on the precision of QoS
aggregation functions as well as on how frequently PEs send updates to their
neighbours. Updates might be triggered for reporting special network conditions when
needed.
3. Deployment Model
This section defines the deployment options along with some important decisions
related to the setup of an NPAS. The presented methodology will be used in later
sections for a deployment in a real testbed. The proposed solution complies with the
system described in the previous section.
The reporting granularity and its different options are an important decision when
deploying NPAS, and hence several concerns have to be addressed. One refers to
administrative issues: sometimes it is not possible to deploy an ME on every hop of
the traffic path. In this regard, NPAS is designed with such flexibility in mind. The
MEs can be spread over different points with one or more PEs to report to; depending
on the deployment density, more or less detailed information can be extracted.
Once the parameters are gathered, the system delivers them to higher layers. How the
QoS is actually provided or enforced is out of the scope both of this paper and of
NPAS.
Depending on the location of the MEs, anything from intra-domain up to end-to-end
traffic can be computed with the same infrastructure, with the PEs in charge of
reporting the desired metrics. In this context, end-to-end always refers to the
network level (from the ingress to the egress nodes of the whole path). User
experience as well as application-specific metrics are out of the scope of this
paper, as they need to be handled by higher-layer entities.
On-line QoS monitoring system
In this section we discuss three approaches for performing on-line QoS monitoring in
a multi-domain network scenario. The main objective of on-line QoS monitoring is to
provide updated information about the QoS level offered on the established paths
between a given domain and other destination domains. The information provided by the
on-line QoS monitoring function constitutes a base for (i) providing “certificates”
for users about the assured QoS level and (ii) the OAM (Operation, Administration and
Maintenance) process, which triggers fault management actions when the QoS level
falls below expectations.
Designing an “on-line” monitoring system for a multi-domain network is a complex
task, as shown in (Matray et al. 2006) and (INTERMON 2004). The main problem comes
from the fact that domains are usually under autonomous administration and, as a
consequence, their operators have their own policies for performing measurements; on
the other hand, an on-line monitoring system requires some co-operation between
domains. The main requirements for the measurement system deployed in a particular
domain are the following:
- The measurements performed by each domain should correspond to the same metric and
be measured at the same technology layer, usually the IP level.
- The measurements should cover the whole path, so they have to be performed between
clearly defined hops of the path, e.g. the borders of domains, where the limits of a
particular domain’s network service lie.
- The measurement system should be able to interoperate with other systems.
- The measurement system should scale when networks with a large number of domains
are investigated.
The on-line QoS monitoring system may be implemented in different ways. Among them we
distinguish three main categories:
- Centralised solution, which assumes a single entity controlling all measurements
performed inside the entire network, including several domains.
- Fully distributed solution, where each domain has its own control entity
responsible for performing measurements inside the domain, while measurements
corresponding to end-to-end paths are collected with the aid of the control and
reporting protocol. The base information for this protocol comes from the results of
local measurements performed independently by each domain.
- Semi-centralised solution, where each domain has a control entity responsible for
performing measurements on the path between its domain and any destination domain.
In the following subsections we present these approaches in more detail.
Centralised on-line monitoring system
This approach assumes that centralised measurement controllers (represented in NPAS
by the PE) are responsible for managing all measurements performed inside each
domain. It can be extended to any arbitrary pair of domains, see Figure 3. The PE
controls all MEs located at the domain borders as well as at the borders between
access and core networks.
[Figure omitted: a single centralised measurement controller (PE) controls the MEs
for intra- and inter-domain measurements deployed at the border routers of domains
AS1, AS2 and AS3; all measurement results are collected by the Central Measurement
Controller.]
Figure 3.- Architecture of centralised on-line monitoring system.
The main advantage of this approach is its simplicity of implementation. However,
because it assumes a centralised point of control, it is difficult to deploy in a
network consisting of many autonomous domains; moreover, it is hardly scalable. As a
consequence, this approach may be applied to small networks under a common
administration, e.g. trial networks or dedicated networks. As an example, such an
approach was implemented in the ETOMIC project (Matray et al. 2006), where
measurement probes were deployed in different sites connected to the Géant academic
network (Geant).
Fully distributed on-line monitoring system
This solution assumes that monitoring of end-to-end paths is performed in a distributed
way as presented in Figure 4.
[Figure omitted: each domain AS1, AS2 and AS3 has its own domain measurement
controller (PE1, PE2 and PE3, respectively) collecting the measurement results of the
MEs inside its domain; the PEs exchange results through the inter-PE control and
reporting protocol.]
Figure 4.- Architecture of distributed on-line monitoring system.
For that purpose, each domain implements its own domain measurement controller (also
represented by the PE), responsible for performing on-line measurements inside the
domain. These measurements are performed continuously between any pair of points
where the service is offered. In a typical domain, measurements should be performed
between all pairs of border routers (to measure the QoS offered to traffic crossing
the domain), between all pairs of access networks (to measure the QoS offered to
local traffic), and between all pairs of access network and border router (to measure
the QoS offered to traffic originating or terminating in the domain). For obtaining
end-to-end measures, the intra-domain results are periodically advertised to the
neighbouring PEs. For that purpose, a specialised inter-domain measurement, control
and reporting protocol, the IMCR protocol, was proposed earlier in the paper.
The main advantage of a distributed on-line monitoring approach is the independence
of the measurements performed in the particular domains. The only requirements are
that the measurement systems deployed in each domain measure the same QoS metrics and
implement the same measurement and control protocol. Due to its distributed
operation, this approach can be easily applied in a multi-domain network. On the
other hand, the main drawback concerns the accuracy of the end-to-end results: they
are not directly measured, but derived from the results measured by the local
systems. However, by properly designing the assembling functions used for calculating
cumulative values of QoS parameters, e.g. as proposed in (Batalla 2005, Brazio et al.
2006), we may approximate an upper bound for a particular QoS metric.
Semi-centralised on-line monitoring system
This approach assumes that each domain has its own PE, responsible for performing
measurements on all paths from its own domain towards any required destination
domain. For instance, in Figure 5, PE1 can control the MEs located in all domains,
hence it can measure paths between domains AS1 and AS3, as well as between AS1 and
AS2. Unfortunately, in this approach it is not possible to collect measurements
between AS2 and AS3.
[Figure omitted: the domain measurement controller PE1 in AS1 remotely controls the
MEs located in AS1, AS2 and AS3, while PE2 and PE3 are not involved; all measurement
results are collected by PE1.]
Figure 5. Architecture of semi-centralised on-line monitoring system for performing
measurements from AS1.
This approach allows measurements on all end-to-end paths; however, it requires that
MEs be controlled remotely by PEs belonging to different domains. This constitutes
the main limitation of the semi-centralised approach, as it may be applied only
between domains that trust each other.
Although Figure 5 presents a solution where a PE can directly access any ME outside
its own domain, it is possible to restrict such communication through an appointed
PE. Such an approach was investigated in the IST-INTERMON project (INTERMON), where
the function of the PE was performed by a Global Controller entity communicating
through the Specification of Monitoring Service interface.
4. The EuQoS System
The term EuQoS stands for “End-to-end Quality of Service support over heterogeneous
networks” (IST-EuQoS). This European research project is building a complete QoS
system aimed at supporting stringent QoS services among heterogeneous access
technologies, both within and between different routing domains. EuQoS customers will
be able to subscribe to this system, for which we are developing a portfolio of novel
QoS mechanisms and protocols that build upon the state of the art, covering security,
admission control, signaling, monitoring, measurement and reporting, QoS Routing
(QoSR), fault management, and Traffic Engineering (TE).
The focus of this paper resides on the Monitoring and Measurement System (MMS) and
the Monitoring Measurements and Fault Management (MMFM).
MMS and MMFM
The EuQoS MMS functionality is twofold: to validate the QoS delivered by the EuQoS
system (in performance trials) and to support other EuQoS functionalities. Such
functionalities include topology acquisition and the reporting of network resource
usage and delivered QoS levels in order to support traffic engineering.
The MMFM is the EuQoS function in charge of network resource monitoring, network
topology discovery, fault management and QoS monitoring. The Network Monitor inside
the MMFM manages different threads that interface with the different MMS tools
supporting the EuQoS system.
In order to validate the EuQoS capabilities, several tools have been developed, the
most relevant for this work being (OreNETa), which implements the NPAS system
described in Section 2. OreNETa performs passive monitoring of the different QoS
parameters for each CoS defined in the system. The tool is composed of three main
parts:
- meter: in charge of actually capturing the specified traffic; it corresponds to the
ME. The meter in this implementation is currently based on the libpcap library.
- analyzer: the PE (or PEs) in the scenario; it is in charge of configuring the
meters and gathering their data.
- client: the interface towards higher layers; in EuQoS this part is formed by the
MMS and the MMFM.
The interfaces to OreNETa are defined to support periodic monitoring: the MMFM
periodically asks the tools for measurements and event notifications.
The MMFM gets the measurements from the different MMS parts, which form the whole
monitoring system, processes them and commits the changes to a database for
persistent storage. This database is used by different EuQoS modules to obtain
information about the network status, the QoS performance and the usage of link
bandwidth, permitting later queries and historical data repositories for off-line
processing.
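This polling cycle can be sketched as follows; the interfaces shown are illustrative and not the actual EuQoS or OreNETa APIs:

```python
# Hypothetical sketch of one MMFM polling cycle: query each MMS tool thread
# and hand every measurement record to a persistence callback.
def poll_once(tools, store):
    """tools: iterable of objects exposing a get_measurements() method;
    store: a callable persisting one record (e.g. a database insert)."""
    for tool in tools:
        for record in tool.get_measurements():
            store(record)
```

In the real system, `store` would commit each record to the persistent database that the other EuQoS modules query for network status and QoS performance.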
Testbed
The EuQoS project, besides developing a solid infrastructure for end-to-end QoS
provision, provides a Europe-wide testbed which enables different technologies to be
tested in the trials of the project; it also permits the deployment of NPAS in a real
scenario.
There are currently eleven local testbeds, interconnected via the Géant network and
the National Research and Education Networks (NRENs) through private tunnels forming
a full mesh among them. In turn, each partner has a local testbed with different
network technologies, including UMTS, xDSL, Ethernet, Gigabit Ethernet and WiFi
(802.11). The second phase of the project will add G/MPLS and satellite to the list.
NPAS is deployed over the whole heterogeneous system.
This testbed permits end-to-end tests over heterogeneous networks, depending on the
technologies of the testbeds involved. Moreover, with the QoS provisioning
infrastructure provided by the project and the MMS, deploying and testing the
scenarios presented in this work is a straightforward task.
5. Performance Evaluation
This section is devoted to the evaluation of the proposed system using the EuQoS
testbed, together with the methodology specified in the previous sections. The actual
NPAS implementation used for the tests is EuQoS’s MMS, more specifically OreNETa,
which includes both the ME and PE entities. The whole system complies with the
requirements and constraints of the proposal. The trials are performed in the EuQoS
testbed, where the required hardware for the validation is available, with the goal
of validating the NPAS framework. It is not the goal of this paper to exploit the
limits of software monitoring, but to show its resilience and correct behaviour under
controlled traffic loads; deploying the system under tight stress conditions is left
as future work.
Validation principle
Validating a measurement system requires another measurement system we can trust.
Such a trusted reference is found in specific-purpose networking equipment with an
already validated framework. In this analysis, the NPAS system under development is
specifically designed for measuring and monitoring the parameters required by the
EuQoS components while respecting the EuQoS system constraints.
As depicted in Figure 6, the validation principle in this environment consists of
generating probe traffic on an experimental network, which has to cross both the NPAS
deployment scenario and the trusted monitoring tool. To ease the comparisons,
controlled traffic is injected into the network; such traffic is analyzed both by
NPAS and by the trusted system. The tool used for actively testing the network is
NetMeter (NetMeter).
Figure 6.- Validation principle of the EuQoS MMS system
For selecting the trusted monitoring system, it is essential to consider several
properties:
- First, it has to be transparent, i.e. it must not introduce any change in the
monitored traffic. The outgoing traffic has to be exactly the same as the incoming
one: no loss, no jitter, etc. may be introduced by the trusted monitoring system.
- It has to be very well provisioned, in order not to miss packets even in the case
of a large traffic peak.
- It has to be very accurate in terms of packet timestamping.
To cope with such constraints, the best-known solution (maybe the only one capable of
guaranteeing the three preceding properties) is the DAG-based system from Endace.
The DAG card has been designed for monitoring traffic on links. The principle of
these boards consists in capturing a trace of all packets propagating on a fibre or
copper link. The advantage of this card is that the capture is performed in hardware
at wire speed, so the system is able to capture complete traffic traces from high-
speed links. These cards can extract either the header or the full payload of all
packets passing on the link, timestamping them with a very accurate GPS clock and
storing them on a hard drive (Cleary et al. 2000). DAG is completely transparent and
supports links whose capacity goes up to OC-192. In addition, the GPS timestamping is
performed in the DAG card itself, using its hardware, thus leading to very high
timing accuracy.
Note that the transparency of DAG systems is achieved through optical or electrical
splitters on the monitored link. The splitter lets 80% of the signal power continue on
the normal link (thus introducing no delay, error or loss), while the remaining 20% is
diverted to the DAG system where the capture is done.
In order to provide NPAS with similar characteristics, the system is equipped with:
- a serial pulse-per-second (PPS) signal provided by the same GPS source as the one
used in the DAG environment;
- a GPS time source for off-testbed synchronization;
- traffic generation and monitoring performed from the same machine at the end-points.
For simplicity this validation focuses on end-to-end tests, but it would be a
straightforward task to extend the testing to inter- and intra-domain monitoring. This is
left as future work, since it would not provide additional information for the validation.
General-purpose equipment evaluation
The study performed to validate NPAS has two main parts. The first focuses on the
resource consumption of the solution, in terms of memory, bandwidth and accuracy. The
second compares the results obtained by testing the platform in the EuQoS testbed
against those of the reliable, validated special-purpose cards developed by Endace.
Resource consumption
Previous sections dealt with the internals of a generic NPAS system. This section
estimates the resource usage of the actual implementation used for the validation.
Bandwidth resources
The ME collects packet information according to criteria set by the NPAS. A flowID is
generated for each received packet and, if there is no state information for this flow, a
flow descriptor of 22 bytes is created by the PE to identify the flow. The ME then sends
information about each packet received within that flow, including the flowID, a
packetID, a sequence number, and the timestamp and size of the selected packet. All this
information is packed into 24 bytes and sent to the PE. The bandwidth required by the
data therefore depends on the new-flow rate and on the packet rate of each monitored
flow, following the expression:
BW = 22·NFR + 24·(PR_1 + … + PR_n)    (Equation 1)
where NFR is the rate at which new flows arrive (flows per second) and PR_N is the
packet rate of flow N. In the stationary state (once all selected ongoing flows have been
identified), the total bandwidth grows linearly with the aggregate packet rate.
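Equation 1 can be sketched as a small helper function; the names and the sample rates below are illustrative, not taken from the NPAS code:

```python
def control_bandwidth(new_flow_rate, packet_rates,
                      descriptor_bytes=22, record_bytes=24):
    """Estimated ME-to-PE control bandwidth in bytes/s (Equation 1).

    new_flow_rate -- new flows per second (NFR)
    packet_rates  -- per-flow packet rates PR_1 .. PR_n (packets/s)
    """
    return descriptor_bytes * new_flow_rate + record_bytes * sum(packet_rates)

# Steady state (no new flows): bandwidth grows linearly with the
# aggregate packet rate of the monitored flows.
print(control_bandwidth(0, [20, 96, 897]))  # → 24312 bytes/s
```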
Even though this bandwidth consumption is noticeable, the goal of NPAS is to provide
very fast on-line monitoring; compression algorithms and further bandwidth
optimizations are therefore outside the focus of this work.
Memory resources
Monitoring high-speed backbone links with general-purpose equipment places a high
demand on memory usage, because one needs to track the parameters of tens if not
hundreds of thousands of simultaneous flows. As a network example, (Barlet et al. 2006)
show that the average number of packets per second on a 1 Gbps link is around 300000.
It is not the goal of this infrastructure to process such large amounts of data;
minimizing the storage of redundant information is the only way to obtain a
well-performing scenario and, as detailed in the previous section, most redundant
information is already dropped at the MEs. For the MEs, memory consumption is not an
issue, as they immediately forward all relevant information to the PEs, which then
analyze and process the received data.
PEs store information for each flow detected by each ME, even if the flow is not
detected on all of them. The flows are stored in a data structure, such as a binary tree,
that allows both quick lookups and easy updates. A flow is considered active if packets
with its descriptor have arrived within the last 30 seconds; after that the flow is
expired and removed from the data structures.
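The activity-timeout bookkeeping just described can be sketched as follows; this is a minimal illustration (a plain dictionary stands in for the binary tree of the actual implementation, and the field names are invented):

```python
import time

FLOW_TIMEOUT = 30.0  # seconds of inactivity before a flow is expired

class FlowTable:
    """Minimal sketch of a PE flow table with a 30-second expiry rule."""

    def __init__(self):
        self._flows = {}  # flow_id -> {"packets": n, "last_seen": t}

    def update(self, flow_id, now=None):
        """Record one packet for flow_id and refresh its activity time."""
        now = time.monotonic() if now is None else now
        entry = self._flows.setdefault(flow_id, {"packets": 0})
        entry["packets"] += 1
        entry["last_seen"] = now

    def expire(self, now=None):
        """Drop flows idle for more than FLOW_TIMEOUT; return how many."""
        now = time.monotonic() if now is None else now
        stale = [fid for fid, e in self._flows.items()
                 if now - e["last_seen"] > FLOW_TIMEOUT]
        for fid in stale:
            del self._flows[fid]
        return len(stale)
```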
The information kept about each flow includes the flowID, computed with a CRC
function over the type-of-service field and the source and destination addresses and
ports in order to optimize communications with the MEs. These five fields are
nevertheless also stored in the flow data structure so that the details remain available.
Other stored fields are the counters needed to monitor the QoS parameters and to
perform the internal computations that detect and synchronize the order of packets
between the distributed MEs. These counters are responsible for the bulk of the memory
requirements, but they are needed to detect the direction of the traffic and to check
whether a flowID represents a flow that passes through all monitored points.
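A flowID of this kind could be computed as below; note that the exact CRC variant and field layout are not specified in the text, so CRC-32 and the packing order are assumptions made for illustration:

```python
import socket
import struct
import zlib

def flow_id(tos, src_ip, dst_ip, src_port, dst_port):
    """Illustrative flowID: CRC-32 over the five fields named in the
    text (type of service, source/destination IPv4 address and port)."""
    key = struct.pack("!B4s4sHH",
                      tos,
                      socket.inet_aton(src_ip),
                      socket.inet_aton(dst_ip),
                      src_port, dst_port)
    return zlib.crc32(key)

fid = flow_id(0, "10.0.0.1", "10.0.0.2", 5000, 6000)
```

A 32-bit CRC keeps the ME-to-PE records compact, at the cost of possible collisions, which is why the five original fields are also stored at the PE.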
A full memory analysis is out of the scope of this paper; it can be summarized as a total
of 423 bytes per flow plus the control structures of the binary tree, which are fixed per
PE for the whole system.
Regarding the memory consumption per packet, the system only consumes 18 bytes.
Assuming the aforementioned 300000 packets per second, this results in around 6
Mbytes of memory per second. Since, for statistical purposes, a window of at most 8
seconds is kept, this means around 48 Mbytes per meter, which is perfectly bearable in
current systems.
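The back-of-the-envelope figures above can be reproduced directly from the stated per-packet cost:

```python
BYTES_PER_PACKET = 18          # per-packet state kept by the system
PACKETS_PER_SECOND = 300_000   # average rate cited for a 1 Gbps link
WINDOW_SECONDS = 8             # statistics window kept per meter

per_second = BYTES_PER_PACKET * PACKETS_PER_SECOND  # 5.4e6 B/s ("around 6 Mbytes")
per_window = per_second * WINDOW_SECONDS            # 43.2e6 B ("around 48 Mbytes")
print(per_second, per_window)
```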
Accuracy
Measurement precision is often limited by the operating system used for the
measurements: in highly demanding environments, where microseconds or even
nanoseconds matter, general-purpose operating systems are not suited for the task. In
monitoring environments such limits are important, but when the goal is to verify
whether the QoS is properly delivered, what matters is the user's perception, whose
resolution thresholds lie within milliseconds. Moreover, all current operating systems
provide the few-millisecond precision needed for proper QoS reporting. This whole
issue is directly related to timestamping precision.
Even with proper timestamping, it is mandatory in distributed testing environments,
especially when dealing with one-way delays, to have proper synchronization among the
different entities (the MEs in this case). Here the EuQoS testbed provides a good
framework, given its stable deployment and the proper time sources available in the
involved testbeds through GPS stations, which guarantee clock offsets lower than a few
microseconds.
Experimental validation
This section presents the actual tests performed to validate the NPAS deployment in the
EuQoS testbed. To this end, the tests are carried out between two partners.
As it is not the goal of this work to show the limits of software-based capture platforms,
the tests are not intensive in resource consumption.
The first site involved in the presented tests is located at UPC's (Universitat
Politècnica de Catalunya) premises and is connected to the Géant network by a 1 Gbps
link. The second site is a laboratory of WUT (Warsaw University of Technology),
which operates on a 100 Mbps access link to the Géant network.
Figure 6 shows the network topology and the locations of the tools used for the tests.
According to the NPAS architecture, the role of the ME is performed by the meter
module of OreNETa, while the PE's functions are performed by the analyzer module. In
the validation process, the role of trusted measurement system is fulfilled by DAG cards
coupled with GPS clocks. As OreNETa uses a passive measurement method, it is
necessary to introduce some foreground traffic that can be measured; we therefore use
(NetMeter) to emit CBR (Constant Bit Rate) traffic. NetMeter is not part of NPAS and
is used only for controlled traffic generation.
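The essence of such a CBR generator can be sketched as a paced UDP sender; this is not NetMeter itself, and the sequence-number framing is an assumption made for illustration:

```python
import socket
import time

def send_cbr(dst, rate_pps, payload_size, duration_s):
    """Emit a constant-bit-rate UDP stream: rate_pps packets per second
    of payload_size bytes each, for duration_s seconds. Each payload
    starts with a sequence number so a receiver can detect loss and
    reordering. Returns the number of packets actually sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = 1.0 / rate_pps
    start = time.monotonic()
    seq = 0
    while time.monotonic() - start < duration_s:
        payload = seq.to_bytes(4, "big").ljust(payload_size, b"\x00")
        sock.sendto(payload, dst)
        seq += 1
        # sleep until the next slot to keep the rate constant
        next_send = start + seq * interval
        time.sleep(max(0.0, next_send - time.monotonic()))
    sock.close()
    return seq
```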
As a method of validating the tools and methods used, we decided to compare a number
of metrics defined in (ITU-T Rec Y-1541), namely IPTD (mean IP Packet Transfer
Delay), IPDV (IP Packet Delay Variation) and IPLR (IP Packet Loss Ratio). Table 1
describes all the performed tests. The tests use different rates and packet sizes, all with
periodic traffic patterns, and each is repeated several times to achieve statistical
soundness. Every test lasts 10 minutes in order to avoid transient network states.
Test    Rate (pkt/s)   UDP size (bytes)   IP size (bytes)   Bandwidth
Test1   20             60                 88                14 Kbps
Test2   96             1420               1468              1.1 Mbps
Test3   897            160                208               1.4 Mbps
Table 1.- Set of performed tests
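The bandwidth column of Table 1 follows directly from rate × IP packet size, which can be checked quickly:

```python
# rate (pkt/s) and IP packet size (bytes), taken from Table 1
tests = {"Test1": (20, 88), "Test2": (96, 1468), "Test3": (897, 208)}

rates_bps = {name: rate * ip_size * 8 for name, (rate, ip_size) in tests.items()}
for name, bps in rates_bps.items():
    print(f"{name}: {bps / 1000:.1f} Kbps")
# Test1 ≈ 14.1 Kbps, Test2 ≈ 1127.4 Kbps (1.1 Mbps), Test3 ≈ 1492.6 Kbps (1.4 Mbps)
```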
IP Transfer Delay analysis
The results of the tests are summarized in Table 2, which shows the delays measured by
OreNETa and by DAG, together with the standard deviation of the traffic. The results
obtained for each test differ depending on the capture method, because the captures are
taken on different physical machines. Moreover, DAG timestamping is done directly in
the card, with the obvious gain in accuracy, while OreNETa timestamps the packets at
kernel level, which is subject to operating-system effects and yields less accurate
timestamps.
UPC - WUT    DAG                   OreNETa
             Avg. (ms)  St. Dev.   Avg. (ms)  St. Dev.
Test 1       35.21      ~0         35.01      0.39
Test 2       36.60      0.35       36.15      0.15
Test 3       35.33      0.24       35.11      0.23

WUT - UPC    DAG                   OreNETa
             Avg. (ms)  St. Dev.   Avg. (ms)  St. Dev.
Test 1       35.63      0.96       35.01      0.39
Test 2       36.9       ~0         37.43      0.67
Test 3       34.74      4.2        35.58      5.2
Table 2.- DAG and OreNETa one way delay results
The table shows that the results are comparable in all cases. To illustrate this more
clearly, Figure 7 plots the instantaneous one-way delay for both DAG and OreNETa:
the X axis holds the packet sequence number since the beginning of the test, and the Y
axis the delay in milliseconds. The figure shows the results of a particular test from the
UPC testbed towards WUT's, corresponding to the first run of Test 2 above. As can be
noted, the results are not identical. The reason is that the capture stations are physically
separate at both end-points and, moreover, DAG timestamps at hardware level while
OreNETa does so at kernel level.
Regardless of this difference, and to prove that the results really are comparable, the
computed correlation coefficient between the two traces is 0.9926, which is remarkably
high given the inherent limitations of general-purpose equipment.
Given that the goal of NPAS is to report the QoS levels of the communication, the
above differences in the results are not relevant.
Figure 7.- Test 2 One Way Delay between UPC and WUT
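The coefficient quoted above is a standard Pearson correlation between the two per-packet delay traces. A sketch with synthetic data (the sample values are invented, not the measured trace) shows why two traces offset by a near-constant timestamping bias still correlate almost perfectly:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length delay traces."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic example: the same delays shifted by a small constant offset
# plus a tiny drift (hardware vs. kernel timestamping).
dag = [36.60, 36.58, 36.61, 36.65, 36.59, 36.70]
oreneta = [d - 0.45 + 0.001 * i for i, d in enumerate(dag)]
print(round(pearson(dag, oreneta), 4))
```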
Packet losses analysis
The methodology used to assess the packet loss accuracy of our NPAS implementation
assumes that DAG always captures 100% of the traffic. This assumption allows us to
compare the application-level packet losses reported by our traffic generation tool on
one hand with OreNETa's results on the other. Table 3 summarizes this, showing the
worst-case packet loss ratio observed in the performed tests.
UPC - WUT    DAG         OreNETa
Test1        0           0
Test2        4.1·10^-4   4.1·10^-4
Test3        9.2·10^-6   3.2·10^-2

WUT - UPC    DAG         OreNETa
Test1        0           0
Test2        0           0
Test3        9.9·10^-4   9.9·10^-4
Table 3.- Worst case packet loss ratio
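The ratios in Table 3 are simply lost packets over sent packets per test run; as a quick sanity check (the 5-packet figure below is an illustrative assumption, not a measured value):

```python
def iplr(sent, received):
    """IP Packet Loss Ratio (ITU-T Y.1541): lost packets / sent packets."""
    return (sent - received) / sent

# Test 3 runs at 897 pkt/s for 10 minutes, i.e. 538200 packets per run;
# losing 5 of them gives ~9.3e-6, the order of the DAG worst case.
sent = 897 * 600
print(iplr(sent, sent - 5))
```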
Looking at the results, a first guess could be that at high packet rates the
general-purpose equipment does not behave correctly. Investigating further, however,
we found that the problem is that the current implementation does not properly handle
out-of-order packets. Packet reordering is quite common in the Géant network; it is
caused by load balancing and redundant links, as we showed in (Serral-Gracià et al.
2006). In a sample of Test 3 there are a total of 12417 out-of-order packets. The
WUT-to-UPC flow, on the other hand, does not show the packet loss issue: inspecting
the trace, we found no out-of-order packets in that direction.
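Out-of-order packets of the kind that confused the loss count can be detected from the per-packet sequence numbers; a minimal late-arrival counter (RFC 4737 defines stricter reordering metrics) looks like this:

```python
def count_reordered(seqs):
    """Count packets arriving with a sequence number lower than the
    highest one seen so far -- a simple late-arrival reordering metric."""
    highest = -1
    reordered = 0
    for s in seqs:
        if s < highest:
            reordered += 1
        else:
            highest = s
    return reordered

print(count_reordered([0, 1, 2, 4, 3, 5, 7, 6]))  # → 2
```

A loss estimator that treats every gap in the sequence as a loss will misreport exactly these late arrivals, which is consistent with the inflated OreNETa figure for Test 3.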
6. Conclusions & Future Work
The main goal of this paper is to present an inter-domain QoS parameter reporting
system that provides on-line reporting over heterogeneous networks. This infrastructure
has been deployed in the European-wide testbed used in the IST EuQoS project. The
testbed has also been used to validate the proposed solution against Endace's trusted
special-purpose equipment. The obtained results show the soundness of the proposal for
reporting QoS under controlled traffic loads.
Several issues remain open. For a broad deployment of the system, more tests are
needed to stress the whole system; following this series of tests, an analysis of the
performance degradation caused by such stress should be made. Related to that, the
number of MEs is an issue: modelling their proper amount and positioning in the
network using cost-effective functions could help reduce the overall deployment costs.
Another issue is the bandwidth used to carry the control traffic. This can be reduced
with compression algorithms or with traffic sampling techniques; the latter can also be
used to overcome the capturing limits of the standard equipment used by the solution. It
is worth noting, though, that distributed traffic sampling is not a straightforward task in
an inter-domain scenario, and hence more research on this topic is needed.
7. References
- P. Barlet-Ros, J. Solé-Pareta, J. Barrantes, E. Codina, J. Domingo-Pascual,
"SMARTxAC: A Passive Monitoring and Analysis System for High-Speed Networks",
Campus-Wide Information Systems, Volume 23, Issue 4, pp. 283-296, ISSN 1065-0741,
2006.
- J. M. Batalla, "Calculating end-to-end QoS parameters from QoS objectives in
Autonomous Systems", 12th Polish Teletraffic Symposium PSRT2005, June 2005.
- J. Brazio, P. Tran-Gia, N. Akar, A. Beben, W. Burakowski, M. Fiedler, E. Karasan,
M. Menth, P. Olivier, K. Tutschku, S. Wittevrongel (Eds.), "Analysis and Design of
Advanced Multiservice Networks Supporting Mobility, Multimedia and
Internetworking - COST Action 279 Final Report", Springer, ISBN 0-387-28172-X,
2006.
- J. Cleary, S. Donnelly, I. Graham, A. McGregor, M. Pearson, "Design principles for
accurate passive measurement", PAM 2000, Hamilton, New Zealand, April 2000.
- CoMo - Continuous Monitoring - Intel Research. http://como.intel-research.net
- Géant - Webpage: http://www.geant.net
- IST-EuQoS - Webpage: http://www.euqos.org
- ITU-T Rec. Y.1541, "Internet protocol data communication service - IP packet
transfer and availability performance parameters", January 2005.
- B. Huffaker, D. Plummer, D. Moore, K. Claffy, "Topology discovery by active
probing", Proceedings of the Symposium on Applications and the Internet, Nara,
Japan, January 2002.
- IST-INTERMON, Deliverable 15: "Final Architecture Specification", 2004.
- P. Matray, G. Simon, J. Stéger, I. Csabai, G. Vattay, "Large Scale Network
Tomography in Practice: Queueing Delay Distribution Inference in the ETOMIC
Testbed", IEEE Infocom 2006, Barcelona, Spain, April 2006.
- NetMeter - Webpage: http://www.ccaba.upc.edu/netmeter
- OreNETa - Webpage: http://www.ccaba.upc.edu/oreneta
- R. Serral-Gracià, L. Jakab, J. Domingo-Pascual, "Out of Order Packets Analysis on a
Real Network Environment", 2nd Conference on Next Generation Internet Design and
Engineering (EuroNGI), ISBN 0-7803-9455-0, 2006.
- T. Zseby, S. Zander, G. Carle, "Evaluation of Building Blocks for Passive One-Way
Delay Measurements", GMD FOKUS, 2001.