Distributed Monitoring and Measurement System for Heterogeneous Networks
Authors/Affiliation:
R. Serral-Gracià1, L. Jakab1, M. Yannuzzi1, X. Masip-Bruin1, J. Domingo-Pascual1, J.
Sliwinski2, A. Beben2, P. Owezarski3, M. Callejo4, J. Enríquez-Gabeiras 4
1Technical University of Catalunya (UPC), Barcelona, Spain
2Warsaw University of Technology (WUT), Warsaw, Poland
3Analysis and Architecture of Systems Lab (LAAS), Toulouse, France
4Telefónica I+D (TID), Madrid, Spain
Keywords: Heterogeneous networks, QoS, monitoring, measurement, network
parameters, inter-domain
Abstract
Service providers’ portfolios are continuously expanding with emerging real-time
services, which are offered to current and prospective clients for an associated fee.
In order to properly charge such fees, service providers should guarantee that the
offered services are correctly delivered, which is done by monitoring certain traffic
parameters. Many end-to-end connections require monitoring over several domains
operated by different companies, possibly involving different underlying network
technologies, which makes matters very complex.
This paper presents two contributions. On the one hand, we present an approach to
both monitor and measure the QoS offered by the network in an inter-domain
environment; the novelty of this approach is the use of monitoring tools based on
inexpensive general-purpose equipment. On the other hand, we present the methodology
used to validate this approach against specific-purpose networking equipment designed
for such tasks, with tests performed in a real scenario.
The real scenario used in this paper to validate the proposals is the heterogeneous
network developed within the IST EuQoS project, which provides a Europe-wide
testbed.
1. Introduction
When dealing with full traffic monitoring infrastructures (such as the one provided
in (CoMo)), powerful specific-purpose equipment is usually required. In other
environments the network load is not as heavy, or the required accuracy may be less
demanding. This paper focuses on the specification and deployment of a generic passive
network analysis system named the Network Parameter Acquisition System (NPAS). While
its use can be related to any network monitoring scenario, this work aims at the
applicability and deployment of a system for on-line Quality of Service (QoS)
reporting. Specifically, NPAS is designed to be applied to a heterogeneous multi-domain
scenario supporting end-to-end QoS capabilities. Its deployment and testing are carried
out on the testbed of the (IST-EuQoS) project, which is spread all over Europe.
For its proper deployment, some special requirements need to be met, ranging from high
efficiency to low resource consumption, in order to keep up with the high demands of
the network. To validate this solution, a solid verification procedure is performed
using specific-purpose network monitoring equipment.
In this paper, general-purpose equipment means commodity workstations with standard
networking capabilities, whereas specific-purpose equipment refers to high-end
equipment using dedicated network monitoring cards.
In traffic monitoring platforms such as (Barlet et al. 2006), the goal is to capture
and evaluate the whole bulk of bandwidth used on a link; in such scenarios, specific-
purpose equipment is mandatory. This work differs in that, in QoS environments, it is
not mandatory to capture and analyze all packets on the link. In these environments,
computing QoS parameters in a non-intrusive way only involves the flows with
guaranteed service, which does not require capturing all the traffic. Moreover, when
reporting QoS information the goal is to indicate the upper bounds permitted for the
offered services, not the specific network parameter values. Hence this work is
centred on the architecture description, its deployment in a real scenario, and the
evaluation and feasibility analysis of the solution. Due to space limitations, issues
such as stress testing and the capturing limits of the provided solution are out of
the scope of this paper.
The rest of the paper is structured as follows. The next section discusses the proposed
Network Parameter Acquisition System. After this discussion, the paper addresses the
deployment model of the whole structure. Section 4 introduces the system where the
proposal is applied, i.e., the EuQoS System (IST-EuQoS). After such description the
paper focuses on the validation of the solution. The paper finishes with the conclusions
and further studies related to this work.
2. Network Parameter Acquisition System (NPAS)
This section defines the NPAS as well as its internal structure. NPAS is an
infrastructure that provides structured network parameter information about the
network traffic. It is worth highlighting that the proposed NPAS is independent of
the underlying network technology, as its infrastructure enables the system to:
i) extract the QoS parameters and ii) centralise and store the gathered information.
NPAS also supports a hierarchical structure facilitating the inter-domain extraction
and analysis of QoS parameters over heterogeneous networks.
System Architecture
NPAS is divided into two main entities: Monitoring Entities and Processing Entities.
Monitoring Entity (ME): the ME is a traffic acquisition interface located in the
network across the different domains. It reports traffic parameter values using
general-purpose equipment, independently of the underlying network technology. The
ME’s goal is to identify the flows that traverse the monitored networks, and to
extract on-line QoS parameters on a per-packet basis. Such parameters are timestamps
and general packet information available in the packet’s header, as will be discussed
later.
Processing Entity (PE): the PE extracts the minimal required flow information reported
by the ME. Such information is used for analysing the QoS parameters of the flows at
the selected points of the network.
An ME only sends the captured traffic information to the registered PEs. The PEs do
not have to register with all available MEs, but only with those having traffic
information relevant for the needed analysis.
Figure 1 shows the main building blocks of the NPAS. The MEs have two different
parts, namely the Parameter Acquisition Module (PAM) and the Filtering Module (FM).
The former is in charge of taking the needed information from the network; the PAM
depends on the underlying network technology and abstracts such information into a
common generic format.
The acquired information comes from the selection function (F) provided by the FM. F
is flow-based and allows selecting single flows, or aggregates through the Type of
Service field. Such selection might be complemented with traffic sampling techniques
for performance improvements. This work focuses on basic filtering, leaving
improvements such as traffic sampling for further study.
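As an illustration, the selection function F can be sketched as a wildcard match over the flow identifier and the Type of Service field. This is a minimal sketch under our own assumptions; the data layout and names below are not part of NPAS:

```python
# Illustrative sketch of the FM's selection function F: a filter selects either
# a single flow (full 5-tuple) or an aggregate (e.g. by ToS/DSCP only).
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    sport: int
    dport: int
    proto: int
    tos: int

@dataclass(frozen=True)
class Filter:
    # None fields act as wildcards, so a ToS-only filter selects an aggregate.
    src: Optional[str] = None
    dst: Optional[str] = None
    sport: Optional[int] = None
    dport: Optional[int] = None
    proto: Optional[int] = None
    tos: Optional[int] = None

def selects(f: Filter, p: Packet) -> bool:
    """Return True if packet p belongs to the flow or aggregate chosen by f."""
    return all(want is None or want == got
               for want, got in ((f.src, p.src), (f.dst, p.dst),
                                 (f.sport, p.sport), (f.dport, p.dport),
                                 (f.proto, p.proto), (f.tos, p.tos)))
```

A filter with only `tos` set thus captures a whole class of service, while a fully specified filter captures one flow.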
The MEs are spread out along the monitored network, and their number depends on the
desired level of detail. Two is the strict lower bound on the number of MEs, the
minimum for computing one-way delays, packet losses and jitter of the traffic between
the different pairs of MEs that lie on the path of the selected flows. The PEs in
this architecture are introduced to provide flexibility to the system. The scenario
in Figure 1 shows a simple hierarchy of PEs, which share information from common MEs.
This sharing can be performed at different aggregation levels and facilitates
different reporting granularities. Even in the case of different destination domains,
PEs are able to detect the flows, as long as there is a suitable ME (or PE) to
connect to. This is accomplished by smart identification of packets and flows.
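A minimal sketch of how a PE could derive these metrics from the per-packet records of two MEs on a flow's path follows. The record layout and the jitter definition (mean absolute variation of consecutive delays) are our assumptions, and the two MEs' clocks are assumed synchronized, as in the GPS-based setup of Section 5:

```python
# Hypothetical PE-side computation of one-way delay, loss and jitter from the
# (packet id -> capture timestamp) records reported by two MEs.
def qos_from_records(upstream, downstream):
    """upstream/downstream: dicts mapping packet id -> capture timestamp (s)."""
    delays = [downstream[pid] - upstream[pid]
              for pid in upstream if pid in downstream]
    lost = sum(1 for pid in upstream if pid not in downstream)
    # Jitter sketched as the mean absolute delay variation between
    # consecutive matched packets.
    jitter = (sum(abs(a - b) for a, b in zip(delays, delays[1:])) /
              max(len(delays) - 1, 1))
    return {"delay": sum(delays) / max(len(delays), 1),
            "loss": lost / max(len(upstream), 1),
            "jitter": jitter}
```

Packets seen upstream but never downstream count as losses and are excluded from the delay and jitter statistics, matching the behaviour described later for lost packets.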
In the proposed NPAS structure of Figure 1, packet detection and matching is an issue
because of the distributed nature of the MEs. The monitoring stations must be able to
report the traffic parameters to the PE, which must match each piece of information
with that of all the other MEs. For reporting matched packets efficiently, processing
the whole payload is not an option. The solution to this issue is to use the
technique proposed by (Zseby et al. 2001), which is based on a fast CRC computation.
This CRC is computed over strictly the minimum set of data that guarantees proper
packet and flow identification. Packets matching this CRC at various MEs are
identified as the same packet. Flows are detected by an identifier taken from the
proper fields of the packet’s header.
With this approach, only a small portion of the packet needs to be processed.
Moreover, it is robust against packet losses, even when packets are dropped along the
path between two MEs: in such a case, the lost packet is reported as lost, but
ignored in the rest of the statistics.
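The identification scheme can be sketched as follows, assuming plain IPv4 headers without options. The exact field set used by (Zseby et al. 2001) may differ, so the offsets below are illustrative; the point is that mutable fields (TTL, header checksum) are excluded so the same packet hashes identically at every ME:

```python
# Illustrative packet identifier: a CRC over header fields that stay invariant
# along the path. Offsets assume a 20-byte IPv4 header with no options.
import zlib

def packet_id(ip_packet: bytes, payload_bytes: int = 8) -> int:
    invariant = (ip_packet[2:4]       # total length
                 + ip_packet[4:6]     # IP identification
                 + ip_packet[12:20]   # source and destination addresses
                 + ip_packet[20:20 + payload_bytes])  # start of transport header
    return zlib.crc32(invariant)      # TTL (byte 8) and checksum are skipped
```

Two captures of the same packet at different MEs, differing only in TTL, yield the same identifier, while distinct packets almost surely do not collide.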
[Figure omitted: n MEs, each composed of a Parameter Acquisition (PA) module and a
Filtering module, report to PEs PE1 ... PEn, with PE2 placed higher in the hierarchy.]
Figure 1.- NPAS structure
Reporting granularity
Centralizing the information in the PEs is a design issue. Depending on the operator
requirements, it can be performed in different ways: i) on a per-domain basis;
ii) among peer domains; iii) on an end-to-end basis.
Each approach delivers different specific information. In the case of intra-domain
reporting, each operator obtains status information about the internal network
conditions. This can be useful for detecting link failures and bottlenecks, and for
reporting network efficiency.
When dealing with peer domains, the information in heterogeneous QoS systems is
related to compliance with Service Level Agreements (SLAs) among peers, and serves as
a way to detect deviations from the delivered quality assessed in the contractual
agreement.
Finally, on an end-to-end basis, the information is mostly useful for end users,
since they can verify the fulfilment of the contracted services. It is also useful
for service providers, allowing them to determine whether the required service is
properly delivered.
Each granularity has associated constraints which are tightly related to the number
and location of the MEs. Moreover, their interconnection with the different PEs is a
design decision which may limit such reporting.
Control protocol
Until now the discussion has focused on the behaviour of the different parts of the
system. This subsection deals with the protocols used for QoS reporting. There are
two protocols, according to the following requirements:
- Reporting Protocol: used from the ME to the PE for reporting network parameters;
as will be seen later, this protocol is also used among PEs.
- Control Protocol: used from the PE to the ME for setting up the monitoring
options. It can also be used between PEs for their hierarchical connection.
The goal of each protocol is different: PEs send the filtering options to the MEs,
and MEs are in charge of reporting the network parameters. In order to keep the
system as simple as possible, all commands are sent solely from the PE to the ME. The
control protocol uses the following messages:
- connect: Connection establishment between PE and ME.
- disconnect: Stops the connection between PE and ME.
- filter: Sets the capture filter; it needs a flow or aggregate selection function.
- start: Starts the capture using the specified filter. If no filter is set, all
packets will be captured.
- stop: Stops the capture.
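The ME-side handling of these messages can be sketched as a small state machine. Transport and framing are deliberately left out; this is our own illustration of the message semantics, not the actual protocol implementation:

```python
# Illustrative ME-side dispatch for the five control-protocol messages.
class MonitoringEntity:
    def __init__(self):
        self.connected = False
        self.filter = None
        self.capturing = False

    def handle(self, msg: str, arg=None):
        if msg == "connect":
            self.connected = True
        elif msg == "disconnect":
            self.connected, self.capturing = False, False
        elif msg == "filter":
            self.filter = arg          # flow or aggregate selection function
        elif msg == "start":
            # With no filter set, every packet is captured.
            self.capturing = True
        elif msg == "stop":
            self.capturing = False
        else:
            raise ValueError(f"unknown message: {msg}")
```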
In the other direction, the ME encapsulates the required data on a per-packet basis
and sends it to all the connected PEs. More details about the transmitted data are
provided in the coming sections.
The other required protocol runs between PEs; it enables their hierarchical
connectivity. The protocol presented here assumes that the monitored network topology
is known, which also makes available the list of associated PEs for each needed
network segment. This issue has already been treated in recent literature (Huffaker
et al. 2002).
The control protocol is very similar to the one used between the PE and the ME. The
main difference is that the network information reaches a PE after being processed by
another PE, which allows reporting, when needed, only aggregated results per class of
service. The protocol with the above constraints is the same as that used for
communicating between PEs and MEs, but with a new message:
- aggregation: specifies the aggregation level; values can be per flow, per packet or
per aggregate. To avoid excessive inter-domain management traffic, it is possible to
specify a refresh rate controlling the reporting interval.
In the specific case of inter-domain measurements, a control and reporting protocol
has been designed. The objective of the Inter-domain Measurement, Control and
Reporting (IMCR) protocol is to collect information about the QoS level offered on
end-to-end paths, based on local measurements performed by each transited domain. For
that purpose, in a given domain the PE periodically advertises to its neighbours the
measured values of QoS parameters, such as delay, jitter or packet losses,
corresponding to particular destinations reachable from this domain.
For destinations located inside the domain, each PE advertises the locally measured
QoS values, while for destinations located outside the domain, the PE advertises
aggregated QoS values. The aggregated values are assembled using the QoS parameters
received from the neighbouring PEs as well as the local QoS contributions introduced
between the corresponding pairs of border routers.
Figure 2 shows an example illustrating how the IMCR protocol collects information
about the QoS offered on end-to-end paths across a three-domain scenario, from domain
AS1 towards domain AS3. We assume that each domain deploys one PE that periodically
performs measurements representing the QoS offered inside the domain (Q1, Q2 and Q3)
as well as on the relevant inter-domain links (Q1->2, Q2->1, Q2->3, Q3->2).
[Figure omitted: PE1, PE2 and PE3, located in domains AS1, AS2 and AS3, exchange
IMCR messages; the aggregated QoS value collected by the IMCR protocol equals
Q3 ⊕ Q2->3 ⊕ Q2 ⊕ Q1->2 ⊕ Q1.]
Figure 2. Example of IMCR operation.
Then, PE3 propagates information about its computed QoS parameters, denoted Q3, to
PE2. PE2 checks whether domain AS3 is a peer (which means that it is available in its
BGP routing table) and, in such a case, advertises to PE1 aggregated information
about the QoS offered toward the destinations located in domain AS3. This value takes
into account the QoS contributions of domains AS3 and AS2 (Q3, Q2) as well as the
inter-domain link between AS2 and AS3 (Q2->3), so it equals Q3 ⊕ Q2->3 ⊕ Q2. The
operator ⊕ denotes an aggregation function specific to each particular QoS parameter
(e.g. a sum for additive QoS constraints such as one-way delay, a product for
multiplicative QoS constraints such as losses, or the minimum for QoS constraints
such as bandwidth). Finally, following the same scheme in domain AS2, PE1 receives
information about the QoS offered on the path toward domain AS3.
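Following the text, the ⊕ operator can be sketched per metric. The loss formula below combines losses through success probabilities, which assumes independent losses per segment; the paper itself only names the operation as multiplicative:

```python
# Illustrative per-metric aggregation functions for the ⊕ operator.
def aggregate_delay(*delays):
    """One-way delay is additive along the path."""
    return sum(delays)

def aggregate_loss(*losses):
    """Losses combine multiplicatively via the per-segment success probabilities."""
    success = 1.0
    for p in losses:
        success *= (1.0 - p)
    return 1.0 - success

def aggregate_bandwidth(*bws):
    """Bandwidth is limited by the narrowest segment."""
    return min(bws)
```

For example, two segments each losing 10% of packets yield an end-to-end loss of 19%, not 20%, under the independence assumption.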
Assuming that each PE periodically advertises information about the QoS offered
toward the available destinations, each PE builds a QoS map with updated QoS values
for any reachable destination. Therefore, each PE has to maintain a table (similar to
a BGP routing table) that contains the available destinations jointly with the
corresponding QoS parameters and the associated PE.
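Such a table can be sketched as follows; this is a minimal illustration, and the stored fields and their layout are our own assumptions:

```python
# Illustrative per-PE QoS map: destination -> (QoS values, advertising PE, time).
import time

class QoSMap:
    def __init__(self):
        self.table = {}

    def update(self, destination, qos, from_pe):
        """Record the latest advertisement received for a destination."""
        self.table[destination] = (qos, from_pe, time.time())

    def lookup(self, destination):
        """Return the latest QoS values for a destination, or None if unknown."""
        entry = self.table.get(destination)
        return entry[0] if entry else None
```

Each periodic advertisement simply overwrites the previous entry, so the map always reflects the most recent values received from the neighbouring PEs.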
The accuracy of the gathered end-to-end QoS values depends on the precision of QoS
aggregation functions as well as on how frequently PEs send updates to their
neighbours. Updates might be triggered for reporting special network conditions when
needed.
3. Deployment Model
This section defines the deployment options along with some important decisions
related to the setup of an NPAS. The presented methodology will be used in later
sections for a deployment in a real testbed. The proposed solution complies with the
system described in the previous section.
The reporting granularity and its different options are an important decision when
deploying NPAS, and hence several concerns have to be addressed. One refers to
administrative issues: sometimes it is not possible to deploy an ME on every hop of
the traffic path. In this regard, NPAS is designed with such flexibility in mind. The
MEs can be spread over different points with one or more PEs to report to; depending
on the deployment density, more or less detailed information can be extracted.
Once the parameters are gathered, the system delivers them to higher layers. How the
QoS is actually provided or enforced is out of the scope both of this paper and of
NPAS.
Depending on the location of the MEs, anything from intra-domain up to end-to-end
traffic can be computed with the same infrastructure, with the PEs in charge of
reporting the desired metrics. In this context, end-to-end always refers to the
network level (from the ingress to the egress nodes of the whole path). User
experience as well as application-specific metrics are out of the scope of this
paper, as they need to be handled by higher-layer entities.
On-line QoS monitoring system
In this section we discuss three approaches for performing on-line QoS monitoring in
a multi-domain network scenario. The main objective of on-line QoS monitoring is to
provide updated information about the QoS level offered on the established paths
between a given domain and other destination domains. The information provided by the
on-line QoS monitoring function constitutes a base for (i) providing “certificates”
for users about the assured QoS level and (ii) the OAM (Operation, Administration and
Maintenance) process, which triggers fault management actions when the QoS level
falls below expectations.
Designing an “on-line” monitoring system for a multi-domain network is a complex
task, as shown in (Matray et al. 2006) and (INTERMON 2004). The main problem comes
from the fact that domains are usually under autonomous administration and, as a
consequence, their operators have their own policies for performing measurements; on
the other hand, an on-line monitoring system requires some co-operation between
domains. The main requirements for the measurement system deployed in a particular
domain are the following:
- The measurements performed by each domain should correspond to the same metric and
be measured at the same technology layer, usually the IP level.
- The measurements should cover the whole path, so they have to be performed between
clearly defined hops of the path, e.g. the borders of domains, where the limits of a
particular domain’s network service lie.
- The measurement system should be able to interoperate with other systems.
- The measurement system should scale when networks with a large number of domains
are investigated.
The on-line QoS monitoring system may be implemented in different ways. Among them we
distinguish three main categories:
- Centralised solution, which assumes a single entity controlling all measurements
performed inside the entire network, including several domains.
- Fully distributed solution, where each domain has its own control entity
responsible for performing measurements inside the domain, while measurements
corresponding to end-to-end paths are collected with the aid of the control and
reporting protocol. The base information for this protocol comes from the results of
local measurements performed independently by each domain.
- Semi-centralised solution, where each domain has a control entity responsible for
performing measurements on the path between its domain and any destination domain.
In the following subsections we present these approaches in more detail.
Centralised on-line monitoring system
This approach assumes that centralised measurement controllers (represented in NPAS
by the PE) are responsible for managing all measurements performed inside each
domain. It can be extended to any arbitrary pair of domains, see Figure 3. The PE
controls all MEs located at the domain borders as well as at the borders between
access and core networks.
[Figure omitted: a single centralised measurement controller (PE) controls the MEs
for intra- and inter-domain measurements deployed at the border routers of domains
AS1, AS2 and AS3; all measurement results are collected by the Central Measurement
Controller.]
Figure 3.- Architecture of centralised on-line monitoring system.
The main advantage of this approach is its simplicity of implementation. However,
because it assumes a centralised point of control, it is difficult to deploy in a
network consisting of many autonomous domains; moreover, it is hardly scalable. As a
consequence, this approach may be applied to small networks under a common
administration, e.g. trial networks or dedicated networks. As an example, such an
approach was implemented in the ETOMIC project (Matray et al. 2006), where
measurement probes were deployed in different sites connected to the Géant academic
network (Geant).
Fully distributed on-line monitoring system
This solution assumes that monitoring of end-to-end paths is performed in a distributed
way as presented in Figure 4.
[Figure omitted: each domain AS1, AS2 and AS3 has its own domain measurement
controller (PE1, PE2 and PE3, respectively) collecting the measurement results of the
MEs inside its domain; the PEs exchange results through the inter-PE control and
reporting protocol.]
Figure 4.- Architecture of distributed on-line monitoring system.
For that purpose, each domain implements its own domain measurement controller (also
represented by the PE), responsible for performing on-line measurements inside the
domain. These measurements are performed continuously between any pair of points
where the service is offered. In a typical domain, measurements should be performed
between all pairs of border routers (to measure the QoS offered to traffic crossing
the domain), between all pairs of access networks (to measure the QoS offered to
local traffic), and between all pairs of access network and border router (to measure
the QoS offered to traffic originating or terminating in the domain). For obtaining
end-to-end measures, the intra-domain results are periodically advertised to the
neighbouring PEs. For that purpose, a specialised inter-domain measurement, control
and reporting protocol, the IMCR protocol, was proposed earlier in the paper.
The main advantage of a distributed on-line monitoring approach is the independence
of the measurements performed in the particular domains. The only requirements are
that the measurement systems deployed in each domain measure the same QoS metrics and
implement the same measurement and control protocol. Due to its distributed
operation, this approach can be easily applied in a multi-domain network. On the
other hand, the main drawback concerns the accuracy of the end-to-end results: they
are not directly measured, but derived from the results measured by the local
systems. However, by properly designing the assembling functions used for calculating
cumulative values of QoS parameters, e.g. as proposed in (Batalla 2005, Brazio et al.
2006), we may approximate an upper bound for a particular QoS metric.
Semi-centralised on-line monitoring system
This approach assumes that each domain has its own PE, responsible for performing
measurements on all paths from its own domain towards any required destination
domain. For instance, in Figure 5, PE1 can control the MEs located in all domains,
hence it can measure paths between domains AS1 and AS3, as well as between AS1 and
AS2. Unfortunately, in this approach it is not possible to collect measurements
between AS2 and AS3.
[Figure omitted: the domain measurement controller PE1 in AS1 remotely controls the
MEs located in AS1, AS2 and AS3, while PE2 and PE3 are not involved; all measurement
results are collected by PE1.]
Figure 5. Architecture of semi-centralised on-line monitoring system for performing
measurements from AS1.
This approach allows measurements on all end-to-end paths; however, it requires that
MEs be controlled remotely by PEs belonging to different domains. This constitutes
the main limitation of the semi-centralised approach, as it may be applied only
between domains that trust each other.
Although Figure 5 presents a solution where a PE can directly access any ME outside
its own domain, it is possible to restrict such communication through an appointed
PE. Such an approach was investigated in the IST-INTERMON project (INTERMON), where
the function of the PE was performed by a Global Controller entity communicating
through the Specification of Monitoring Service interface.
4. The EuQoS System
The term EuQoS stands for “End-to-end Quality of Service support over heterogeneous
networks” (IST-EuQoS). This European research project is building a complete QoS
system aimed at supporting stringent QoS services among heterogeneous access
technologies, both within and between different routing domains. EuQoS customers will
be able to subscribe to this system, for which we are developing a portfolio of novel
QoS mechanisms and protocols that build upon the state of the art, covering security,
admission control, signaling, monitoring, measurement and reporting, QoS Routing
(QoSR), fault management, and Traffic Engineering (TE).
The focus of this paper resides on the Monitoring and Measurement System (MMS) and
the Monitoring Measurements and Fault Management (MMFM).
MMS and MMFM
The EuQoS MMS functionality is twofold: to validate the QoS delivered by the EuQoS
system (in performance trials) and to support other EuQoS functionalities. Such
functionalities include topology acquisition and the reporting of network resource
usage and delivered QoS levels in order to support traffic engineering.
The MMFM is the EuQoS function in charge of network resource monitoring, network
topology discovery, fault management and QoS monitoring. The Network Monitor inside
the MMFM manages different threads that interface with the different MMS tools
supporting the EuQoS system.
In order to validate the EuQoS capabilities, several tools have been developed, the
most relevant for this work being (OreNETa), which implements the NPAS system
described in Section 2. OreNETa performs passive monitoring of the different QoS
parameters for each CoS defined in the system. The tool is composed of three main
parts:
- meter: in charge of actually capturing the specified traffic; it corresponds to the
ME. The meter in this implementation is currently based on the libpcap library.
- analyzer: the PE (or PEs) in the scenario; it is in charge of configuring the
meters and gathering their data.
- client: the interface towards higher layers; in EuQoS this part is formed by the
MMS and the MMFM.
The interfaces to OreNETa are defined to support periodic monitoring: the MMFM
periodically asks the tools for measurements and event notifications.
The MMFM gets the measurements from the different MMS parts, which form the whole
monitoring system, processes them and commits the changes to a database for
persistent storage. This database is used by different EuQoS modules to obtain
information about the network status, the QoS performance and the usage of link
bandwidth, permitting later queries and historical data repositories for off-line
processing.
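This polling cycle can be sketched as follows; the interfaces shown are illustrative and not the actual EuQoS or OreNETa APIs:

```python
# Hypothetical sketch of one MMFM polling cycle: query each MMS tool thread
# and hand every measurement record to a persistence callback.
def poll_once(tools, store):
    """tools: iterable of objects exposing a get_measurements() method;
    store: a callable persisting one record (e.g. a database insert)."""
    for tool in tools:
        for record in tool.get_measurements():
            store(record)
```

In the real system, `store` would commit each record to the persistent database that the other EuQoS modules query for network status and QoS performance.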
Testbed
The EuQoS project, besides developing a solid infrastructure for end-to-end QoS
provision, provides a Europe-wide testbed which enables different technologies to be
tested in the trials of the project; it also permits the deployment of NPAS in a real
scenario.
There are currently eleven local testbeds, interconnected via the Géant network and
the National Research and Education Networks (NRENs) through private tunnels forming
a full mesh among them. In turn, each partner has a local testbed with different
network technologies, including UMTS, xDSL, Ethernet, Gigabit Ethernet and WiFi
(802.11). The second phase of the project will add G/MPLS and satellite to the list.
NPAS is deployed over the whole heterogeneous system.
This testbed permits end-to-end tests over heterogeneous networks, depending on the
technologies of the testbeds involved. Moreover, with the QoS provisioning
infrastructure provided by the project and the MMS, deploying and testing the
scenarios presented in this work is a straightforward task.
5. Performance Evaluation
This section is devoted to the evaluation of the proposed system using the EuQoS
testbed, together with the methodology specified in the previous sections. The actual
NPAS implementation used for the tests is EuQoS’s MMS, more specifically OreNETa,
which includes both the ME and PE entities. The whole system complies with the
requirements and constraints of the proposal. The trials are performed in the EuQoS
testbed, where the required hardware for the validation is available, with the goal
of validating the NPAS framework. It is not the goal of this paper to exploit the
limits of software monitoring, but to show its resilience and correct behaviour under
controlled traffic loads; deploying the system under tight stress conditions is left
as future work.
Validation principle
Validating a measurement system requires another measurement system we can trust.
Such a trusted reference is found in specific-purpose networking equipment with an
already validated framework. In this analysis, the NPAS system under development is
specifically designed for measuring and monitoring the parameters required by the
EuQoS components while respecting the EuQoS system constraints.
As depicted in Figure 6, the validation principle in this environment consists of
generating probe traffic on an experimental network, which has to cross both the NPAS
deployment scenario and the trusted monitoring tool. To ease the comparisons,
controlled traffic is injected into the network; such traffic is analyzed both by
NPAS and by the trusted system. The tool used for actively testing the network is
NetMeter (NetMeter).
Figure 6.- Validation principle of the EuQoS MMS system
For selecting the trusted monitoring system, it is essential to consider several
properties:
- First, it has to be transparent, i.e. it must not introduce any change in the
monitored traffic. The outgoing traffic has to be exactly the same as the incoming
one: no loss, no jitter, etc. may be introduced by the trusted monitoring system.
- It has to be very well provisioned, in order not to miss packets even in the case
of a large traffic peak.
- It has to be very accurate in terms of packet timestamping.
To cope with such constraints, the best-known solution (maybe the only one capable of
guaranteeing the three preceding properties) is the DAG-based system from Endace.
The DAG card has been designed for monitoring traffic on links. The principle of
these boards consists in capturing a trace of all packets propagating on a fibre or
copper link. The advantage of this card is that the capture is performed in hardware
at wire speed, so the system is able to capture complete traffic traces from high-
speed links. These cards can extract either the header or the full payload of all
packets passing on the link, timestamping them with a very accurate GPS clock and
storing them on a hard drive (Cleary et al. 2000). DAG is completely transparent and
supports links whose capacity goes up to OC-192. In addition, the GPS timestamping is
performed in the DAG card itself, using its hardware, thus leading to very high
timing accuracy.
Note that the transparency of DAG systems is achieved through optical or electrical
splitters on the monitored link. The splitter lets 80% of the signal power continue on
the normal link (thus introducing no delay, error or loss), while the remaining 20% is
diverted to the DAG system where the capture is done.
In order to provide NPAS with similar characteristics, the system is equipped with:
- a serial pulse-per-second (PPS) signal provided by the same GPS source as the one
used in the DAG environment;
- a GPS time source for off-testbed synchronization;
- traffic generation and monitoring performed from the same machine at the end-points.
For simplicity this validation focuses on end-to-end tests, but it would be a
straightforward task to extend the testing to inter- and intra-domain monitoring. This is
left as future work, since it would not provide additional information for the validation.
General-purpose equipment evaluation
The study performed to validate NPAS has two main parts. The first focuses on the
resource consumption of the solution, in terms of memory, bandwidth and accuracy. The
second compares the results obtained by testing the platform in the EuQoS testbed
against those of the reliable, validated special-purpose cards developed by Endace.
Resource consumption
Previous sections dealt with the internals of a generic NPAS system. This section
estimates the resource usage of the actual implementation used for the validation.
Bandwidth resources
The ME collects packet information according to criteria set by the NPAS. A flowID is
generated for each received packet and, if there is no state information for this flow, a
flow descriptor of 22 bytes is created by the PE to identify the flow. The ME then sends
information about each packet received within that flow, including the flowID, a
packetID, a sequence number, and the timestamp and size of the selected packet. All this
information is packed into 24 bytes and sent to the PE. The bandwidth required by the
data therefore depends on the new-flow rate and on the packet rate of each monitored
flow, following the expression:
BW = 22·NFR + 24·(PR_1 + … + PR_n)    (Equation 1)
where NFR is the rate at which new flows arrive (flows per second) and PR_N is the
packet rate of flow N. In the stationary state (once all selected ongoing flows have been
identified), the total bandwidth grows linearly with the aggregate packet rate.
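Equation 1 can be sketched as a small helper function; the names and the sample rates below are illustrative, not taken from the NPAS code:

```python
def control_bandwidth(new_flow_rate, packet_rates,
                      descriptor_bytes=22, record_bytes=24):
    """Estimated ME-to-PE control bandwidth in bytes/s (Equation 1).

    new_flow_rate -- new flows per second (NFR)
    packet_rates  -- per-flow packet rates PR_1 .. PR_n (packets/s)
    """
    return descriptor_bytes * new_flow_rate + record_bytes * sum(packet_rates)

# Steady state (no new flows): bandwidth grows linearly with the
# aggregate packet rate of the monitored flows.
print(control_bandwidth(0, [20, 96, 897]))  # → 24312 bytes/s
```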
Even though this bandwidth consumption is noticeable, the goal of NPAS is to provide
very fast on-line monitoring; compression algorithms and further bandwidth
optimizations are therefore outside the focus of this work.
Memory resources
Monitoring high-speed backbone links with general-purpose equipment places a high
demand on memory usage, because one needs to track the parameters of tens if not
hundreds of thousands of simultaneous flows. As a network example, (Barlet et al. 2006)
show that the average number of packets per second on a 1 Gbps link is around 300000.
It is not the goal of this infrastructure to process such large amounts of data;
minimizing the storage of redundant information is the only way to obtain a
well-performing scenario and, as detailed in the previous section, most redundant
information is already dropped at the MEs. For the MEs, memory consumption is not an
issue, as they immediately forward all relevant information to the PEs, which then
analyze and process the received data.
PEs store information for each flow detected by each ME, even if the flow is not
detected on all of them. The flows are stored in a data structure, such as a binary tree,
that allows both quick lookups and easy updates. A flow is considered active if packets
with its descriptor have arrived within the last 30 seconds; after that the flow is
expired and removed from the data structures.
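The activity-timeout bookkeeping just described can be sketched as follows; this is a minimal illustration (a plain dictionary stands in for the binary tree of the actual implementation, and the field names are invented):

```python
import time

FLOW_TIMEOUT = 30.0  # seconds of inactivity before a flow is expired

class FlowTable:
    """Minimal sketch of a PE flow table with a 30-second expiry rule."""

    def __init__(self):
        self._flows = {}  # flow_id -> {"packets": n, "last_seen": t}

    def update(self, flow_id, now=None):
        """Record one packet for flow_id and refresh its activity time."""
        now = time.monotonic() if now is None else now
        entry = self._flows.setdefault(flow_id, {"packets": 0})
        entry["packets"] += 1
        entry["last_seen"] = now

    def expire(self, now=None):
        """Drop flows idle for more than FLOW_TIMEOUT; return how many."""
        now = time.monotonic() if now is None else now
        stale = [fid for fid, e in self._flows.items()
                 if now - e["last_seen"] > FLOW_TIMEOUT]
        for fid in stale:
            del self._flows[fid]
        return len(stale)
```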
The information kept about each flow includes the flowID, computed with a CRC
function over the type-of-service field and the source and destination addresses and
ports in order to optimize communications with the MEs. These five fields are
nevertheless also stored in the flow data structure so that the details remain available.
Other stored fields are the counters needed to monitor the QoS parameters and to
perform the internal computations that detect and synchronize the order of packets
between the distributed MEs. These counters are responsible for the bulk of the memory
requirements, but they are needed to detect the direction of the traffic and to check
whether a flowID represents a flow that passes through all monitored points.
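A flowID of this kind could be computed as below; note that the exact CRC variant and field layout are not specified in the text, so CRC-32 and the packing order are assumptions made for illustration:

```python
import socket
import struct
import zlib

def flow_id(tos, src_ip, dst_ip, src_port, dst_port):
    """Illustrative flowID: CRC-32 over the five fields named in the
    text (type of service, source/destination IPv4 address and port)."""
    key = struct.pack("!B4s4sHH",
                      tos,
                      socket.inet_aton(src_ip),
                      socket.inet_aton(dst_ip),
                      src_port, dst_port)
    return zlib.crc32(key)

fid = flow_id(0, "10.0.0.1", "10.0.0.2", 5000, 6000)
```

A 32-bit CRC keeps the ME-to-PE records compact, at the cost of possible collisions, which is why the five original fields are also stored at the PE.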
A full memory analysis is out of the scope of this paper; it can be summarized as a total
of 423 bytes per flow plus the control structures of the binary tree, which are fixed per
PE for the whole system.
Regarding the memory consumption per packet, the system only consumes 18 bytes.
Assuming the aforementioned 300000 packets per second, this results in around 6
Mbytes of memory per second. Since, for statistical purposes, a window of at most 8
seconds is kept, this means around 48 Mbytes per meter, which is perfectly bearable in
current systems.
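The back-of-the-envelope figures above can be reproduced directly from the stated per-packet cost:

```python
BYTES_PER_PACKET = 18          # per-packet state kept by the system
PACKETS_PER_SECOND = 300_000   # average rate cited for a 1 Gbps link
WINDOW_SECONDS = 8             # statistics window kept per meter

per_second = BYTES_PER_PACKET * PACKETS_PER_SECOND  # 5.4e6 B/s ("around 6 Mbytes")
per_window = per_second * WINDOW_SECONDS            # 43.2e6 B ("around 48 Mbytes")
print(per_second, per_window)
```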
Accuracy
Measurement precision is often limited by the operating system used for the
measurements: in highly demanding environments, where microseconds or even
nanoseconds matter, general-purpose operating systems are not suited for the task. In
monitoring environments such limits are important, but when the goal is to verify
whether the QoS is properly delivered, what matters is the user's perception, whose
resolution thresholds lie within milliseconds. Moreover, all current operating systems
provide the few-millisecond precision needed for proper QoS reporting. This whole
issue is directly related to timestamping precision.
Even with proper timestamping, it is mandatory in distributed testing environments,
especially when dealing with one-way delays, to have proper synchronization among the
different entities (the MEs in this case). Here the EuQoS testbed provides a good
framework, given its stable deployment and the proper time sources available in the
involved testbeds through GPS stations, which guarantee clock offsets lower than a few
microseconds.
Experimental validation
This section presents the actual tests performed to validate the NPAS deployment in the
EuQoS testbed. To this end, the tests are carried out between two partners.
As it is not the goal of this work to show the limits of software-based capture platforms,
the tests are not intensive in resource consumption.
The first site involved in the presented tests is located at UPC's (Universitat
Politècnica de Catalunya) premises and is connected to the Géant network by a 1 Gbps
link. The second site is a laboratory of WUT (Warsaw University of Technology),
which operates on a 100 Mbps access link to the Géant network.
Figure 6 shows the network topology and the locations of the tools used for the tests.
According to the NPAS architecture, the role of the ME is performed by the meter
module of OreNETa, while the PE's functions are performed by the analyzer module. In
the validation process, the role of trusted measurement system is fulfilled by DAG cards
coupled with GPS clocks. As OreNETa uses a passive measurement method, it is
necessary to introduce some foreground traffic that can be measured; we therefore use
(NetMeter) to emit CBR (Constant Bit Rate) traffic. NetMeter is not part of NPAS and
is used only for controlled traffic generation.
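The essence of such a CBR generator can be sketched as a paced UDP sender; this is not NetMeter itself, and the sequence-number framing is an assumption made for illustration:

```python
import socket
import time

def send_cbr(dst, rate_pps, payload_size, duration_s):
    """Emit a constant-bit-rate UDP stream: rate_pps packets per second
    of payload_size bytes each, for duration_s seconds. Each payload
    starts with a sequence number so a receiver can detect loss and
    reordering. Returns the number of packets actually sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = 1.0 / rate_pps
    start = time.monotonic()
    seq = 0
    while time.monotonic() - start < duration_s:
        payload = seq.to_bytes(4, "big").ljust(payload_size, b"\x00")
        sock.sendto(payload, dst)
        seq += 1
        # sleep until the next slot to keep the rate constant
        next_send = start + seq * interval
        time.sleep(max(0.0, next_send - time.monotonic()))
    sock.close()
    return seq
```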
As a method of validating the tools and methods used, we decided to compare a number
of metrics defined in (ITU-T Rec Y-1541), namely IPTD (mean IP Packet Transfer
Delay), IPDV (IP Packet Delay Variation) and IPLR (IP Packet Loss Ratio). Table 1
describes all the performed tests. The tests use different rates and packet sizes, all with
periodic traffic patterns, and each is repeated several times to achieve statistical
soundness. Every test lasts 10 minutes in order to avoid transient network states.
Test    Rate (pkt/s)   UDP size (bytes)   IP size (bytes)   Bandwidth
Test1   20             60                 88                14 Kbps
Test2   96             1420               1468              1.1 Mbps
Test3   897            160                208               1.4 Mbps
Table 1.- Set of performed tests
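The bandwidth column of Table 1 follows directly from rate × IP packet size, which can be checked quickly:

```python
# rate (pkt/s) and IP packet size (bytes), taken from Table 1
tests = {"Test1": (20, 88), "Test2": (96, 1468), "Test3": (897, 208)}

rates_bps = {name: rate * ip_size * 8 for name, (rate, ip_size) in tests.items()}
for name, bps in rates_bps.items():
    print(f"{name}: {bps / 1000:.1f} Kbps")
# Test1 ≈ 14.1 Kbps, Test2 ≈ 1127.4 Kbps (1.1 Mbps), Test3 ≈ 1492.6 Kbps (1.4 Mbps)
```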
IP Transfer Delay analysis
The results of the tests are summarized in Table 2, which shows the delays measured by
OreNETa and by DAG, together with the standard deviation of the traffic. The results
obtained for each test differ depending on the capture method, because the captures are
taken on different physical machines. Moreover, DAG timestamping is done directly in
the card, with the obvious gain in accuracy, while OreNETa timestamps the packets at
kernel level, which is subject to operating-system effects and yields less accurate
timestamps.
UPC - WUT    DAG                   OreNETa
             Avg. (ms)  St. Dev.   Avg. (ms)  St. Dev.
Test 1       35.21      ~0         35.01      0.39
Test 2       36.60      0.35       36.15      0.15
Test 3       35.33      0.24       35.11      0.23

WUT - UPC    DAG                   OreNETa
             Avg. (ms)  St. Dev.   Avg. (ms)  St. Dev.
Test 1       35.63      0.96       35.01      0.39
Test 2       36.9       ~0         37.43      0.67
Test 3       34.74      4.2        35.58      5.2
Table 2.- DAG and OreNETa one way delay results
The table shows that the results are comparable in all cases. To illustrate this more
clearly, Figure 7 plots the instantaneous one-way delay for both DAG and OreNETa:
the X axis holds the packet sequence number since the beginning of the test, and the Y
axis the delay in milliseconds. The figure shows the results of a particular test from the
UPC testbed towards WUT's, corresponding to the first run of Test 2 above. As can be
noted, the results are not identical. The reason is that the capture stations are physically
separate at both end-points and, moreover, DAG timestamps at hardware level while
OreNETa does so at kernel level.
Regardless of this difference, and to prove that the results really are comparable, the
computed correlation coefficient between the two traces is 0.9926, which is remarkably
high given the inherent limitations of general-purpose equipment.
Given that the goal of NPAS is to report the QoS levels of the communication, the
above differences in the results are not relevant.
Figure 7.- Test 2 One Way Delay between UPC and WUT
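The coefficient quoted above is a standard Pearson correlation between the two per-packet delay traces. A sketch with synthetic data (the sample values are invented, not the measured trace) shows why two traces offset by a near-constant timestamping bias still correlate almost perfectly:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length delay traces."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic example: the same delays shifted by a small constant offset
# plus a tiny drift (hardware vs. kernel timestamping).
dag = [36.60, 36.58, 36.61, 36.65, 36.59, 36.70]
oreneta = [d - 0.45 + 0.001 * i for i, d in enumerate(dag)]
print(round(pearson(dag, oreneta), 4))
```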
Packet losses analysis
The methodology used to assess the packet loss accuracy of our NPAS implementation
assumes that DAG always captures 100% of the traffic. This assumption allows us to
compare the application-level packet losses reported by our traffic generation tool on
one hand with OreNETa's results on the other. Table 3 summarizes this, showing the
worst-case packet loss ratio observed in the performed tests.
UPC - WUT    DAG         OreNETa
Test1        0           0
Test2        4.1·10^-4   4.1·10^-4
Test3        9.2·10^-6   3.2·10^-2

WUT - UPC    DAG         OreNETa
Test1        0           0
Test2        0           0
Test3        9.9·10^-4   9.9·10^-4
Table 3.- Worst case packet loss ratio
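The ratios in Table 3 are simply lost packets over sent packets per test run; as a quick sanity check (the 5-packet figure below is an illustrative assumption, not a measured value):

```python
def iplr(sent, received):
    """IP Packet Loss Ratio (ITU-T Y.1541): lost packets / sent packets."""
    return (sent - received) / sent

# Test 3 runs at 897 pkt/s for 10 minutes, i.e. 538200 packets per run;
# losing 5 of them gives ~9.3e-6, the order of the DAG worst case.
sent = 897 * 600
print(iplr(sent, sent - 5))
```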
Looking at the results, a first guess could be that at high packet rates the
general-purpose equipment does not behave correctly. Investigating further, however,
we found that the problem is that the current implementation does not properly handle
out-of-order packets. Packet reordering is quite common in the Géant network; it is
caused by load balancing and redundant links, as we showed in (Serral-Gracià et al.
2006). In a sample of Test 3 there are a total of 12417 out-of-order packets. The
WUT-to-UPC flow, on the other hand, does not show the packet loss issue: inspecting
the trace, we found no out-of-order packets in that direction.
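Out-of-order packets of the kind that confused the loss count can be detected from the per-packet sequence numbers; a minimal late-arrival counter (RFC 4737 defines stricter reordering metrics) looks like this:

```python
def count_reordered(seqs):
    """Count packets arriving with a sequence number lower than the
    highest one seen so far -- a simple late-arrival reordering metric."""
    highest = -1
    reordered = 0
    for s in seqs:
        if s < highest:
            reordered += 1
        else:
            highest = s
    return reordered

print(count_reordered([0, 1, 2, 4, 3, 5, 7, 6]))  # → 2
```

A loss estimator that treats every gap in the sequence as a loss will misreport exactly these late arrivals, which is consistent with the inflated OreNETa figure for Test 3.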
6. Conclusions & Future Work
The main goal of this paper is to present an inter-domain QoS parameter reporting
system that provides on-line reporting over heterogeneous networks. This infrastructure
has been deployed in the European-wide testbed used in the IST EuQoS project. The
testbed has also been used to validate the proposed solution against Endace's trusted
special-purpose equipment. The obtained results show the soundness of the proposal for
reporting QoS under controlled traffic loads.
Several issues remain open. For a broad deployment of the system, more tests are
needed to stress the whole system; following this series of tests, an analysis of the
performance degradation caused by such stress should be made. Related to that, the
number of MEs is an issue: modelling their proper amount and positioning in the
network using cost-effective functions could help reduce the overall deployment costs.
Another issue is the bandwidth used to carry the control traffic. This can be reduced
with compression algorithms or with traffic sampling techniques; the latter can also be
used to overcome the capturing limits of the standard equipment used by the solution. It
is worth noting, though, that distributed traffic sampling is not a straightforward task in
an inter-domain scenario, and hence more research on this topic is needed.
7. References
- P. Barlet-Ros, J. Solé-Pareta, J. Barrantes, E. Codina, J. Domingo-Pascual,
"SMARTxAC: A Passive Monitoring and Analysis System for High-Speed Networks",
Campus-Wide Information Systems, Volume 23, Issue 4, pp. 283-296, ISSN 1065-0741,
2006.
- J. M. Batalla, "Calculating end-to-end QoS parameters from QoS objectives in
Autonomous Systems", 12th Polish Teletraffic Symposium PSRT2005, June 2005.
- J. Brazio, P. Tran-Gia, N. Akar, A. Beben, W. Burakowski, M. Fiedler, E. Karasan,
M. Menth, P. Olivier, K. Tutschku, S. Wittevrongel (Eds.), "Analysis and Design of
Advanced Multiservice Networks Supporting Mobility, Multimedia and
Internetworking - COST Action 279 Final Report", Springer, ISBN 0-387-28172-X,
2006.
- J. Cleary, S. Donnelly, I. Graham, A. McGregor, M. Pearson, "Design principles for
accurate passive measurement", PAM 2000, Hamilton, New Zealand, April 2000.
- CoMo - Continuous Monitoring - Intel Research. http://como.intel-research.net
- Géant - Webpage: http://www.geant.net
- IST-EuQoS - Webpage: http://www.euqos.org
- ITU-T Rec. Y.1541, "Internet protocol data communication service - IP packet
transfer and availability performance parameters", January 2005.
- B. Huffaker, D. Plummer, D. Moore, K. Claffy, "Topology discovery by active
probing", Proceedings of the Symposium on Applications and the Internet, Nara,
Japan, January 2002.
- IST-INTERMON, Deliverable 15: "Final Architecture Specification", 2004.
- P. Matray, G. Simon, J. Stéger, I. Csabai, G. Vattay, "Large Scale Network
Tomography in Practice: Queueing Delay Distribution Inference in the ETOMIC
Testbed", IEEE Infocom 2006, Barcelona, Spain, April 2006.
- NetMeter - Webpage: http://www.ccaba.upc.edu/netmeter
- OreNETa - Webpage: http://www.ccaba.upc.edu/oreneta
- R. Serral-Gracià, L. Jakab, J. Domingo-Pascual, "Out of Order Packets Analysis on a
Real Network Environment", 2nd Conference on Next Generation Internet Design and
Engineering (EuroNGI), ISBN 0-7803-9455-0, 2006.
- T. Zseby, S. Zander, G. Carle, "Evaluation of Building Blocks for Passive One-Way
Delay Measurements", GMD FOKUS, 2001.