15
Ž . Computer Networks 31 1999 2293–2307 www.elsevier.comrlocatercomnet MEHARI: a system for analysing the use of the internet services Pedro J. Lizcano a, ) , Arturo Azcorra b , Josep Sole-Pareta c , Jordi Domingo-Pascual c , ´ Manuel Alvarez-Campana d a ( ) Comision Interministerial de Ciencia y Tecnologıa CICYT , Rosario Pino, 14-16, 28020 Madrid, Spain ´ ´ b ( ) ( ) UniÕersidad Carlos III de Madrid UC3M , Butarque 15, 28911 Leganes Madrid , Spain ´ c ( ) UniÕersitat Politecnica de Catalunya UPC , Jordi Girona 1-3, 08034 Barcelona, Spain ` d ( ) UniÕersidad Politecnica de Madrid UPM , ETSI Telecomunicacion, Ciudad UniÕersitaria, 28040 Madrid, Spain ´ ´ Abstract Ž . This paper describes the MEHARI system for monitoring and analysing the traffic on National R & D Networks NRN . The main design requirement of the MEHARI system was to cope with the main challenges currently faced by most of these networks: Service usage, charging schemes, network dimensioning and auditing procedures according to a given Acceptable Ž . Use Policy AUP . The resulting MEHARI system consists of a low cost hardware traffic capture platform, and several traffic analysis software modules running on top of this platform. These modules can report information on the usage of the Internet services, the main traffic origin and destinations, etc. The MEHARI system has been tested by running a field trial Ž . on the Spanish NRN RedIRIS . Some relevant data on the performance and efficiency of the system configuration used in the field trial is presented in this paper. q 1999 Elsevier Science B.V. All rights reserved. Keywords: Internet services monitoring; Internet traffic analysis; National R & D networks; Acceptable use policy auditing 1. Introduction The full development of the Information Society paradigm has led to the development of new IP infrastructures worldwide, driven by commercial in- terest and by different government initiatives be- cause of strategic reasons. Even the EU has played a leading roll in this matter with the TEN-34 and TEN-155 initiatives to foster the trans-national col- laboration among the different National NRN re- searches. But these deployments are faced with two major problems: Ø In most cases, NRNs are funded by public money, with budgets that do not follow the traffic increas- ) Ž . Corresponding author. E-mail: [email protected] ing rates that user demands. Therefore, a new funding model or an use-responsible approach is needed. Ø Market forces, in current full competitive scenar- ios, demand from these public funded networks a Ž . transparent and acceptable use policy AUP to assure no distortions in the emerging Internet- based services business. To cope with these open issues, auditing AUP tools are required for monitoring the actual use of the public funded networks. Furthermore, the possi- ble collaborations of European or NRNs networking initiatives with their leading counterparts in North Ž . America i.e., Internet 2 and Canarie depend on having these tools available to ensure the coherence of the Americans and Europeans AUPs concerning the traffic to be exchanged among them. 1389-1286r99r$ - see front matter q 1999 Elsevier Science B.V. All rights reserved. Ž . PII: S1389-1286 99 00106-1

MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

Ž .Computer Networks 31 1999 2293–2307www.elsevier.comrlocatercomnet

MEHARI: a system for analysing the use of the internet services

Pedro J. Lizcano a,), Arturo Azcorra b, Josep Sole-Pareta c, Jordi Domingo-Pascual c,´Manuel Alvarez-Campana d

a ( )Comision Interministerial de Ciencia y Tecnologıa CICYT , Rosario Pino, 14-16, 28020 Madrid, Spain´ ´b ( ) ( )UniÕersidad Carlos III de Madrid UC3M , Butarque 15, 28911 Leganes Madrid , Spain´c ( )UniÕersitat Politecnica de Catalunya UPC , Jordi Girona 1-3, 08034 Barcelona, Spain`

d ( )UniÕersidad Politecnica de Madrid UPM , ETSI Telecomunicacion, Ciudad UniÕersitaria, 28040 Madrid, Spain´ ´

Abstract

Ž .This paper describes the MEHARI system for monitoring and analysing the traffic on National R&D Networks NRN .The main design requirement of the MEHARI system was to cope with the main challenges currently faced by most of thesenetworks: Service usage, charging schemes, network dimensioning and auditing procedures according to a given Acceptable

Ž .Use Policy AUP . The resulting MEHARI system consists of a low cost hardware traffic capture platform, and severaltraffic analysis software modules running on top of this platform. These modules can report information on the usage of theInternet services, the main traffic origin and destinations, etc. The MEHARI system has been tested by running a field trial

Ž .on the Spanish NRN RedIRIS . Some relevant data on the performance and efficiency of the system configuration used inthe field trial is presented in this paper. q 1999 Elsevier Science B.V. All rights reserved.

Keywords: Internet services monitoring; Internet traffic analysis; National R&D networks; Acceptable use policy auditing

1. Introduction

The full development of the Information Societyparadigm has led to the development of new IPinfrastructures worldwide, driven by commercial in-terest and by different government initiatives be-cause of strategic reasons. Even the EU has played aleading roll in this matter with the TEN-34 andTEN-155 initiatives to foster the trans-national col-laboration among the different National NRN re-searches.

But these deployments are faced with two majorproblems:Ø In most cases, NRNs are funded by public money,

with budgets that do not follow the traffic increas-

) Ž .Corresponding author. E-mail: [email protected]

ing rates that user demands. Therefore, a newfunding model or an use-responsible approach isneeded.

Ø Market forces, in current full competitive scenar-ios, demand from these public funded networks a

Ž .transparent and acceptable use policy AUP toassure no distortions in the emerging Internet-based services business.To cope with these open issues, auditing AUP

tools are required for monitoring the actual use ofthe public funded networks. Furthermore, the possi-ble collaborations of European or NRNs networkinginitiatives with their leading counterparts in North

Ž .America i.e., Internet 2 and Canarie depend onhaving these tools available to ensure the coherenceof the Americans and Europeans AUPs concerningthe traffic to be exchanged among them.

1389-1286r99r$ - see front matter q 1999 Elsevier Science B.V. All rights reserved.Ž .PII: S1389-1286 99 00106-1

Page 2: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072294

The combination of all the above problems com-plicates the dimensioning, management, and opera-tion of Internet backbones. These tasks cannot besuccessfully performed without precise and up-to-date knowledge about what is going on in the net-work. There are two different dimensions to thisproblem: One is from the traffic perspective, and theother is from the purpose usage angle. The formermay need contents analysis in cases in which appli-cation identification by port is not possible, and itsresults are reasonably objective. The latter definitelyneeds contents analysis, and its results are based onheuristics and subjective techniques.

On the traffic analysis aspect, it is worth mention-ing some recent studies on the characterisation of

w xInternet traffic 1,2 . The traffic measurements ob-tained in these studies have revealed some interest-ing results about the Internet traffic patterns on

w xIPrATM backbones 3 . The main conclusion is,however, that Internet traffic is highly variable andunpredictable and the results of traffic analysis aretherefore only valid in the short-term.

On the purpose usage aspect, different heuristictechniques may be combined according to the natureof the traffic classification desired. As content analy-sis is costly in terms of CPU, a balance has to beestablished between confidence in the objectivity ofthe results and the processing power required. It isrecommended that heuristics be manually checkedfrom time to time in order to make the analysis moretight or relaxed, depending on the deviations be-tween real traffic and the diagnosis performed by thetool.

Data provided by MEHARI may serve differentpurposes, some of which are:Ø Network engineering to adjust capacity to traffic

matrix and hourly profiles.ŽØ Characterisation and analysis of user or user

.group traffic.Ø Identification of the information services, infor-

mation servers and client sites available on thenetwork.

Ø Classification of the network traffic by applica-tion and also by purpose usage.

Ø Validating the appropriate usage policy.Ø Billing and charging.Ø Detection and identification of security threats

w xand attacks 4 .

This paper describes the design of the MEHARIsystem, a high performance traffic processor foranalysing the headers and contents of Internet traffic.The system characteristics are modularity, scalabil-ity, high-performance, low-cost, flexibility andadaptability. The MEHARI hardware is based onlow-cost PC components, and the application pro-cessing may be performed on any UNIX machine.The MEHARI platform consists of PCs with STM-1cards for traffic capture and pre-processing. Thepre-processed traffic is fed from the platform to theMEHARI traffic analysis software modules. TheMEHARI capture and processing capabilities can beexpanded in modules by increasing the number ofhardware elements in the system configuration. TheMEHARI analysis functionality can be extended ormodified both by configuration and by adding onnew analysis software modules. Three examples ofthe type of information that can be obtained from thecurrent implemented traffic analysis modules are the

.following: 1 A heuristic classification of the trafficin academic, commercial, leisure and undetermined

.categories; 2 A classification of the traffic according.to its origin and destination; and 3 The verification

of port assignment of applications. The system in-cluding these three has been field tested and theresults of the trials are also described in the article,both in terms of functionality and performance.

The rest of the paper is structured as follows.Section 2 presents the functional architecture of theMEHARI system. Section 3 describes the applicationmodules currently supported by the system and thereasoning behind its development. Section 4 providessome sample results of the field trial of the MEHARIsystem in a real network scenario, the Spanish NRNŽ .RedIRIS . Finally, Section 5 summarises the mainconclusions of our work.

2. MEHARI functional architecture

Fig. 1 shows the functional architecture of theMEHARI system, which consists of the followingsubsystems:

( )Ø Traffic Capture Subsystem TCSS : This func-tional block is responsible for capturing the traffic

Page 3: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2295

Fig. 1. Functional architecture of the MEHARI system.

samples, which are subsequently analysed by otherblocks of the MEHARI system. The TCSS can beconfigured to capture ATM cells either in a givenlist of VPIrVCI pairs or in promiscuous modeŽ .all the VPIrVCI pairs . Cells are reassembledinto AAL5 frames, which are periodically dumpedto disk for further processing. Depending on thetype of analysis to be performed, the user canselect either dumping the whole AAL5 frame or

Ž .just part of it e.g., the first 48 bytes .It should be noted that, although the MEHARIsystem was initially conceived for monitoringIPrATM traffic, the TCSS does not make anyassumption about the content of the AAL5 frame.This way, the system could be used in the futureto analyse other protocols encapsulated overATM.The TCSS has been designed so that it can bephysically implemented on one or several capturemachines, thus allowing the capture ratio to beincreased as required. The capture platforms arestandard PCs running Free BSD. The captureitself is performed by two ATM Fore network

Žinterface cards one for each transmission direc-. Ž w x.tion using a special firmware OC3MON 3 .

Because of the use of standard hardware compo-

nents, the total cost of the system is quite lowŽ .approximately 6000 Euros per capture platform .

( )Ø Traffic Analysis Subsystem TASS : This func-tional block is responsible for analysing the traf-fic samples generated by the TCSS subsystem.The TASS contains the Pre-processing ModuleŽ . Ž .PPM and the Application Modules APMs .Because of the high traffic volume that can betransported on the ATM links that are monitored,the TASS must discard the traffic samples assoon as possible. This is precisely the reason forintroducing the PPM, which pre-processes thetraffic samples on the fly so that capture files aredeleted once the parameters of interest have beenextracted.Note that the rate at which the traffic samples arepre-processed is crucial for the overall perfor-mance of the system. In the MEHARI project we

Žconcentrated on analysing IP flows see SubSec-.tion 3.1 , which led us to develop a somewhat

CPU-intensive pre-processing of the traffic sam-ples. Other applications of the MEHARI systemwill probably require less processing capacity. Inany case, the TASS subsystem was designed sothat it can be physically implemented on one orseveral machines. This approach allows the pro-

Page 4: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072296

cessing capacity to be increased as required byadding more PCs or Workstations. In any case,note the existence of a self-regulation mechanism,which adapts the capture ratio to the TASS pro-cessing capacity.The traffic analysis is actually performed by theAPMs. Different types of APMs, correspondingto different types of analysis, can be present inthe TASS. Section 3 describes some of the APMsdeveloped for the MEHARI system. It should bepointed out that the MEHARI system has beendesigned so that new APMs can easily be added.Besides, there is the possibility of using severalPCs to perform a distributed analysis, so that theprocessing capacity can be matched to the re-quirements of the different APMs considered.

3. Application modules

This section describes the Application ModulesŽ .APMs specifically developed for the MEHARI pro-ject. As mentioned in Section 2, these APMs areonly an example of the different types of analysisthat can be carried out by the MEHARI system. Ourmain purpose in describing the following APMs is to

show the capabilities of the system and to provide asample of the wide range of possibilities for which itcould be considered.

3.1. Traffic classification module

The MEHARI project was conceived as the con-tinuation of the traffic measurements and networkusage characterisation studies carried out on the

Ž .Spanish NRN RedIRIS in the CASTBA project.The results of CASTBA revealed a number of diffi-culties in characterising Internet traffic by usingconventional traffic analysers. We found out that asimple analysis based on protocol and TCPrUDP

w xport numbers did not provide sufficient accuracy 5 .The reason is the existence of applications makinguse of unregistered ports and registered ports usedfor purposes other than those asigned. This resulted

Ž .in a large percentage of traffic 10–20% whosenature could not be determined or, what it is worse,an undetermined amount of traffic classified underthe wrong categories. One of the main motivations ofthe MEHARI project was precisely to develop anInternet traffic monitoring and analysis tool to solvethese uncertainties. What started as a modest attemptto overcome the limitations of TCPrUDP port traffic

Ž .Fig. 2. Traffic classification module TCM .

Page 5: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2297

Ž .Fig. 3. Traffic originrdestination analysis module TODM .

analysis turned out to form the Traffic ClassificationModule, which has been proved to be highly effec-tive for characterising Internet traffic and services.

Fig. 2 shows the structure of the MEHARI TrafficŽ .Classification Module TCM . This APM analyses

the data coming from the PPM, which is based on

the concept of IP flows and pattern recognition. ThePPM performs the statistical aggregation of all thepackets belonging to a same IP flow during a config-urable period of time. An IP flow is defined as theaggregation of packets sharing the quadruple IPsource address, TCPrUDP source port, IP destina-

Ž .Fig. 4. Internet header analysis module IHM .

Page 6: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072298

Fig. 5. Trial network configuration.

tion address, and TCPrUDP port. Furthermore, thePPM explores the contents of the packets trying todetermine the presence of specific patterns, whichcan be programmed by the user. As a result, thePPM periodically generates a file containing a sum-mary of the IP flows detected and the number oftimes that a given pattern has been detected in it.

The IP flows are further analysed by the TCM,which performs the correlation of the data corre-sponding to incoming and outgoing traffic. The re-

Žsult is a collection of bi-directional IP flows bi-.flows with a weighted list of symptoms for each

direction. The next step consists in applying a set ofheuristics, which can easily be configured by the

Ž .Fig. 6. Traffic classification by usage per RedIRIS regional nodes .

Page 7: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2299

Ž .Fig. 7. Traffic classification by usage global distribution .

user. The application of the heuristics causes a bi-flow to be classified under a particular traffic cate-gory, which has been previously defined by the user.For example, in the field trial described in Section 4of this paper, we considered four traffic categories,namely academic, commercial, leisure and undeter-mined. Finally, the TCM generates a summary filereporting the traffic volume under each traffic cate-gory considered. Section 4 shows an example of the

traffic classification summary reports that can beobtained with the TCM.

3.2. Traffic originrdestination analysis module

The Traffic OriginrDestination Analysis ModuleŽ .TODM performs a traffic classification based on

Ž .the originrdestination Autonomous System AS ofthe bi-directional IP flows identified by the TCM

Ž .module see Section 3.1 . The functional diagram ofthe TODM module is represented in Fig. 3.

To determine the AS to which an IP sourcerde-stination address belongs, the TODM makes use ofthe information stored in the Internet Routing Regis-

Ž .ter IIR databases. This information is periodicallydownloaded by the IIR Processor Block, which gen-erates a local AS database that is used by the ASIdentification Block. The AS database includes thename of the organisation responsible for the differentASs as well as the identification of the subnetworksbelonging to each AS.

The AS Identification Block analyses the IP ad-dresses by looking them up in the local AS database.As a result, it periodically generates a statistics

Ž .Fig. 8. Main origins of the RedIRIS traffic per RedIRIS regional nodes .

Page 8: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072300

ŽFig. 9. Main origins of the RedIRIS input traffic for the whole.RedIRIS network .

report file with the traffic volume exchanged be-tween each subnetwork belonging to the network

Ž .under study e.g., RedIRIS and the different ASs

registered in the IIR databases. The TODM applica-tion module can be used, for example, to determinethe fraction of traffic exchanged with a specificacademic or commercial network, or to obtain a listof the most visited ASs. Other possible applicationsof the TODM module could be the following:Ø Assessment of the external links usage: The sys-

tem provides a set of reports showing the use ofresources per sub-network of the monitored aca-demic network. This information is essential forfuture network re-design, billing schemes, etc.

Ø Billing and charging for the use of network re-sources: The distribution of traffic according tosub-network or AS allows supporting billing andcharging models based on destination where tar-iffs rely on source and destination IP address ormask, autonomous system number or chainŽ .route .

3.3. Internet headers analysis module

Ž .The Internet Headers Analysis Module IHMprovides a traffic classification based on the analysisof the three basic headers of the Internet protocolarchitecture: the IP packet header, the TCPrUDPsegment headers, and the service PDU header. Thestructure of the IHM module is shown in Fig. 4.

Ž .Fig. 10. Percentage of academic traffic found in the USA to RedIRIS link according to the IRR description .

Page 9: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2301

The main objective of the IHM is to differentiateŽas many traffic flows packet sequences with the

.same source and destination as possible, and then tocheck the coherence between the TCPrUDP portand the application within each flow. It differentiatesbetween the traffic that performs according to the

Žstandards the TCP or UDP port corresponds to.standard assignment and the traffic that does not. It

Žprovides statistics on the most visited local inside. Žthe academic network and remote outside the aca-

.demic network servers.The classification is a result of the verification of

the services contained in the routed traffic. Thisverification is carried out by checking the coherencebetween the TCPrUDP ports and the service PDUheader. In order to achieve this verification, weproduced a dictionary of the main well-known ser-vices containing a set of patterns extracted from eachservice specification. The verification is server-ori-ented, which provides better performance and lessconsumption of computing resources. Some possibleapplications of the IHM module could be billing or

auditing network usage, verification of the servicesrouted by the network, recording unknownrunusualtraffic, etc.

Some possible applications of the IHM modulecould be the following:Ø Verification ratio: Traffic distribution classified

Ž . Žas: verified checked ok , pending patterns not. Žfound in the dictionary , unknown ports not in-. Žcluded in the dictionary and rejected transport

.protocol different from TCPrUDP .Ø Identification of the main local and remote

serÕers: The verified traffic is classified accord-ing to the local or remote server addresses. Thisclassification allows the main remote and localservers to be highlighted, either according to theservice they provide or not.

Ø Classification of unknown traffic: Some of thefollowing heuristics can be useful for classifyingthe main servers and services that remain un-known for the network managers: Reports onremote and local addresses sorted by bytes served.Report of possible servers detected sorted by

Ž .Fig. 11. The 25 most visited commercial sites in Catalonia from input traffic captures .

Page 10: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072302

number of possible clients connected to. All thesereports highlight addresses, volumes, transportprotocols and application ports that enable systemauditors to inspect traffic and guess applications.

4. MEHARI system tests

The MEHARI system has been successfully testedon a real network scenario, the IPrATM backbone

Ž .of the Spanish NRN RedIRIS . Two MEHARI sys-tem units were deployed at two different monitoringpoints of the RedIRIS backbone. One of the MEHARIunits was installed in the RedIRIS central node inMadrid, according to the configuration shown in Fig.5. The other MEHARI unit was placed on Catalonia’sRedIRIS access network, with a network scenariovery similar to the one used in the central node.

It must be stressed that the results presentedhereinafter aims at providing some examples of theMEHARI reporting capabilities. The figures includedin these reports do not necessarily correspond to theactual value of the involved traffic parameters in theSpanish NRN. However, these values may corre-spond to a monitoring process in a given and limitedtime period, and therefore are not fully representa-tive.

Fig. 6 shows some examples of the results ob-Ž .tained by the Traffic Classification Module TCM at

the central node site in Madrid. For each of the 17regional links the histogram represents the percent-age of input traffic corresponding to the four cate-gories of traffic considered in the MEHARI projectŽ .academic, leisure, commercial and undetermined .The results correspond to the average obtained dur-ing a capture period of approximately four and a half

Ž .Fig. 12. The 25 most visited TEN-155 ASs in Catalonia from input traffic captures .

Page 11: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2303

Fig. 13. Traffic verification results.

Žmonths from 15 September 1998 to 2 February.1999 .

Fig. 7 shows the overall input traffic distributionfor the 17 regional nodes of RedIRIS as a whole forthe same period of capture than in Fig. 6.

For an example of the results provided by theTraffic OriginrDestination Analysis ModuleŽ .TODM , see Figs. 8–12. Fig. 8 shows the maintraffic origins of the RedIRIS traffic obtained from

an input traffic capture made during a period ofŽapproximately four and a half months from 15

.September 1998 to 2 February 1999 . The graphicsshow the percentage of input traffic to each of the 17RedIRIS regional nodes and the following origins:other RedIRIS sites, the Spanish commercial Inter-

Ž .net, the European research networks TEN-34r155 ,Žand the rest of Internet through the RedIRIS to USA

.link .Note then, in both cases Figs. 6 and 8, the fact

that the distribution is given in percentage of cap-tured traffic hides the differences in the amount oftraffic handled by the different regional nodes ofRedIRIS.

Fig. 9 shows the overall input traffic distributionfor the 17 RedIRIS regional links for the samecapture period as in Fig. 8.

Fig. 10 shows an attempt to measure the percent-age of academic traffic in the link between RedIRIS

Fig. 14. The 25 most visited servers from Catalonia.

Page 12: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072304

and USA. Note that Fig. 9 is based on the pes-simistic approach, and only the traffic from USAŽ .input traffic to RedIRIS originating at an institutiondeclared as being an academic or research centre inthe Internet Routing Register is considered academic.The rest is considered commodity traffic. This result

Žcorresponds to one month’s capture between.September and October 1998 .

As regards traffic classification by commercialdestinations, the TODM module provides separatestatistics for each of the Spanish NRN regionalnodes. Fig. 11 shows the main commercial origins ofthe input traffic to the Catalan node. The captureperiod in this case goes from January to February1999. In this figure only the 25 most visited sites are

Ždepicted: the rest 958 sites for the input traffic.graphic and 990 for the output traffic graphic are

compiled in the last column. Note that the 25 mostvisited sites collect around 60% of the capturedtraffic.

Another interesting result is the traffic classifica-tion by the most visited ASs of the European aca-

Ž .demic network TEN-155 . As for the main commer-cial origins, the TODM module provides separatestatistics for each of the Spanish NRN regionalnodes. Again, as an example, Fig. 12 includes thestatistics on the main visited ASs of the Catalan node

Žfor the same capture period as above January and.February 1999 .

Figs. 13–16 show some examples of the resultsŽ .provided by the Internet Header Module IHM . Fig.

13 shows the percentage of verified traffic, pendingŽ . Žtraffic no pattern found , unknown traffic ports not

. Žregistered and rejected traffic other protocols than.TCPrUDP , for the traffic captured in the Catalan

sub-network in January and February 1999.Figs. 14 and 15 present one application of the

tool: finding the most relevant servers. Host andapplication are determined for the 25 most visitedremote and local servers respectively from the traffic

Fig. 15. The 25 most visited servers of Catalonia.

Page 13: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2305

Fig. 16. Statistics of the unknown traffic.

captured during January and February 1999. In thesegraphics the input and output traffic are computedtogether. Note that remote servers collected 74% of

Ž .the total amount of captured traffic see Fig. 14 .Local servers collected only 26%, the remainder of

Ž .the captured traffic see Fig. 15 . Also note that inboth remote and local server graphics the last col-umn represents the traffic collected by the rest of theservers, other than the 25 most visited. The numberof these servers is 865 811 and 153 900 respectively.

Fig. 16 shows an example of the possible reportson the unknown traffic that can be provided by theIHM. It is an attempt to find possible servers by the

Žnumber of their possible clients number of sources.sending traffic to that destination . The graphics of

the Fig. 16 corresponds to a traffic capture of asingle day.

5. Conclusions

The flexibility of the IP protocol combined withbroadband networks creates different difficulties forthe dimensioning, management, and operation of thistype of infrastructure. The permanent availability ofdetailed and updated information about network traf-fic is critical for network tuning, accounting, en-

forcement of acceptable use policy and security pur-poses.

The MEHARI platform allows data capture rateson STM-1 lines with pricerperformance ratios tentimes better than commercial protocol analysers, alsohaving a much higher range on the number of VCIssnooped. As capture is at the AAL5 level, it isindependent of the higher level protocol, and may becustomised to capture only part of the packets, orcertain packet types. Furthermore, its modular designallows the traffic capture probes to be distributedthroughout the network, consolidating pre-processedflow information to a central site.

MEHARI traffic processing modules currentlyprovide information about conventional traffic statis-tics, the autonomous systems’ traffic matrix-by-usergroup, validation of the port-to-application assign-ment and heuristic detection of unacceptable trafficflows. They can be customised andror extended tosuit the changing traffic information requirements ofnetwork administrators.

The system field trial has been running 24 hours aday, 30 days a month for several months. The fieldtrial results have been cross-checked to verify thecorrectness of objective measurements and the statis-tical validity of heuristic results on traffic types.These results give great confidence in the resilienceand accuracy of the MEHARI system.

Page 14: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–23072306

Current work is expected to improve the capabili-ties of the capture and pre-processing platform, inorder to obtain better pricerperformance captureratios. More traffic processing modules are also be-ing developed, mainly targeted at security and charg-ing applications.

Two practical applications of the MEHARI sys-tem are monitoring the traffic to support billing andcharging mechanisms for Academic and CorporateNetworks and auditing the National R&D Networks

Ž .AUP Acceptable Use Policy

Acknowledgements

The MEHARI project was funded by the CICYTŽ .Comision Interministerial de Ciencia y Tecnologıa´ ´under contracts TEL97-1897-E and TEL97-TEL97-1893-E. The authors thank all those participating inthe MEHARI project along with the authors for theircontribution to this work. These contributors are:Xavier Martınez, Carles Veciana, Albert Renom and´Sergi Sales of UPC, David Larrabeiti of UC3M andJulio Berrocal and Ana B. Garcıa of UPM. The´authors also thank the staff at RedIRIS, Telefonica´

4 ŽIqD and C Centre de Computacio i Comunica-´.cions de Catalunya , who have been following the

work done in this project, for their valuable com-ments and suggestions. Finally, many thanks to theOC3-MON development team for all their help inadapting the modified FORE firmware to theMEHARI capture modules.

References

w x1 K. Thompson et al., Wide-area Internet traffic patterns andŽ .characteristics, IEEE Network NovemberrDecember 1997 .

w x2 M. Alvarez-Campana et al., CASTBA: medidas de trafico´sobre la red acedemica espanola de banda ancha, TELECOM´ ˜IqD, Madrid, October 1998.

w x3 J. Aspirdof et al., OC3MON: flexible, affordable, high-perfor-mance statistics collection, INET’97, Malasya, June 1997.

w x4 D. Newman et al., Intrusion detection systems, suspiciousŽ .finds, Data Communications August 1998 .

w x5 M. Alvarez-Campana, A. Azcorra, J. Berrocal, D. Larrabeiti,J.I. Moreno, J.R. Perez, CASTBA: Internet traffic measure-´

ments over the Spanish ATM network, HP-OVUA Workshop,Rennes, France, April 1998.

Ž .Pedro J. Lizcano [email protected] a Telecommunication EngineeringMaster degree from the Universitat Po-

Žlitecnica de Catalunya UPC, Barcelona,`.in 1983 . From 1983 to 1988 he was

with the R&D Department at Telefonica´de Espana, involved in advanced com-˜munications systems development suchas a multi-access radio system for rural

Žapplications MARD fully industrialised.by Telettra and deployed worldwide ,

and a second generation X.25 switchingŽpacket system TESYS-B, a major joint development between

Telefonica industrial group companies and Fujitsu, deployed na-´.tionwide, in which he led the hardware platform development . In

1988 he joined Telefonica R&D as the Packet Switching Division´manager. From 1991 to 1993 he worked with AT&T Bell Labora-

Ž .tories in Denver CO, USA as Resident Visitor in a forward-look-ing work project dealing with SONETrSDH high speed broad-band interfaces. In 1993, back at Telefonica R&D, he led the´ATM Systems Division, fully involved in the development of a

Ž .B-ISDN platform RECIBA for benchmarking the emerging mul-timedia services. He also worked on a number of EURESCOM

Ž .projects concerning network synchronisation and in RACE IIŽand ACTS EU-programme projects dealing with advanced net-

.working and services . Since 1994 he has been working asexternal expert and adviser with the EU Commission in pro-gramme definition, proposal selection and technical audit pro-cesses. From 1996 to now he has held the positions of head of theareas dealing with International Development, Basic Technologiesand Product Engineering. He is currently leading the Technologi-cal Innovation Programme area. Since 1995 he has been a technol-ogy adviser at the Interministerial Commission of Science andTechnology, leading a National R&D Programme on TelematicServices and Applications. He is the national representative in the

ŽEuropean Networking Policy Group ENPG, currently as Vice.President . He is also member of the Board of Trustees of the

Ž . ŽInternational Computer Science Institute ICSI in Berkeley CA,.USA . He has been a member of the IEEE for over 20 years and a

member of the Planetary Society.

Ž .Arturo Azcorra [email protected] awarded an M.Sc. degree inTelecommunications from the TechnicalUniversity of Madrid in 1986 and aPh.D. from the same university in 1989.In 1990 he became an associate profes-sor at the Department of Telematics En-gineering, Technical University ofMadrid, Spain. Since 1998 he has beenlecturing at Universidad Carlos III deMadrid, Spain. Current research projectsinclude broadband networks, multicast

teleservices, intelligent agents, and IPrATM convergence.

Page 15: MEHARI: a system for analysing the use of the internet servicesmac/publications/1999/cn99-mehari.pdf · 2001-03-08 · Computer Networks 31 1999 2293–2307 . MEHARI: a system for

( )P.J. Lizcano et al.rComputer Networks 31 1999 2293–2307 2307

Ž .Josep Sole-Pareta [email protected]´was awarded his Master’s degree inTelecommunication Engineering in1984, and his Ph.D. in Computer Sci-ence in 1991, both from the Universitat

Ž .Politecnica de Catalunya UPC . In 1984`he joined the Computer ArchitectureDepartment of UPC. Since 1992 he hasbeen an Associate Professor with thisdepartment. He spent the summers of1993 and 1994 at the Georgia Instituteof Technology. He has participated in

the Spanish R&D Programme for the development of the Broad-Ž .band Communications in Spain PLANBA . He is a member of

the Advanced Broadband Communications laboratory of UPCŽ .http:rrwww.ac.upc.esrCCABA . His current research interestsare in Broadband Internet, ATM Networks and Optical PacketNetworks, with emphasis on traffic engineering, traffic characteri-sation, traffic management and QoS provisioning. He is a member

Ž .of the IEEE and the ACM Sigcomm .

ŽJordi Domingo-Pascual [email protected] is an Associate Professor of

Computer Science and Communicationsat the Universitat Politecnica de`

Ž .Catalunya UPC in Barcelona. There,he received the engineering degree in

Ž .Telecommunication 1982 and thePh.D. Degree in Computer ScienceŽ .1987 . In 1983 he joined the ComputerArchitecture Department. Since 1994 heis co-founder and researcher at the Ad-vanced Broadband CommunicationsŽ .Center of the University CABA which participates in the Span-

ish National Host and in the PLANBA demonstrator. He wasvisiting reseacher at the International Computer Science Institute

Ž .in Berkeley CA for six months. His research topics are Broad-band Communications and Applications. Since 1988 he has partic-

Ž .ipated in RACE projects Technology for ATD and EXPLOIT , inŽseveral Spanish Broadband projects PLANBA: AFTER, TR1 and

. Ž .IRMEM , ACTS projects INFOWIN, MICC, IMMP , and span-Ž .ish research projects CASTBA, MEHARI, SABA . A more

detailed information may be found in http:rrwww.ccaba.upc.es.

ŽManuel Alvarez-Campana [email protected] received his M.S. and Ph.D.

degrees in Telecommunication Engi-neering from the Universidad Politecnica´de Madrid. In 1989 he joined the Depar-tamento de Ingenierıa Telematica at the´ ´Universidad Politecnica de Madrid´Ž .DIT-UPM , where he is an associateprofessor since 1996. He has partici-pated in several projects within theframework of European and Spanish R&D programs. His current research in-

terests include design and analysis of broadband networks, inte-gration of services, traffic characterisation, and network perfor-mance evaluation.