
Internet Traffic Classification Using Multifractal Analysis Approach


Yulios Zavala Huaman, Jeferson Wilian de Godoy Stênico, Lee Luan Ling

International Journal of Advanced Computer Science, Vol. 3, No. 8, Pp. 388-394, Aug. 2013.


Manuscript Received: 15 Apr. 2011; Revised: 5 May 2011; Accepted: 21 Jun. 2011; Published: 15 Jul. 2011

Keywords: Network Traffic, Traffic Classification, Multifractal Analysis, Multiplicative Cascade

Abstract

In this work, we present a traffic classifier based on the theory of multifractal network traffic. Specifically, we use multiplicative binomial cascades to obtain a feature vector for the classification scheme. This vector is formed by the variances of the multipliers in the multiplicative cascade view of the traffic. We evaluate the proposed technique with a popular ML software package, and the results show its viability, with traffic classification rates above 90%.

1. Introduction

Global data traffic on the Internet has grown rapidly and is expected to quadruple over the next few years, as Cisco predicts [1], driven mainly by the greater number of devices (tablets and smartphones), the growth in users, higher broadband speeds and the increase of video on the network. In this scenario, the correct classification of traffic types plays an important role.

Network management tasks such as workload characterization, capacity planning, route provisioning, traffic control and policing depend on the identification and classification of network traffic [2]. Network operators need to know what is flowing through their networks in real time so they can react quickly to avoid problems and achieve their business goals. Thus, if an operator wants to block incoming traffic of a given protocol on its network, or if an ISP (Internet Service Provider) wants to handle different types of connections with different priorities, e.g. limiting the delay of real-time data, identifying the protocol in use is key [3].

Precise classification of network traffic is therefore essential for various network-related activities, from security monitoring to accounting, and from Quality of Service to providing operators with useful forecasts for long-term provisioning [4].

Increasingly, new applications are being deployed on the Internet, e.g. P2P, VoIP and video, that become popular quickly and use ports unpredictably. With this evolution of the traffic, traditional classification techniques, such as those based on well-known port numbers or on analysis of the packet payload [5], are not effective for all types of network traffic, or cannot be deployed because of concerns about data security or privacy.

Yulios Zavala, Jeferson Wilian de Godoy Stênico and Lee Luan Ling are with the School of Electrical and Computer Engineering, State University of Campinas (Unicamp), PO Box 6101-13.083-970, Campinas, SP, Brazil, e-mail: {yulios,jeferson,lee}@decom.fee.unicamp.br.

Because of this importance, different techniques have been studied and used to classify network traffic. In this paper we propose a new multifractal technique, based on multiplicative cascades, to develop a reliable Internet traffic classifier. After extracting characteristics from a group of traffic records using multifractal theory, we use them as inputs to a machine learning algorithm to determine the performance and feasibility of multifractal analysis for classification.

The paper is organized as follows. Section 2 reviews related work in this field. Section 3 gives an overview of the basic concepts of multifractals and multiplicative cascades. Section 4 presents the experiments and results of the technique. Finally, Section 5 concludes the paper.

2. Related Work

A considerable number of works study techniques for classifying network and Internet traffic. This section gives an overview of the techniques and systems related to our work.

The classical technique uses the well-known port numbers defined by IANA (Internet Assigned Numbers Authority) to identify Internet traffic (e.g. Domain Name Service applications commonly use port 53). This technique is now ineffective because many applications use dynamic port numbers, or disguise their traffic so that it is recognized as that of a known application (e.g. the current generation of P2P applications). The work of Karagiannis et al. shows the problems of the classical techniques [6].

Another technique is the analysis of packet payloads, as in [7]. It searches payloads for features that differentiate one application from another, but this can be difficult or even impossible when the applications analyzed use proprietary protocols or encrypted traffic.

The need to work with traffic patterns, large sets of multi-dimensional data and various types of traffic attributes is the reason for the introduction of ML (Machine Learning) techniques in this field. Nguyen et al. survey and compare the literature on ML-based traffic classification in [8]. Moore et al. [4] proposed 249 flow discriminators and used machine learning to select the best ones for classifying new flows [9]. Similar strategies were applied in [10-14] to determine the class or protocol type of the traffic analyzed. Although extensive work exists in the field of traffic classification, some important issues remain unresolved, and consequently the majority of ML-based techniques are not used by network operators [15].

3. Methodology

Traffic on communication networks is analyzed using probabilistic processes that represent the load users impose on network resources. The variables considered include packet inter-arrival time, time between connections, connection length, packet length and time between sessions. Early research assumed that inter-arrival times were independent of each other and of the amount of demand. It later became necessary to include the effect of correlation between these variables, which led to Poisson-based traffic models in which the correlation decays exponentially over time.

Theoretical concepts important to the network analyst appeared in 1941 with Kolmogorov, who introduced the concept of self-similarity to describe scaling processes whose statistical properties do not change with scale [16], and in 1977, when Mandelbrot proposed the term fractal to describe irregular objects [17]. Building on these concepts, in 1993 Leland et al. [18], using Ethernet traffic collected in the network of the Bellcore Morristown Research and Engineering Center, demonstrated that traffic traces of modern high-speed data networks exhibit fractal properties such as self-similarity and long-range dependence (LRD). These properties, especially long-range dependence, were found to have a strong influence on network performance [19], and they are not adequately modeled by Poisson processes or, more generally, by Markov models.

In contrast to self-similar or monofractal behavior, some recent studies suggest that measured TCP/IP and WAN ATM traffic flows exhibit a more complex scaling behavior, which is consistent with multifractals [20, 21]. Multifractal traffic modeling is more general than monofractal modeling (e.g. self-similar and long-range dependent models) and provides a more accurate and detailed description of network traffic series at different time scales [22].

Many different multifractal traffic models have been proposed. The most widely studied ones include MWM (Multifractal Wavelet Model) [23], AWMM (Adaptive Wavelet Based Multifractal Model) [24], MMNB (Multifractal Model based on Newton Binomial) [25], multi-scaling models with Lognormal [26] and Pareto [27] distributed traffic loads, and VVGM (Variable Variance Gaussian Multiplier) [28].

This section explains the procedure for constructing a conservative multiplicative cascade and presents the construction of the inverse cascade as a method to verify that a given set of data is consistent with a conservative cascade construction.

A. Multifractal

The concept of a multifractal process was introduced by Mandelbrot in the context of turbulence [16]. Multifractal theory has since found applications in several areas that need to describe non-linear phenomena with a multiplicative structure, such as stock prices [29], geophysical phenomena [30], the evolution of DNA [15], traffic modeling [23-25], [28], and others. Network traffic being multifractal means that it has a strong dependence in its inherent structure, with bursts occurring at various scales. These characteristics make network performance worse than that estimated using Gaussian and short-range dependent models [31].

The simplest multifractal is typically constructed by an iterative procedure called a multiplicative cascade [17]. In this study we use the concept of multiplicative cascades to build our Internet traffic classifier.

B. Multiplicative Cascades

Definition 1: A multiplicative cascade is an iterative process that fragments a given set into smaller pieces according to some geometric rule and, at the same time, divides the total mass of the given set among those pieces according to another rule.

The binomial cascade, in which each set is divided into two parts at every stage, is the simplest way to obtain a multifractal process. On the closed interval [0, 1], a multiplicative cascade is generated as follows.

Let m0 = r and m1 = 1 − r be the two multipliers for cascade generation, with r possibly random. At stage N = 0 of the cascade iteration, we have the unit measure, denoted by μ0, uniformly distributed on the interval [0, 1]. At stage N = 1 the initial measure is divided into two parts, with mass m0 on the subinterval [0, 1/2] and mass m1 on [1/2, 1]. At stage N = 2, the interval [0, 1/2] is again divided into two subintervals, [0, 1/4] and [1/4, 1/2], and the procedure is repeated for the interval [1/2, 1], giving the following measures [17]:

$$\mu_2[0, 1/4] = m_0 m_0, \quad \mu_2[1/4, 1/2] = m_0 m_1, \quad \mu_2[1/2, 3/4] = m_1 m_0, \quad \mu_2[3/4, 1] = m_1 m_1 \qquad \text{(Equ. 1)}$$
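As a quick check on Equ. 1, and assuming a fixed (non-random) r for simplicity, the four stage-2 masses add up to the unit mass of stage 0 because m0 + m1 = 1:

```latex
\begin{aligned}
\mu_2[0,1] &= m_0 m_0 + m_0 m_1 + m_1 m_0 + m_1 m_1 \\
           &= (m_0 + m_1)^2 = (r + 1 - r)^2 = 1 .
\end{aligned}
```

For instance, with the illustrative value r = 0.3 the four masses are 0.09, 0.21, 0.21 and 0.49, which sum to 1.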

This process is iterated for k levels, and at each stage the total measure is preserved. At the kth stage of the cascade, the mass is fragmented over the dyadic subintervals of the form $[t, t + 2^{-k}]$, each with its corresponding measure μ. Let φ0 and φ1 denote the relative frequencies of 0's and 1's, respectively, in the cascade development leading to the interval, i.e. among the k choices of left (0) or right (1) half. The measure μ on the dyadic interval $[t, t + 2^{-k}]$ is given by:

$$\mu[t, t + 2^{-k}] = \mu[\Delta_k] = m_0^{k\varphi_0}\, m_1^{k\varphi_1} \qquad \text{(Equ. 2)}$$

Figure 1 illustrates the formation of this cascade for two stages.
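The construction described above can be sketched in a few lines of Python. This is a minimal illustration assuming a fixed multiplier r at every split (the text also allows random r); the function name `binomial_cascade` is hypothetical and not part of the original work.

```python
def binomial_cascade(r, stages):
    """Return the masses of the 2**stages dyadic subintervals of [0, 1].

    Assumes a deterministic multiplier: m0 = r goes to the left half and
    m1 = 1 - r to the right half at every split.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    masses = [1.0]  # stage 0: unit mass on [0, 1]
    for _ in range(stages):
        next_masses = []
        for mass in masses:
            next_masses.append(mass * r)          # left subinterval
            next_masses.append(mass * (1.0 - r))  # right subinterval
        masses = next_masses
    return masses


# Stage 2 reproduces the ordering m0*m0, m0*m1, m1*m0, m1*m1.
stage2 = binomial_cascade(0.3, 2)
print(stage2)

# The total mass stays (numerically) equal to 1 at any stage.
print(sum(binomial_cascade(0.3, 8)))
```

Each entry is a product of k multipliers, so an interval reached through n0 left choices and n1 right choices carries mass m0**n0 * m1**n1, which is the form of Equ. 2 with n0 = kφ0 and n1 = kφ1.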


Fig.1. Binomial Multiplicative Cascade

Conservative cascades are multiplicative cascades that conserve mass at every stage. Conservative cascades arise naturally in the data network context, and the inverse-cascade construction provides a simple heuristic for checking whether or not a given data set conforms to an underlying conservative cascade construction [32]. The main objective of building the inverse cascade is therefore to verify whether the redistribution of the mass of an interval over its two subintervals follows the conservative rule and, if so, to infer the relevant statistical properties of the conservative cascade generator.

As an example, we use the WWW (web traffic) and bitTorrent data sets. We take the arrival-time traffic data as the finest stage N of the cascade. The traffic series at stage (N − 1) is obtained by adding consecutive values of stage N in non-overlapping blocks of size 2. Similarly, given the series at stage (N − j), $X_i^{(N-j)}$, $i = 1, \dots, 2^{N-j}$, we obtain the series at stage (N − j − 1) by adding consecutive values of stage (N − j) as follows:

$$X_i^{(N-j-1)} = X_{2i-1}^{(N-j)} + X_{2i}^{(N-j)} \qquad \text{(Equ. 3)}$$

for $i = 1, \dots, 2^{N-j-1}$. This procedure terminates when the aggregation yields a single point at the final stage of the cascade. An estimate $r_j^{(i)}$ of the multipliers can be obtained by the following equation, adapted from [32]:

$$r_j^{(i)} = \frac{X_{2i-1}^{(N-j)}}{X_i^{(N-j-1)}} \qquad \text{(Equ. 4)}$$

for $i = 1, \dots, 2^{N-j-1}$. We can consider the $r_j^{(i)}$ as samples of the distribution of multipliers $f_{R_j}(r)$ at stage j. The multiplier distribution at scale j can be obtained from the histogram of the $r_j^{(i)}$.
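The inverse cascade can be sketched as follows. This is an illustrative Python version, not the authors' implementation: the function name `inverse_cascade` is hypothetical, the input length is assumed to be a power of two, and parents with zero mass are skipped because no multiplier can be estimated for them.

```python
def inverse_cascade(series):
    """Aggregate a series pairwise until a single value remains.

    Returns (stages, multipliers). stages[0] is the input (finest stage)
    and stages[-1] holds the single total. multipliers[j] holds the
    estimates obtained when aggregating stages[j] into stages[j + 1].
    """
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two >= 2")
    stages = [list(series)]
    multipliers = []
    while len(stages[-1]) > 1:
        fine = stages[-1]
        # Add consecutive values in non-overlapping blocks of size 2.
        coarse = [fine[2 * i] + fine[2 * i + 1] for i in range(len(fine) // 2)]
        # Multiplier estimate: share of each parent mass held by its left
        # child (0-based fine[2 * i] is X_{2i-1} in 1-based notation).
        multipliers.append(
            [fine[2 * i] / coarse[i] for i in range(len(coarse)) if coarse[i] > 0]
        )
        stages.append(coarse)
    return stages, multipliers


# Small worked example with eight hypothetical inter-arrival values.
stages, mults = inverse_cascade([1.0, 3.0, 2.0, 2.0, 4.0, 0.0, 1.0, 3.0])
print(stages[1])  # pairwise sums of the input
print(mults[0])   # left-child shares at the finest level
```

A histogram of each `mults[j]` list then approximates the multiplier distribution at that level.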

Figures 2 and 3 show levels 6 and 9 of the inverse cascade construction for the WWW traffic trace, and Figure 4 shows the multiplier histograms for the bitTorrent traffic trace at stages N = 5 and N = 8. It can be observed that the distribution of the multipliers (the generator of the multiplicative cascade) is approximately Gaussian, with mean 0.5.

From the distributions obtained, we estimate the variance at each stage of the inverse cascade, as shown in Figure 5.

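Fig. 2. Inverse Cascade, Stage 6 (multiplier versus time)

Fig. 3. Inverse Cascade, Stage 9 (multiplier versus time)

Fig. 4. Multiplier histograms fR(r) versus r: (a) Stage N = 5, (b) Stage N = 8

Fig. 5. Measured Variance per stage (WWW and bitTorrent)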

C. Proposed Approach

The main purpose of this work is to show the performance and viability of our multifractal Internet traffic classifier.

Consider the arrival times of a set of packets that belong to the same type of traffic. We use the inverse cascade construction explained in the section above to obtain the multipliers of this variable.

The variances of the multipliers obtained are placed in a vector called the “feature vector”. For example, if we analyze a set of 256 packets we get a cascade of eight levels, and our feature vector is formed by the variance of the multipliers at each level; thus, this vector has eight values.
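A feature vector of this kind could be computed as sketched below. The sketch makes several assumptions that the text does not fix: the population variance is used, zero-mass parents are skipped, and the synthetic exponential inter-arrival times stand in for a real trace. The name `multiplier_variances` is hypothetical.

```python
import random
from statistics import pvariance


def multiplier_variances(series):
    """Variance of the inverse-cascade multipliers at each level, finest first."""
    n = len(series)
    if n < 2 or n & (n - 1):
        raise ValueError("series length must be a power of two >= 2")
    features = []
    fine = list(series)
    while len(fine) > 1:
        # Pairwise aggregation of the current level.
        coarse = [fine[i] + fine[i + 1] for i in range(0, len(fine), 2)]
        # Left-child share of each non-empty parent.
        mults = [fine[2 * i] / c for i, c in enumerate(coarse) if c > 0]
        # Population variance is an assumption; the coarsest level has a
        # single multiplier, so its variance is always zero.
        features.append(pvariance(mults) if mults else 0.0)
        fine = coarse
    return features


# 256 synthetic inter-arrival times (illustrative data, not a real trace).
rng = random.Random(0)
interarrivals = [rng.expovariate(1.0) for _ in range(256)]
features = multiplier_variances(interarrivals)
print(len(features))  # 8 levels for 256 = 2**8 samples
```

With several traffic variables, one such vector per variable could be concatenated before being passed to the classifier; how the original work combines them is not stated here.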

We build a training dataset that consists of {features, label} pairs, where the features are a vector of traffic features and the label is a name that identifies the network application that generated the traffic. For the training phase we use the C4.5 supervised ML method [33], given its high accuracy and low overhead compared to other ML techniques.

The C4.5 algorithm generates a decision tree from the data by recursive partitioning. The algorithm considers all possible tests that can divide the data set and selects the test that yields the highest information gain. For each discrete attribute, it considers one test with n outcomes, where n is the number of possible values the attribute can take. For each continuous attribute, a binary test is considered at each of the attribute's values. At each node, the system must decide which test to use to split the data.

In an offline phase, the C4.5 algorithm builds a model from a previously built, pre-classified training dataset. In the identification phase we first extract the feature vector of the set of traffic records to analyze. The trained model built in the previous phase then uses this vector to predict the application.
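To make the split selection concrete, the sketch below scores every binary test `value <= threshold` on one continuous attribute by its information gain and keeps the best one. It is a simplified illustration, not the J48 code: the full C4.5 algorithm ranks tests by gain ratio rather than raw gain, and the function names and sample values here are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary test `value <= threshold`."""
    base = entropy(labels)
    best = (None, 0.0)
    # The largest value is skipped: it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical multiplier variances at one cascade level for two classes.
variances = [0.02, 0.03, 0.04, 0.08, 0.09, 0.10]
classes = ["www", "www", "www", "bitTorrent", "bitTorrent", "bitTorrent"]
threshold, gain = best_threshold(variances, classes)
print(threshold, gain)
```

In this toy data the two classes separate perfectly at one threshold, so the gain equals the full one bit of class entropy; real feature vectors would rarely split so cleanly.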

Figure 6 shows the scheme used to classify unknown traffic.

Fig 6. Classification Technique.

4. Experimental Results

In this section we explain how the experiments were configured, the datasets used and the categories of traffic analyzed.

We tested the proposed technique on three traffic datasets. The first dataset consisted of anonymized payload traces collected at two edge links located in Italy, Spain and the USA. The UNINA trace [34] was captured on the 200 Mbps link that connects the university network to the rest of the Internet. These traces are in tcpdump format. For the www traffic category we used Trace1, which is the traffic to TCP port 80 generated by clients inside the network of the University of Naples Federico II (Unina) and reaching the outside world. As a traffic sample for the Mail category we used Trace1, which is the traffic to TCP port 25 generated by clients inside the Unina network and reaching the outside world. Table 1 shows this packet dataset.

TABLE 1
PACKET DATA SET
Class   Port   Instances
www     80     70000
Mail    25     60000
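A minimal sketch of how a per-class inter-arrival series could be derived from packet records, assuming the tcpdump traces have already been parsed into hypothetical (timestamp in seconds, destination TCP port, size in bytes) tuples. The tuple layout and the function name are illustrative and not the authors' tooling.

```python
def interarrival_times(packets, port):
    """Inter-arrival times of the packets sent to the given destination port.

    `packets` is an iterable of (timestamp_seconds, dst_port, size_bytes).
    """
    times = sorted(ts for ts, dst_port, _size in packets if dst_port == port)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical records: port 80 stands for www, port 25 for Mail.
packets = [
    (0.00, 80, 1500),
    (0.01, 25, 60),
    (0.05, 80, 40),
    (0.09, 80, 1500),
    (0.20, 25, 500),
]
www_gaps = interarrival_times(packets, 80)
mail_gaps = interarrival_times(packets, 25)
print(www_gaps, mail_gaps)
```

The packet sizes of the selected records form a second series that could be handled in the same way.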

The second dataset is the evaluation dataset used in [35]. It consists of seven traces collected at the Gigabit access link of the Universitat Politècnica de Catalunya (UPC), which connects about 25 faculties and 40 departments (geographically distributed over 10 campuses) to the Internet through the Spanish Research and Education network (RedIRIS). We use only four application types (BitTorrent, Domain Name Service (DNS), HTTP and VoIP) of UPC-II. Table 2 shows this flow dataset.

TABLE 2
FLOWS DATA SET
Application   Instances
BitTorrent    70000
DNS           60000
HTTP          60000
Skype         70000

The last dataset used is NSL-KDD [37], an improved version of the KDD-99 dataset used for the Third International Knowledge Discovery and Data Mining Tools Competition. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. Table 3 shows this anomalies dataset.

TABLE 3
ANOMALIES DATA SET
Application   Instances
Normal        67343
Attacks       58630

We performed our experiments on an Intel Pentium Dual Core at 1.86 GHz with 2.00 GB of RAM. The inverse cascade construction algorithm was implemented in MATLAB 7. We use the WEKA ML software [28] to build the J48 decision tree, an open-source Java implementation of the original C4.5. This software was also used by Moore et al. in their analysis [4].

To evaluate the performance of the classification algorithm we used three metrics: detection rate (DR), true positive rate (TPR) and false positive rate (FPR). For each traffic class, TPR and FPR are defined as:

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad \text{(Equ. 5)}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad \text{(Equ. 6)}$$


where TP is the number of correctly classified positive samples, TN is the number of correctly classified negative samples, FP is the number of negative samples incorrectly classified as positive and FN is the number of positive samples incorrectly classified as negative.

That is, TPR is the ratio between the number of correctly classified positive samples and the total number of positive samples, and FPR is the ratio between the number of incorrectly classified negative samples and the total number of negative samples. The metric DR is defined as:

$$\mathrm{DR} = \frac{TP + TN}{TP + TN + FP + FN} \qquad \text{(Equ. 7)}$$
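The three metrics can be computed from the confusion counts of one class as sketched below. TPR and FPR follow the verbal definitions in the text. DR is taken here to be the overall fraction of correctly classified samples, which is an assumption of this sketch. The function name and the counts are hypothetical.

```python
def class_metrics(tp, tn, fp, fn):
    """Return (TPR, FPR, DR) as fractions for one traffic class."""
    tpr = tp / (tp + fn)  # correctly classified positives / all positives
    fpr = fp / (fp + tn)  # misclassified negatives / all negatives
    # Assumption: DR is the share of all samples that are classified correctly.
    dr = (tp + tn) / (tp + tn + fp + fn)
    return tpr, fpr, dr


# Hypothetical counts for a two-class problem with 100 samples per class.
tpr, fpr, dr = class_metrics(tp=90, tn=80, fp=20, fn=10)
print(f"TPR={tpr:.1%} FPR={fpr:.1%} DR={dr:.1%}")
```

In a two-class problem the FPR of one class is the complement of the TPR of the other, which is the pattern visible in two-class result tables.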

Tables 4 and 5 show the classification rate metrics for the Packets dataset using seven (7n) and eight (8n) cascade stages with two traffic features (inter-arrival time and packet size).

TABLE 4
PACKETS DATASET-TRAIN RESULTS
Class   7 Stages TPR(%)   7 Stages FPR(%)   8 Stages TPR(%)   8 Stages FPR(%)
www     90.7              30.1              97.4              15.8
Smtp    69.9              9.3               84.2              2.6

TABLE 5
PACKETS DATASET-TEST RESULTS
Class   7 Stages TPR(%)   7 Stages FPR(%)   8 Stages TPR(%)   8 Stages FPR(%)
www     84.7              34.6              97.6              28.2
Smtp    65.4              15.3              71.8              2.4

In Table 4, the TPR of the www class rises from 90.7% with seven cascade stages to 97.4% with eight, and Table 5 shows the same trend on the test set. With more stages the feature vector is larger, which yields a better classification of the traffic. This is confirmed by the detection rates shown in Table 6.

TABLE 6
PACKETS RESULTS
Set     DR 7 Stages (%)   DR 8 Stages (%)
Train   81.06             91.32
Test    75.46             85.18

Tables 7 and 8 show the classification rate metrics for the Flows dataset using seven (7n) and eight (8n) cascade stages with three traffic features (number of packets, flow bytes and flow time).

TABLE 7
FLOWS DATASET-TRAIN RESULTS
Application   7 Stages TPR(%)   7 Stages FPR(%)   8 Stages TPR(%)   8 Stages FPR(%)
Bittorrent    86.6              2.4               93.4              0.7
Dns           99.8              4.4               98.7              1.7
Http          93.6              0.1               99.8              0.6
Skype         98.2              0.5               98.9              0.1

TABLE 8
FLOWS DATASET-TEST RESULTS
Application   7 Stages TPR(%)   7 Stages FPR(%)   8 Stages TPR(%)   8 Stages FPR(%)
Bittorrent    72.9              5.8               73.8              5.8
Dns           82.1              8.9               87.2              0.6
Http          98.7              0.1               99.9              0.1
Skype         92.9              3.3               90.5              0.1

The classification rates in Tables 7 and 8 are higher because more features, in this case three, are used for the analysis. Thus the use of more features improves the performance of our technique, as the rates in Table 9 also show.

TABLE 9
FLOWS RESULTS
Set     DR 7 Stages (%)   DR 8 Stages (%)
Train   94.43             97.63
Test    86.50             87.65

Tables 10 and 11 show the classification rate metrics for the Anomalies dataset using six (6n) and seven (7n) cascade stages with four traffic features (count, srv_count, dst_host_count, dst_host_srv_count).

TABLE 10
ANOMALIES DATASET-TRAIN RESULTS
Class     6 Stages TPR(%)   6 Stages FPR(%)   7 Stages TPR(%)   7 Stages FPR(%)
Normal    99.4              0.5               99.9              0.2
Attacks   99.5              0.6               99.8              0.1

Fig. 7 True Positive Rate of levels 5 to 10 for Anomalies Train Dataset.

TABLE 11
ANOMALIES DATASET-TEST RESULTS
Class     6 Stages TPR(%)   6 Stages FPR(%)   7 Stages TPR(%)   7 Stages FPR(%)
Normal    98.7              22.5              99.9              17.0
Attacks   77.5              1.3               83.0              0.1


Fig. 8 True Positive Rate of levels 5 to 10 for Anomalies Test Dataset

For the last dataset analyzed, Tables 10, 11 and 12 show a good classification rate. This demonstrates the viability of our classification technique, which can achieve better results than traditional techniques for traffic flow classification, which are often no more than 50-70% accurate [4].

TABLE 12
ANOMALIES RESULTS
Set     DR 6 Stages (%)   DR 7 Stages (%)
Train   94.44             99.89
Test    86.61             90.28

5. Conclusions

In this work, we have evaluated our traffic classification technique on three datasets, consisting of packets, flows and anomaly records. In our experiments, the classification technique based on multifractal multiplicative cascades can achieve detection rates above 90%.

The multifractal classification technique extracts traffic features to build a model in an offline phase, which is later used to identify network traffic online. The approach showed good performance in classifying the record traces studied, in experiments with six, seven and eight cascade stages. We believe that the performance of this technique can be refined through the selection of the optimum number of cascade levels used for the analysis.

References

[1] Cisco. “Global Internet Traffic Projected to Quadruple by

2015,” (2013). Available: http://newsroom.cisco.com/home1

(last accessed April 2013)

[2] R. Alshammari & A. N. Zincir-Heywood, “A Flow Based

Approach for SSH Traffic Detection,” (2007) In Systems,

Man and Cybernetics. ISIC. IEEE International Conference

on, pp. 296-301.

[3] J. Hurley, E. Garcia-Palacios & S. Sezer, “Classifying

Network Protocols: A 'Two-Way' Flow Approach,” (2011)

Communications, IET, vol. 5, pp. 79-89.

[4] A. W. Moore & D. Zuev, “Internet Traffic Classification

Using Bayesian Analysis Techniques,” (2005) SIGMETRICS

Perform. Eval. Rev., vol. 33, pp. 50-60.

[5] J. Erman, A. Mahanti & M. Arlitt, “Internet Traffic

Identification Using Machine Learning,” (2006) In Global

Telecommunications Conference, GLOBECOM '06. IEEE,

2006, pp. 1-6.

[6] T. Karagiannis, A. Broido, M. Faloutsos & K. Claffy,

“Transport Layer Identication of P2P Trfc,” (2004) In

Proc. Of IMC'04.

[7] S. Sen, O. Spatscheck, & D. Wangccurte, “Scalable

In-Network Identication P2P Trfc Using Application

Signatures,” (2004) In WWW200S, New York, USA, May

17-22.

[8] T. T. T. Nguyen, & G. Armitage, “A Survey of Techniques

for Internet Traffic Classification Using Machine Learning,”

(2008). IEEE Communications Surveys & Tutorials, vol.

10, pp. 56-76.

[9] A. Moore, D. Zuevd, & M. Crogan, “Discriminators for Use

in Flow-Based Classification,” (2005) Technical Report, Intel

Research.

[10] D. Zuev & A. Moore, “Traffic Classification Using a

Statistical Approach,” (2005) lect.notescomput.sci., 3431,pp.

321–324.

[11] G.P.S. Junior, J.E.B. Maia, R. Holanda & J.N. Sousa,“P2P

Traffic Identification Using Cluster Analysis,”(2007). Global

Information Infrastructure Symp., giis, pp. 128–133.

[12] S. Zander, T. Hguyen & G. Armitage “Automated Traffic

Classification and Application Identification Using Machine

Learning,” (2005). Proc. IEEE Conference. on local

Computer Networks 30th Anniversary, lcn.

[13] M. Roughan, S. Sen, O. Spatscheck & N. Duffield,

“Class-of-Service Mapping for QoS: A Statistical

Signature-Based Approach to IP Traffic Classification,”

(2004). Proc. Fourth ACM Sigcomm Conference on Internet

Measurement, pp. 135–148.

[14] A. Mcgregor, M. Hall, P. Lorier & J. Brunskill, “Flow

Clustering Using Machine Learning Techniques,”(2004)

Proc. Fifth Passive and Active Measurement Workshop, Pam.

[15] D. R. Bickel & B.J. West, “Multiplicative and Fractal

Processes in DNA Evolution,”(1998) Fractals, 6, 211–217.

[16] A.N. Kolmogorov,“A Refinement of Previous Hypothses

Concerning The Local Structure of Turbulence in a Viscous

Incompressible Fluid a High Reynolds Number,”(1962)

J.Fluid Mech., 13, 82–85.

[17] A. Feldmann, A. Gilbert, W. Willinger & T.G. Kurtz, “The

Changing Nature of Network Traffic: Scaling

Phenomena,” (1998) ACM Computer Communication

Review, v.28, pp. 5-29.

[18] W. Leland, M. Taqqu, W. Willinger & D. Wilson, “On The

Self-Similar Nature of Ethernet Traffic,” (1994) (Extended

Version), IEEE/ACM Transactions on Networking, v.2,n.1,

pp 1-15.

[19] I. Norros, “A Storage Model with Self-Similar Input,” (1994)

Queueing Systems,16, pp.387-396.

[20] M. S. Taqqu, V. Teverovsky & W. Willinger, “Is network

traffic self-similar or multifractal?,” (1997) Fractals, vol. 5,

pp. 63-74.

[21] A. Feldmann, A. C. Gilbert & W. Willinger, “Data Networks as

Cascades: Investigating The Multifractal Nature of Internet

WAN Traffic,” (1998). Computer Communication Review.

[22] V. J. Ribeiro, R. H. Riedi, M. S. Crouse & R. G.

Baraniuk, “Multiscale Queuing Analysis of

Long-Range-Dependent Network Traffic,” (2000). IEEE

INFOCOM 2000.

[Figure: True Positive Rate (axis range 70 to 100) versus Stages (5 to 10); series: Normal, Attacks]

[23] R.H. Riedi, M.S. Crouse, V.J. Ribeiro & R.G. Baraniuk, “A

Multifractal Wavelet Model with Application to Network

Traffic,” (1999) IEEE Transactions on Information Theory.

[24] F.H.T. Vieira & L.L. Lee, “Adaptive Wavelet Based

Multifractal Model Applied to the Effective Bandwidth

Estimation of Network Traffic Flows,” (2009). IET

Communications.

[25] J.W.G. Stenico & L.L. Lee, “A New Binomial Conservative

Multiplicative Cascade Approach for Network Traffic

Modeling,” (2013). In 27th IEEE International Conference on

Advanced Information Networking and Applications – IEEE

AINA 2013.

[26] J.W.G. Stenico & L.L. Lee, “A Multifractal Based Dynamic

Bandwidth Allocation Approach for Network Traffic Flows,”

(2010) IEEE International Conference on Communications

(ICC), 23-27 May 2010, pages 1 – 6.

[27] J.W.G. Stenico & L.L. Lee, “A New Approach for Buffer

Queueing Evaluation under Network Flows with Multi-Scale

Characteristics,” (2012) In: International Joint Conferences

on Computer, Information, and Systems Sciences, and

Engineering (CISSE 12), 2012, University of Bridgeport -

USA.

[28] P.M. Krishna, V.M. Gadre & U.B. Desai, “Multifractal Based

Network Traffic Modeling,” (2003) Kluwer Academic

Publishers.

[29] B. B. Mandelbrot, L. Calvet & A. Fisher, “Large Deviations

and The Distribution of Price Changes,” (1997). Discussion

paper No 1165 of the Cowles Foundation for Economics at

Yale University.

[30] V. Gupta & E. Waymire, “A Statistical Analysis of

Mesoscale Rainfall as a Random Cascade” (1993) Journal of

Applied Meteorology, 32, 251–267.

[31] T.D. Dang, S. Molnár & I. Maricza, “Queuing Performance

Estimation for General Multifractal Traffic,” (2003). Int. J.

Commun. Syst., vol 16 no 2, pp 117–136.

[32] A.C. Gilbert, W. Willinger & A. Feldmann, “Scaling

Analysis of Conservative Cascades, with Applications to

Network Traffic,” (1999) AT&T Labs-Research, Florham Park,

NJ. IEEE Transactions on Information Theory, vol. 45, pp.

971-991.

[33] J.R. Quinlan, “C4.5: Programs for Machine Learning,” (1993)

Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[34] Unina. “Network Tools And Traffic Traces,” (2013)

Available:

http://www.grid.unina.it/Traffic/Traces/ttraces.php (last

accessed April 2013).

[35] V. Carela-Español, P. Barlet-Ros, A. Cabellos-Aparicio & J. Solé-Pareta,

“Analysis of The Impact of Sampling on NetFlow Traffic

Classification,” (2011) Computer Networks, vol. 55, pp.

1083-1099.

[36] G. Holmes, A. Donkin & I.H. Witten, “WEKA: A Machine

Learning Workbench,” (1994) In Intelligent Information

Systems. Proceedings of the Second Australian and New

Zealand Conference on, Brisbane, Qld., Australia, pp.

357-361.

[37] M. Tavallaee, E. Bagheri, W. Lu & A. Ghorbani, “A Detailed

Analysis of the KDD CUP 99 Data Set,” (2009) IEEE

Symposium on Computational Intelligence for Security and

Defense Applications (CISDA), pp. 1-6.

Yulios Zavala Huaman graduated in Ingeniería de Sistemas e Informática (Systems Engineering and Informatics) from Universidad Nacional Mayor de San Marcos (2005) and is a master's student in the Laboratory of Pattern Recognition and Communication Networks, Universidade Estadual de Campinas - Unicamp. He has experience in Electrical Engineering, with emphasis on telecommunications. His research specializes in application and service identification. His interests span network monitoring, machine learning, and data mining.

Jeferson Wilian de Godoy Stênico received the B.S. in Mathematics from Universidade Estadual Paulista Júlio de Mesquita Filho – UNESP, Brazil (2006) and the M.Sc. in Electrical Engineering from State University of Campinas – Unicamp, Brazil (2009). He is currently a Ph.D. student in Electrical Engineering at State University of Campinas – Unicamp. His current research interests include network traffic modeling, network design, performance analysis, and communication systems.

Lee Luan Ling received the B.S. and M.Sc. degrees in Electrical Engineering from University of São Paulo (1980) and State University of Campinas (1984), respectively, in São Paulo, Brazil. In 1991 he received a Ph.D. degree in Electrical Engineering from Cornell University, Ithaca, USA. In 1984 he became a faculty member at the School of Electrical and Computer Engineering, State University of Campinas, where he is currently a Full Professor. His current research interests include pattern recognition, handwriting recognition, biometrics, image processing, artificial intelligence, video monitoring and surveillance, network traffic modeling, and network design and performance analysis.