
[IEEE 2012 10th International Symposium on Electronics and Telecommunications (ISETC) - Timisoara, Timis, Romania (2012.11.15-2012.11.16)] 2012 10th International Symposium on Electronics


Data mining based wireless network traffic forecasting

Cristina Stolojescu-Crisan Electronics and Telecommunications Faculty

“Politehnica” University of Timisoara Timisoara, Romania

[email protected]

Abstract- In this paper, we propose an approach for predicting time series. This approach is based on the Stationary Wavelet Transform (SWT) and two types of forecasting models, one based on Auto-Regressive Integrated Moving Average (ARIMA) and one based on Artificial Neural Networks (ANNs). The forecasting performance of these models was evaluated using three well-known evaluation criteria: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (SMAPE). Results show that the ANN performs better than the ARIMA based forecasting technique for small future time intervals. However, ARIMA models can capture the behavior of the time series and are suitable for long term prediction. We present two applications of wireless network traffic forecasting: the prediction of the moment when a specified Base Station (BS) will saturate (long term prediction) and the prediction of traffic anomalies (short term prediction).

Keywords- Data mining, communication system traffic, forecasting, time series analysis, wavelet transform.

I. INTRODUCTION

One of the most successful developments in communications today is the wireless network implemented in WiMAX technology. The performance of such a network can be improved by traffic analysis. Each traffic trace is a time series, that is, a sequence of values measured at equal time intervals. Collecting data as time series is useful because data analysts are often concerned with discovering patterns in time that allow them to predict future patterns. Generally, traffic traces are very long and difficult to analyze with traditional tools; handling such data is the objective of data mining. In recent years, there has been an explosion of interest in data mining. Mining time series data refers to all of the methods employed to understand such data, with the purpose of predicting future values. Despite the proliferation of data mining techniques, it is difficult to find references on data mining techniques applied to wireless network traffic analysis. Predicting future values from the observed values of a time series requires a model. The choice of the prediction model is based on the desired prediction interval (long or short term), the prediction error and the computational cost.

In the last two decades, Auto-Regressive Integrated Moving Average (ARIMA) models, sometimes called Box-Jenkins models [1], have been widely used for long term time series forecasting [2]. The method based on Artificial Neural Networks (ANN) has recently shown great applicability in short term time series analysis and forecasting. One advantage of an ANN is that it learns from past experience: the ANN is trained with past data and is then used to predict future values. This advantage makes the ANN appropriate for various applications.

In recent years, the wavelet transform has been frequently used for time series forecasting. Wavelets can localize data in time-scale space. At high scales wavelets have a small time support and can "catch" discontinuities or singularities, while at low scales wavelets have a larger time support and can identify periodicities. Wavelets [3] are able to characterize the physical properties of data. The application of wavelet transforms makes data sparse and reduces the amount of computation required by the signal processing methods. Moreover, the algorithms implementing the wavelet transform are very fast.

The goal of this paper is to compare two forecasting methods, applied in the wavelet domain, for the analysis of wireless network traffic. We suggest as potential applications the prediction of the moment when a specified BS will saturate (long term prediction) and the prediction of traffic anomalies (short term prediction) [4]. Based on the prediction of the moment when a specified BS will saturate (prediction of the BS risk of saturation), the WiMAX network’s administrator can plan future upgrades. The prediction of traffic anomalies will allow the network administrator to update the security policy of the network.

Inspired by [5], this paper compares two methodologies to build forecasting models for WiMAX traffic. The first methodology [6] is based on ARIMA modeling and it is appropriate for long term prediction. The second methodology is based on ANNs [7] and it is appropriate for short term prediction. Forecasting WiMAX traffic is a new application of data mining and follows CRISP-DM [8] phases presented in Fig.1.

Fig.1 CRISP-DM phases.

The forecasting methodologies adopted in this paper [7] are based on statistical data processing in the wavelet domain. The last operation performed in the Data Preparation block in Fig. 1 is the wavelet transform. It can be done using the algorithm of Mallat [3], which corresponds to the computation of the Discrete Wavelet Transform (DWT). However, the prediction methods analyzed in this paper are based on the redundant "à trous" algorithm of Shensa [9], which implements the Stationary Wavelet Transform (SWT). The advantage of the SWT over the DWT is its translation invariance.

978-1-4673-1176-2/12/$31.00 ©2012 IEEE

II. DATA UNDERSTANDING AND PREPARATION

The data understanding phase implies collecting, describing, and exploring data and analyzing its quality. The data used in this study was obtained from 66 BSs composing a real wireless network. The values were recorded every 15 minutes, over eight weeks.

The next phase is data preparation, which implies selecting data for analysis, cleaning data, treating missing values, and searching for periodicities. A simple plot of the traffic curves proved the existence of periodicities in the traffic. By applying the Fast Fourier Transform to all 66 traces and representing their power spectral density, we found that the dominant period for all series is one day (24 hours). These periodicities are better observed if the sampling interval is modified from 15 minutes to 1.5 hours.

The traces contain specific overall trends, which are important for the estimation of the risk of saturation for a given BS. The overall trend can be extracted using a multi-scale analysis. In the case of the forecasting methodology based on the ARIMA model, we have predicted these overall trends. First, we have implemented a temporal decimation with a factor of 6. The new series were labeled $x_d(t)$. At low scales, wavelets identify the long-term trend of the data. In order to apply the SWT with a decomposition at level $n$, $2^n$ must evenly divide the length of the signal. In the case of the forecasting methodology based on ARIMA modeling, we have chosen $n = 6$.

The prediction of the moment when a given BS will saturate requires the estimation of the overall trend of its traffic and the estimation of the variability of the traffic around the overall trend. To extract the overall trend of the traffic, we applied the multi-resolution analysis (MRA) to the series, using temporal resolutions between 1.5 and 96 hours. The 6th level of decomposition gives us seven signals for processing: one approximation signal, corresponding to the current level, and six detail sequences, corresponding to each of the six decomposition levels. The equation describing this MRA is:

$$x_d(t) = a_6(t) + \sum_{p=1}^{6} d_p(t), \qquad (1)$$

where $a_6(t)$ represents the 6th approximation sequence, while $d_p(t)$ represents the details corresponding to time resolutions between 1.5 hours ($p=1$) and 96 hours ($p=6$).
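The additive decomposition in equation (1) can be illustrated with a minimal sketch of an "à trous" scheme. The sketch assumes a two-tap Haar averaging filter, periodic boundary handling, and details defined as differences between successive approximations; these choices, like the function name `atrous_haar_mra`, are illustrative assumptions and not necessarily the exact implementation used in the paper.

```python
def atrous_haar_mra(x, levels):
    """Additive 'a trous' decomposition with a Haar (two-tap average) filter.

    Returns (approximation, details), where details[p - 1] plays the role
    of d_p(t) and x[t] == approximation[t] + sum of all details at t,
    up to rounding. Periodic boundaries are an assumption of this sketch.
    """
    n = len(x)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = [float(v) for v in x]
    details = []
    for p in range(1, levels + 1):
        hole = 2 ** (p - 1)  # spacing between the two filter taps at level p
        smoother = [0.5 * (approx[t] + approx[(t - hole) % n]) for t in range(n)]
        # The detail at level p is what the smoothing step removed.
        details.append([a - s for a, s in zip(approx, smoother)])
        approx = smoother
    return approx, details


# Synthetic series: a short-period pattern on top of a slow ramp.
signal = [float(t % 8) + 0.1 * t for t in range(64)]
a6, d = atrous_haar_mra(signal, 6)
reconstructed = [a6[t] + sum(dp[t] for dp in d) for t in range(len(signal))]
print(max(abs(r - s) for r, s in zip(reconstructed, signal)))
```

Because each detail is the difference between two successive approximations, summing the last approximation and all details telescopes back to the input, which is the property equation (1) expresses.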

For the prediction of the overall tendency of the traffic, we have used the sequence $a_6(t)$. For the prediction of the variability of the traffic around the overall tendency, we have used only two sequences of detail coefficients, $d_4(t)$ and $d_3(t)$. We considered all the other detail coefficients equal to zero. This way, we obtained a great reduction of the data volume and hence a fast prediction algorithm. In the case of the forecasting methodology based on ANNs, the ANN optimization is done for each of the signal’s decomposition levels.
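The periodicity search and the decimation step of the data preparation phase can be illustrated with a small self-contained sketch. It uses a direct discrete Fourier transform instead of an FFT library to stay dependency-free, and it decimates by averaging blocks of six samples; the text does not state whether samples were averaged or simply dropped, so the averaging is an assumption, as are the synthetic trace and the function names.

```python
import cmath
import math


def dominant_period(samples, sampling_interval_hours):
    """Return the period (in hours) of the strongest non-DC DFT component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2  # periodogram value at frequency index k
        if power > best_power:
            best_k, best_power = k, power
    return n * sampling_interval_hours / best_k


def decimate(samples, factor):
    """Keep one value per block of `factor` samples by averaging the block."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


# Synthetic trace: four days at 15-minute sampling (96 samples per day),
# with a strong daily component and a weaker half-day component.
trace = [10 + 5 * math.sin(2 * math.pi * t / 96) + math.sin(2 * math.pi * t / 48)
         for t in range(4 * 96)]
print(dominant_period(trace, 0.25))   # period found at 15-minute sampling
coarse = decimate(trace, 6)           # 15 minutes * 6 = 1.5 hours
print(len(coarse), dominant_period(coarse, 1.5))
```

The direct transform costs O(n²) operations, which is acceptable for a short demonstration; for traces of realistic length an FFT implementation would be the natural choice.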

III. MODELING

The modeling phase involves the selection of the modeling technique and the estimation of the model's parameters. We considered two different approaches: ARIMA based models and ANNs.

A process $X_t$ is said to follow an ARIMA model [11] if:

$$\phi(B)(1 - B)^d X_t = \theta(B) Z_t, \qquad Z_t \sim \mathrm{WN}(0, \sigma^2), \qquad (2)$$

where $\phi(\cdot)$ corresponds to the autoregressive part of the model and $\theta(\cdot)$ corresponds to the moving average part of the model. $\phi(\cdot)$ and $\theta(\cdot)$ are polynomials of degree $p$ and $q$ respectively, $d$ is the number of differencing operations, and $B$ is the backward shift operator.
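The role of the differencing term $(1 - B)^d$ in the ARIMA definition can be made concrete with a short sketch. Since $B X_t = X_{t-1}$, applying $(1 - B)$ once replaces each value by its difference from the previous one; the helper name `difference` is an illustrative choice.

```python
def difference(series, d=1):
    """Apply (1 - B)^d to a series, i.e. difference it d times."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


quadratic = [t * t for t in range(6)]   # 0, 1, 4, 9, 16, 25
print(difference(quadratic, 1))         # a linear trend remains
print(difference(quadratic, 2))         # the trend is removed
```

Each differencing pass shortens the series by one sample and lowers the degree of a polynomial trend by one, which is why a small value of $d$ is usually enough to remove a smooth trend.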

The Box-Jenkins methodology involves the following steps [5]: check stationarity and determine d, the number of differencing steps needed to remove non-stationarity; identify the orders p and q; estimate the polynomials φ and θ; and check the resulting model. Stationarity can be detected from an autocorrelation plot. Non-stationarity is often indicated by an autocorrelation plot with very slow decay. The Autocorrelation function (ACF) and Partial Autocorrelation function (PACF) are used to analyze the stationarity of a time series and to estimate the orders p and q. The ACF and PACF plots are compared to the theoretical behavior of these plots when the order is known. The goal of the Box-Jenkins methodology is to find a model whose residuals are as small as possible and exhibit no pattern. The residuals represent all the influences on the time series which are not explained by its other components (trend, seasonal component, trade cycle). The steps involved in building the model are repeated multiple times, in order to find a specific formula that copies the patterns in the series as closely as possible and produces accurate forecasts.
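The stationarity check based on the decay of the autocorrelation plot can be sketched as follows. The example computes the sample ACF of a synthetic random walk (non-stationary) and of its first difference (white noise); the seeded synthetic data and the function name `acf` are assumptions made purely for illustration.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    variance_sum = sum(c * c for c in centered)
    return [sum(centered[t] * centered[t - k] for t in range(k, n)) / variance_sum
            for k in range(max_lag + 1)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

# A random walk is the cumulative sum of white noise.
walk, level = [], 0.0
for shock in noise:
    level += shock
    walk.append(level)

differenced = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

print([round(r, 2) for r in acf(walk, 5)])          # decays very slowly
print([round(r, 2) for r in acf(differenced, 5)])   # near zero after lag 0
```

A slowly decaying ACF, as for the random walk, suggests that another differencing step is needed, while an ACF that drops quickly suggests that d is already large enough.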

In the case of the saturation risk prediction, we modeled separately the overall tendency and the variability of the traffic around the overall trend using linear time series models. After computing the energy of all detail coefficients, we chose to keep in our MRA only the approximation coefficients sequence from the sixth decomposition level, and the sequences of detail coefficients from the third and the fourth decomposition levels [2]. The new statistical model is:

$$x_d(t) = a_6(t) + \beta d_3(t) + \gamma d_4(t), \qquad (3)$$


where the constants β and γ can be computed [2]. The approximation, $a_6(t)$, explains the overall trend of the time series, while $\beta d_3(t) + \gamma d_4(t)$ explains the deviation of the time series around its overall trend (variability). We applied the Box-Jenkins methodology separately for the tendency and for the variability.

Artificial Neural Networks (ANNs) represent a class of flexible nonlinear models which consist of a large number of simple units (also called artificial neurons) [12]. ANNs can be viewed as 'computational models' with some particular properties: the ability to learn, to generalize, or to cluster and organize data. These computational models operate based on parallel processing. An ANN is characterized by its architecture (the pattern of nodes and the connections between them), the learning algorithm or training method, the model of the artificial neuron (the activation function, which defines the output of a neuron based on the input values received by a node), the type of data, and the presence of memory.

Generally, we can identify two classes of neural network (NN) architectures [13]: feed-forward NNs (single layer and multilayer) and recurrent NNs. Feed-forward NNs have neurons organized in a layered structure [14]. They are characterized by the fact that the data flows strictly forward, from input to output units. The network receives inputs through neurons in the input layer, and the output of the network is given by the neurons in the output layer. There may be one or more intermediate hidden layers. We have chosen feed-forward NNs because, according to [14], the forecasting performance of recurrent networks is lower than that of feed-forward models. This may be caused by the fact that recurrent networks pass the data from back to front as well as from front to back, and may become "confused" or unstable.

Designing a feed-forward network requires establishing the number of layers and the number of neurons in each layer. According to [15], feed-forward NNs with one hidden layer are the most popular and flexible configurations for time series forecasting. For the output layer, taking into consideration that we want to predict a single element (data from one week), we used only one neuron. For the input layer, we considered the concept of the inputs of a time delayed NN described in [16]. After several tests and computations, the number of input neurons was set to 18, and the number of neurons in the hidden layer was set to 12. We used Adaptive Learning Rate with Momentum as the training algorithm [17].
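The 18-12-1 feed-forward layout and the time-delayed inputs described above can be sketched in plain Python. The sketch only builds lagged input windows and evaluates an untrained network; the tanh hidden activation, the linear output neuron, the weight initialization range and all function names are assumptions, and the adaptive learning rate with momentum training of [17] is deliberately left out.

```python
import math
import random


def make_windows(series, n_inputs):
    """Turn a series into (lagged inputs, next value) pairs."""
    return [(series[i:i + n_inputs], series[i + n_inputs])
            for i in range(len(series) - n_inputs)]


def init_network(n_inputs=18, n_hidden=12, seed=0):
    """Random weights; the last entry of each weight list is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, inputs):
    """One forward pass: tanh hidden layer, single linear output neuron."""
    hidden, output = network
    activations = [math.tanh(w[-1] + sum(wi * xi for wi, xi in zip(w, inputs)))
                   for w in hidden]
    return output[-1] + sum(wo * a for wo, a in zip(output, activations))


series = [math.sin(2 * math.pi * t / 16) for t in range(60)]
pairs = make_windows(series, 18)
network = init_network()
print(len(pairs), forward(network, pairs[0][0]))
```

In a complete pipeline, the pairs returned by `make_windows` would serve as training examples, with the weights adjusted by the chosen training algorithm before `forward` is used for prediction.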

IV. EVALUATION AND DEPLOYMENT

In the case of the forecasting method based on ARIMA, we used Maximum Likelihood Estimation to identify the model’s parameters p and q. The best model chosen was the one that provided the smallest corrected Akaike Information Criterion (AICC), Bayesian Information Criterion (BIC), and Final Prediction Error (FPE) measures, and the smallest Mean Square Error (MSE) for the prediction of a number of weeks ahead. The forecasting performance is evaluated using the following criteria: MAE, MAPE and SMAPE.
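The three evaluation criteria can be written down in a few lines. Several variants of MAPE and SMAPE exist in the literature, and the text does not state which one was used or whether the values are scaled to percentages, so the definitions below (fractions rather than percentages, SMAPE with the mean of the absolute values in the denominator) are assumptions.

```python
def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, as a fraction (actual must be non-zero)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE, as a fraction, using (|a| + |f|) / 2 as denominator."""
    return sum(abs(a - f) / ((abs(a) + abs(f)) / 2)
               for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(mae(actual, forecast), mape(actual, forecast), smape(actual, forecast))
```

MAE is expressed in the units of the traffic itself, while MAPE and SMAPE are relative measures, which makes them easier to compare across base stations with different load levels.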

In [10], we evaluated the traffic prediction accuracy by using different mother wavelets families. The best results were obtained using the Haar mother wavelets, db1.

We show in Fig.2 a comparison between the original time series and the simulated model (obtained by applying the Box-Jenkins methodology twice) for the overall tendency of a BS’s traffic.

Figure 2: Overall tendency model.

The coefficients used to estimate the variability of the traffic were treated following a similar procedure, based on the Box-Jenkins methodology.

In Fig.3 the estimated overall trend (in the middle) and the estimated variability of a BS’s traffic (the first and the third line) are presented.

Figure 3: The trajectories for the long-term forecasts.

We have adapted the method presented in [5] to wireless traffic. After several tests, we arrived at the conclusion that the method obtained can be used for long term forecasts and can identify the moment when the traffic of a BS exceeds its saturation threshold.
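Once long-term forecasts of the trend and of the variability are available, locating the saturation moment reduces to a threshold search. The sketch below assumes that saturation is declared when the upper trajectory (trend plus variability) first exceeds the threshold; this criterion, the synthetic numbers and the function name are hypothetical and are not taken from the paper.

```python
def first_saturation_step(trend_forecast, variability, threshold):
    """Return the first forecast step whose upper trajectory exceeds
    the threshold, or None if it is never exceeded."""
    for step, (trend, spread) in enumerate(zip(trend_forecast, variability)):
        if trend + spread > threshold:
            return step
    return None


# Hypothetical forecast: linearly growing trend with a constant spread.
trend = [50 + 2 * k for k in range(30)]
spread = [5] * 30
print(first_saturation_step(trend, spread, threshold=90))
print(first_saturation_step(trend, spread, threshold=1000))
```

Using the upper trajectory gives a conservative (earlier) estimate; testing the trend alone would instead report the moment when the average load reaches the threshold.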


In the case of the ANN based forecasting technique, the procedure is slightly different. The entire data set is divided into training and testing parts. After the raw traffic data was decomposed into different timescales using the SWT, we trained the ANN once for each decomposition level. We took the first 6 weeks as inputs, the 7th week as target and the last week for tests. The forecasted signals were obtained independently for each decomposition level. The inverse wavelet transform was applied to all these forecasted signals and the final result was compared to the real traffic data from the 8th week.
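The division of the eight weeks of data into inputs, target and test can be sketched as follows. The sketch assumes the original 15-minute sampling, which gives 7 × 24 × 4 = 672 samples per week; the constant and function names are illustrative, and in the procedure described above the same split would be applied to each decomposition level.

```python
SAMPLES_PER_WEEK = 7 * 24 * 4  # one sample every 15 minutes


def split_weeks(series, samples_per_week=SAMPLES_PER_WEEK):
    """Split eight full weeks into (six input weeks, target week, test week)."""
    weeks = [series[i:i + samples_per_week]
             for i in range(0, len(series), samples_per_week)]
    if len(weeks) != 8 or len(weeks[-1]) != samples_per_week:
        raise ValueError("expected exactly eight full weeks of samples")
    return weeks[:6], weeks[6], weeks[7]


series = list(range(8 * SAMPLES_PER_WEEK))  # placeholder for one traffic trace
inputs, target, test = split_weeks(series)
print(len(inputs), len(target), len(test))
```

Keeping the 8th week entirely out of training is what makes the comparison with the real traffic of that week a fair test of the forecast.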

In the following, we present the comparison between the ARIMA and ANN based traffic forecasting methodologies for one week prediction. The results are shown in Table I. The three quality evaluation criteria (MAE, MAPE and SMAPE) were computed for each of the 66 BSs. Table I presents the mean values obtained over the entire set of BSs. Despite their simplicity, these mean values are appropriate for the comparison of the forecasting methodologies.

TABLE I. FORECASTING TECHNIQUES COMPARISON

| Forecasting model | SMAPE | MAPE | MAE | Computation time |
|---|---|---|---|---|
| ARIMA | 0.812 | 0.0016 | 0.7327 | Minutes |
| ANN | 0.472 | 0.0011 | 0.4428 | Hours |

It can be observed that ANN performs better than ARIMA based forecasting technique for small future time intervals.

V. CONCLUSIONS

In this paper, we compared two time series prediction algorithms applied to the same database, composed of WiMAX traffic traces [7]. The first algorithm was initially proposed in [5] for wired networks and uses ARIMA models. We have adapted this algorithm to wireless traffic [2, 6], and the results presented in this paper confirm its efficiency for the long term forecasting of wireless traffic. It permits the precise estimation of the risk of saturation of each BS composing a WiMAX network. This information is very useful for the network’s administrator, who can collaborate with the service providers to establish the number of users allocated to each BS in the next month. Based on this information, a future upgrade of the BS can be scheduled. We have applied the same algorithm to the forecasting of other time series of a different nature and have obtained good prediction results as well [6, 18].

The second algorithm is based on feed-forward ANNs. Table I shows that it outperforms the algorithm based on the ARIMA model for short term forecasting. It could be applied by the WiMAX network administrator to predict the traffic anomalies which will appear in the following week. It is not recommended for long term prediction because the performance of the ANN decreases as the time elapsed since the last training session increases. It requires a higher computational volume than the algorithm based on the ARIMA model, as it operates on a higher volume of data. It cannot make the distinction between the overall tendency of the traffic and the variability around the overall tendency.

We have unified the presentation of these forecasting algorithms as variants of the same data mining approach. We have proposed applications of both algorithms for WiMAX traffic forecasting, highlighting as the principal difference between them the time interval over which the prediction is accurate.

REFERENCES

[1] G. Box, G. M. Jenkins, G. Reinsel. Time series analysis: forecasting and control. Prentice Hall, 3rd edition, 1994.

[2] C. Stolojescu, “A Wavelets Based Approach for Time-Series Mining”. PhD Thesis, “Politehnica” University of Timisoara, 2012, PhD advisors Prof. A. Isar, Prof. P. Lenca.

[3] S. Mallat. A wavelet tour of signal processing (second edition). Academic Press, 1999.

[4] M. Salagean, “Non-Stationary Signal Description by Non-Parametrical Method”. PhD Thesis, “Politehnica” University of Timisoara, 2011, PhD advisor Prof. I. Nafornita.

[5] K. Papagiannaki et al, “Long-Term Forecasting of Internet Backbone Traffic: Observations and Initial Models”, IEEE INFOCOM, 2003.

[6] C. Stolojescu et al, “Forecasting WiMAX BS Traffic by Statistical Processing in the Wavelet Domain”, Proc. of ISSCS, 2009, Iasi, Romania, pp. 177-183.

[7] I. Railean, S. Moga, M. Borda, “Forecasting by neural networks in the wavelet domain”, Acta Tehnica Napocensis, vol. 50, 2009, pp. 15-27.

[8] P. Chapman et al, “CRISP-DM 1.0 Step-by-step data mining guide”, The CRISP-DM consortium, 2000, [Online]. Available: (http://www:crisp-dm.org/CRISPWP-0800.pdf).

[9] M. J. Shensa, “Discrete Wavelet Transform. Wedding the a trous and Mallat algorithms”, IEEE Trans. on Signal Processing, vol. 40, pp. 2464-2482, 1992.

[10] C. Stolojescu et al, “Comparison of Wavelet Families with Application to WiMAX Traffic Forecasting”, Proc. of OPTIM, 2010, Brasov, Romania, pp. 932-937.

[11] C. Chatfield. Time-series forecasting. Chapman and Hall, 2001.

[12] B. Krose, P. van der Smagt, “An Introduction to Neural Networks”, The University of Amsterdam, 8th edition, November 1996.

[13] S. Haykin, “Neural Networks: A Comprehensive Foundation”, IEEE Press, Macmillan College Publishing Co., 1994.

[14] M. S. Boyd et al, “Feed-forward versus recurrent neural networks for forecasting monthly Japanese yen exchange rates”, Financial Engineering and the Japanese Markets, Kluwer Academic Publishers, vol. 3, no. 1, pp. 59-75, 1996.

[15] N. R. Swanson, H. White, “Forecasting economic time series using flexible versus fixed specification and linear versus nonlinear econometric models”, International Journal of Forecasting, no. 13, 1997, pp. 439-461.

[16] T. Taskaya-Temizel, M. C. Casey, “Configuration of Neural Networks for the Analysis of Seasonal Time Series”, Proceedings of the 3rd International Conf. on Advances in Pattern Recognition, ICAPR, Lecture Notes in Computer Science 3686, vol. 1, pp. 297-304, 2005.

[17] M. Moreira, E. Fiesler, “Neural Networks with Adaptive Learning Rate and Momentum Terms”, Tech. report, 1995. [Online]. Available: http://publications.idiap.ch/downloads/reports/1995/95-04.pdf

[18] C. Stolojescu et al, “A wavelet based prediction model for time-series”, Proc. of SMTDA, 2010, Chania, Crete.