
k-NN Based LS-SVM Framework for Long-Term Time Series Prediction

Zifang Huang and Mei-Ling Shyu
Department of Electrical and Computer Engineering
University of Miami, Coral Gables, FL, USA
[email protected], [email protected]

Abstract

Long-term time series prediction predicts future values multiple steps ahead. It has received more and more attention due to its applications in predicting stock prices, traffic status, power consumption, etc. In this paper, a k-nearest neighbors (k-NN) based least squares support vector machine (LS-SVM) framework is proposed to perform long-term time series prediction. A new distance function, which integrates the Euclidean distance and the dissimilarity of the trend of a time series, is defined for the k-NN approach. By selecting similar instances (i.e., nearest neighbors) in the training dataset for each testing instance based on the k-NN approach, the complexity of training an LS-SVM regressor is reduced significantly. Experiments on two types of datasets were conducted to compare the prediction performance of the proposed framework with the traditional LS-SVM approach and the LL-MIMO (Multi-Input Multi-Output Local Learning) approach at the prediction horizon 20. The experimental results demonstrate that the proposed framework outperforms both the traditional LS-SVM approach and the LL-MIMO approach in prediction. Furthermore, the experimental results also show the promising long-term prediction ability of the proposed framework even when the prediction horizon is large (up to 180).

1. Introduction

Time series prediction, in particular long-term time series prediction, has attracted increasing attention recently in real applications, such as transportation prediction [2] and electrical load prediction [6]. A time series X is a sequence of data points x_t, usually collected consecutively at uniform time intervals, where each data point is the observation at time t. Long-term time series prediction makes predictions multiple steps ahead. Due to the lack of information about the future trend, such prediction becomes more challenging as the prediction horizon grows. To tackle this challenge, research work has been done to extend the prediction horizon of some traditional time series prediction approaches, such as exponential smoothing, linear regression, autoregressive integrated moving average (ARIMA), support vector machines (SVM), artificial neural networks (ANN), and fuzzy logic.

Both least squares support vector machine (LS-SVM) [12] and ANN [13] approaches have been widely used for nonlinear classification and function estimation, and successfully applied to time series prediction. Their variations have been developed for long-term time series prediction [3][4][6][7][10]. Input feature selection was integrated with LS-SVM [11] and ANN [9] to improve long-term prediction performance, reduce the computational complexity of the predictor, and provide high-level information about the time series. While these approaches put more computational load on the input feature selection stage than on training the prediction model, most of the existing approaches use a fixed range of the time series as the training dataset. There is very limited research effort that focuses on reducing the training dataset and utilizing the training data selectively.

The existing algorithms for long-term time series prediction can be generally categorized into two strategies: the recursive approach and the direct strategy [3]. Recursive approaches train a one-step-ahead prediction model and then iterate it by taking the predicted values as part of the input. Intuitively, this approach suffers from error propagation, because it utilizes its own output repeatedly to realize a multi-step-ahead prediction. On the other hand, direct strategies train one prediction model for each prediction horizon based on the historical data, which puts more effort into the training stage but avoids the error accumulation problem. In addition, a multi-input multi-output local learning (LL-MIMO) approach [1] considers the relation among the future values and generates the predicted values at all horizons simultaneously; however, it is a direct prediction strategy in essence.

In this paper, the direct strategy is adopted in our proposed k-NN based LS-SVM framework for long-term time series prediction.



The framework selects those instances in the training dataset that are the closest (i.e., nearest neighbors) to the input testing instance, using a new distance function which integrates the Euclidean distance and the dissimilarity of the trend of a time series. These selected nearest neighbors form a reduced training dataset whose size is significantly smaller than the original training dataset, and they are then used to train an LS-SVM model. Different from some existing work [11], the k-NN component here is applied to reduce the size of the training dataset, rather than to perform input feature selection.

The rest of the paper is organized as follows. Section 2 presents the proposed k-NN based LS-SVM framework, which defines the distance function for the k-NN method, briefly summarizes the LS-SVM methodology, and introduces how k-NN is combined with LS-SVM to improve the performance of long-term time series prediction. In Section 3, comparison and experimental results are shown to evaluate the performance. The conclusion is given in Section 4.

2. Methodology

An n-step-ahead time series prediction predicts the next n values based on the past p observations. According to the direct prediction strategy, one model f_i is trained for each prediction horizon i (as shown in Equation 1).

$x_{t+i} = f_i(x_{t-p+1}, \ldots, x_{t-1}, x_t), \quad 1 \le i \le n.$  (1)

Let the length of the time series for training, d, be T_length. The sliding window approach is used to create the training dataset D by realigning a one-dimensional time series into a matrix. The size of the sliding window is p + n. Thus, the training dataset D is a (T_length − p − n + 1) × (p + n) matrix. The first p columns of D are the input for training all the models, and the (p + i)th column is the output for model f_i. In other words, each row of D is considered as an instance: the first p values are the input, and the last n values are the outputs for the models at each prediction horizon, respectively. In the following subsections, a novel k-NN based LS-SVM framework is introduced to tackle the n-step-ahead time series prediction problem effectively and efficiently.
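As a concrete illustration of this realignment, the following short Python sketch (our own illustrative code, not from the paper; numpy is assumed) builds D from a one-dimensional series:

import numpy as np

def build_training_matrix(series, p, n):
    # Realign a 1-D series of length T_length into the
    # (T_length - p - n + 1) x (p + n) matrix D described above.
    T_length = len(series)
    rows = T_length - p - n + 1
    D = np.empty((rows, p + n))
    for j in range(rows):
        D[j, :] = series[j:j + p + n]
    return D

# The first p columns are the shared inputs; the (p + i)th column
# (0-based index p + i - 1) holds the training outputs for model f_i.
series = np.sin(0.1 * np.arange(200))       # toy series for illustration
D = build_training_matrix(series, p=25, n=20)
X_train = D[:, :25]       # inputs for all models
y_train_f1 = D[:, 25]     # outputs for model f_1 (one step ahead)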

2.1. k-Nearest neighbors

k-NN was initially introduced as an instance-based learning algorithm for classification based on the closest training instances. It is also used for regression by returning the average value of the k nearest neighbors. The underlying assumption is that similar inputs have similar mapping relationships with their outputs. Hence, the k-NN method is employed to reduce the training dataset by selecting the k instances in the training dataset which are the closest to the input testing instance. Such a reduced training dataset is then used to train the time series prediction model.

In order to measure the similarity of the instances, the Euclidean distance is usually used as the distance metric. However, for a time series segment, the trend of the changing values should also be considered. Here, the first-order difference is used to describe the trend of a time series. Let the timestamps of the training dataset run from t + 1 to t + T_length. Then the jth row of D is $(x_{t+j}, x_{t+j+1}, \ldots, x_{t+j+p-1}, x_{t+j+p}, \ldots, x_{t+j+p+n-1})$, where 1 ≤ j ≤ (T_length − p − n + 1), and the corresponding input vector is $(x_{t+j}, x_{t+j+1}, \ldots, x_{t+j+p-1})$. The first-order difference of the input vector is $(d_{t+j}, \ldots, d_{t+j+p-2}) = (x_{t+j+1} - x_{t+j}, \ldots, x_{t+j+p-1} - x_{t+j+p-2})$, whose size is 1 × (p − 1).

Given a testing input vector of length p starting at time point T + 1, $(x_{T+1}, \ldots, x_{T+p-1}, x_{T+p})$, we first calculate its Euclidean distance to each instance in the training dataset, denoted as E(j), using Equation 2.

$E(j) = \sqrt{(x_{T+1} - x_{t+j})^2 + \cdots + (x_{T+p} - x_{t+j+p-1})^2}.$  (2)

The first-order difference of the testing input vector is $(d_{T+1}, \ldots, d_{T+p-1}) = (x_{T+2} - x_{T+1}, \ldots, x_{T+p} - x_{T+p-1})$, whose size is also 1 × (p − 1). We then calculate the Euclidean distance between the differential testing input vector and each differential training input vector, denoted as D(j), using Equation 3.

$D(j) = \sqrt{(d_{T+1} - d_{t+j})^2 + \cdots + (d_{T+p-1} - d_{t+j+p-2})^2}.$  (3)

Both E and D are vectors of length (T_length − p − n + 1). A combination of the normalized E and D is used as the distance metric for the k-NN method. The distance Dis is defined by Equation 4.

$\mathrm{Dis}(j) = \dfrac{E(j) - \mathrm{MIN}(E)}{\mathrm{MAX}(E) - \mathrm{MIN}(E)} + \dfrac{D(j) - \mathrm{MIN}(D)}{\mathrm{MAX}(D) - \mathrm{MIN}(D)},$  (4)

where MAX(E), MIN(E), MAX(D), and MIN(D) are the maximum and minimum values of E and D, respectively. The k instances with the smallest distance measures are selected to generate a reduced training dataset for LS-SVM.
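A minimal numpy sketch of this neighbor selection (illustrative names; X_train holds the first p columns of D, x_test is one testing input vector):

import numpy as np

def knn_reduced_indices(X_train, x_test, k):
    # Equation 2: Euclidean distance on the raw input vectors.
    E = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
    # Equation 3: Euclidean distance on the first-order differences.
    Dd = np.sqrt(((np.diff(X_train, axis=1) - np.diff(x_test)) ** 2).sum(axis=1))
    # Equation 4: sum of the two min-max normalized distance vectors
    # (assumes the distances are not all identical, so the ranges are nonzero).
    E_norm = (E - E.min()) / (E.max() - E.min())
    D_norm = (Dd - Dd.min()) / (Dd.max() - Dd.min())
    dis = E_norm + D_norm
    # The k rows with the smallest combined distance form the reduced dataset.
    return np.argsort(dis)[:k]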

2.2. Overview of LS-SVM

LS-SVM is used here as a nonlinear regression model [12]. Consider a model in the primal weight space in the following form:

$y(x) = w^T \varphi(x) + b,$  (5)



where $x \in \mathbb{R}^p$, $y \in \mathbb{R}$, and $\varphi(x): \mathbb{R}^p \rightarrow \mathbb{R}^{p_h}$ is a function which maps the input x to a high-dimensional feature space. Given a set of training data $\{x_j, y_j\}_{j=1}^N$, the optimization problem can be formulated as follows.

$\min_{w,b,e} \; J_P(w, e) = \frac{1}{2} w^T w + \gamma \, \frac{1}{2} \sum_{j=1}^{N} e_j^2$

subject to $y_j = w^T \varphi(x_j) + b + e_j,$  (6)

where j = 1, ..., N and e_j is an error variable. When w becomes infinite-dimensional, it is necessary to construct the Lagrangian and solve its dual problem. The solution to the optimization problem is given in Equation 7.

$y(x) = \sum_{j=1}^{N} \alpha_j K(x, x_j) + b,$  (7)

where $K(x, x_j)$ is a kernel function defined as $K(x_j, x_l) = \varphi(x_j)^T \varphi(x_l)$. In this paper, the RBF kernel, $K(x, x_j) = \exp(-\|x - x_j\|_2^2 / \sigma^2)$, is used. The RBF kernel introduces two additional tuning parameters, γ and σ, whose values are selected empirically in the experiments.
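As a sketch of how this dual solution is computed in practice, the following numpy code sets up the standard LS-SVM linear system from [12] (our own illustrative implementation, not the LS-SVMlab toolbox used in the experiments):

import numpy as np

def rbf_kernel(A, B, sigma):
    # K(x, z) = exp(-||x - z||^2 / sigma^2) for every row pair of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / sigma ** 2)

def lssvm_fit(X, y, gamma, sigma):
    # Dual of Equation 6: solve the (N+1) x (N+1) linear system
    #   [ 0        1^T         ] [b]       [0]
    #   [ 1   Omega + I/gamma  ] [alpha] = [y],  Omega_jl = K(x_j, x_l).
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                      # b, alpha

def lssvm_predict(X_train, b, alpha, sigma, X_new):
    # Equation 7: y(x) = sum_j alpha_j K(x, x_j) + b.
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b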

2.3. k-NN based LS-SVM

Based on the observation that similar inputs commonly share the same model correlating them with the corresponding outputs, rather than using the whole available training dataset to train an LS-SVM model, it is more precise and prudent to train a prediction model from the instances in the training dataset that are close to the testing instance. In this case, one prediction model is trained for each testing instance, which makes the framework more adaptive than using a constant model. Meanwhile, selecting only the k training instances that are closest to the testing instance reduces the size of the input data for LS-SVM, which dramatically decreases the complexity of building the LS-SVM regressor. Fig. 1 shows the system architecture of our proposed k-NN based LS-SVM framework.

Our proposed framework contains four steps. First, for each instance in the testing dataset T, its k nearest neighbors are selected from the training dataset D using the distance measure in Equation 4 defined in Section 2.1. The training dataset D is formed by the most recent T_length data points. Second, the reduced training dataset D′ is used as the input set to train an LS-SVM regressor for each prediction horizon; n regressors are trained for an n-step-ahead time series prediction problem. Third, the testing instance is taken as the input to the obtained LS-SVM regressors, which return n predicted values. Finally, the n predicted values are validated with a boundary constraint, and the validated values are the n-step-ahead prediction values for that testing instance.

[Figure 1: block diagram. The training dataset D and the testing dataset T feed the k-NN component, which produces the reduced training dataset D′; the LS-SVM regressor trained on D′ produces the predicted values, which pass through the Boundary Constraint component.]

Figure 1. System architecture of the proposed k-NN based LS-SVM framework

For time series data collected from real-world applications, which carry physical meaning, the values fall within a certain reasonable range, and we do not expect sharp changes during a limited time period. Due to the nature of long-term time series prediction, the predicted values might go beyond these margins, especially when n is large. Thus, a Boundary Constraint component is developed to post-process the predicted values rendered by the LS-SVM regressors. In order to make it systematic and free of domain knowledge, we set the upper and lower bounds, denoted as UpB and LowB respectively, based on the values in the training time series data d:

UpB = MAX(d) + 0.02 × STD(d);
LowB = MIN(d) − 0.02 × STD(d),

where MAX(d) and MIN(d) are the maximum and minimum values of the training time series data d, and STD(d) is the standard deviation of d. If a predicted value is larger than the upper bound or smaller than the lower bound, we reset it to the upper or lower bound value, respectively; otherwise, the predicted value is rendered as the final output. Therefore, the final output of the system is limited to the range [MIN(d) − 0.02 × STD(d), MAX(d) + 0.02 × STD(d)].
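This post-processing step amounts to a clip operation; a minimal numpy sketch (illustrative; d is the training time series as an array):

import numpy as np

def boundary_constrain(y_pred, d):
    # Clip predictions to [MIN(d) - 0.02*STD(d), MAX(d) + 0.02*STD(d)].
    return np.clip(y_pred, d.min() - 0.02 * d.std(), d.max() + 0.02 * d.std())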



In our proposed framework, a set of parameters must be determined, including the length of the training time series T_length, the prediction horizon n, the length of the input vector p, the parameter k in the k-NN approach, and the two parameters used in LS-SVM with the RBF kernel (namely, γ and σ). The values of these parameters are selected empirically, as explained in detail in the following section.
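Putting the pieces together, the per-testing-instance loop of the framework might look like the following sketch, which reuses the illustrative helpers defined above (knn_reduced_indices, lssvm_fit, lssvm_predict, boundary_constrain); it reflects our reading of the four steps, not the authors' code:

import numpy as np

def knn_lssvm_forecast(D, x_test, p, n, k, gamma, sigma, d):
    X_train = D[:, :p]
    idx = knn_reduced_indices(X_train, x_test, k)   # step 1: select neighbors
    X_red = X_train[idx]                            # step 2: reduced dataset D'
    preds = np.empty(n)
    for i in range(n):                              # steps 2-3: one LS-SVM per horizon
        b, alpha = lssvm_fit(X_red, D[idx, p + i], gamma, sigma)
        preds[i] = lssvm_predict(X_red, b, alpha, sigma, x_test[None, :])[0]
    return boundary_constrain(preds, d)             # step 4: boundary constraint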

3. Experiments and Results

To evaluate the performance of the proposed k-NN based LS-SVM framework, we conducted various experiments to compare its prediction results with the traditional LS-SVM approach [8][12] and the LL-MIMO algorithm [1]. The experiments were conducted on an Intel Core 2 machine with two 2.66 GHz CPUs and 3.25 GB of RAM.

3.1. Datasets

The comparative experiments utilized two types of datasets: the Mackey-Glass time series benchmark and four time series provided by the NNGC1 competition. The Mackey-Glass time series [5] is generated by the following delayed differential equation:

$\dfrac{dx(t)}{dt} = \dfrac{a \, x(t-\tau)}{1 + x(t-\tau)^{10}} - b \, x(t).$  (8)

The generated data are commonly used for evaluating and comparing time series prediction approaches [3][4][7]. 2201 data points were generated with an initial value x(0) = 1.2, a = 0.2, b = 0.1, and τ = 17 by using the 4th-order Runge-Kutta method. The last 2000 data points of the generated time series were used in our experiments.
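The paper does not list its integration code; the sketch below is one plausible way to generate such a series with a fixed-step 4th-order Runge-Kutta scheme, reading the delayed term x(t − τ) from the stored trajectory (the step size h, the constant initial history, the unit sampling interval, and the midpoint interpolation are our assumptions):

import numpy as np

def mackey_glass(n_points, a=0.2, b=0.1, tau=17.0, x0=1.2, h=0.1):
    m = int(round(tau / h))        # delay expressed in integration steps
    per = int(round(1.0 / h))      # integration steps per sampled point (t = 0, 1, 2, ...)
    total = n_points * per
    x = np.empty(total + 1)
    x[0] = x0

    def f(x_del, x_cur):           # right-hand side of Equation 8
        return a * x_del / (1.0 + x_del ** 10) - b * x_cur

    for i in range(total):
        xd0 = x[i - m] if i >= m else x0            # x(t - tau); constant history for t <= 0
        xd1 = x[i + 1 - m] if i + 1 >= m else x0    # x(t + h - tau)
        xdm = 0.5 * (xd0 + xd1)                     # x(t + h/2 - tau), linearly interpolated
        k1 = f(xd0, x[i])
        k2 = f(xdm, x[i] + 0.5 * h * k1)
        k3 = f(xdm, x[i] + 0.5 * h * k2)
        k4 = f(xd1, x[i] + h * k3)
        x[i + 1] = x[i] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x[::per][:n_points]

series = mackey_glass(2201)[-2000:]   # keep the last 2000 points, as above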

The time series data provided by the NNGC1 competition [2] contain diverse non-stationary, heteroscedastic transportation data, which exhibit different structures and frequencies, grouped into homogeneous datasets. The four time series used in our experiments are the four longest series collected hourly, and the length of each time series is 1742.

3.2. Error measures

Let X be the m true values of a testing time series dataset, and $\hat{X}$ be the m predicted values obtained n steps ahead. Two error measures are used to evaluate the prediction performance. One is the root mean squared error (RMSE), the square root of the mean squared prediction error, as defined in Equation 9.

$\mathrm{RMSE} = \sqrt{\dfrac{\sum_{t=1}^{m} (x_t - \hat{x}_t)^2}{m}}.$  (9)

The other error measure is the symmetric mean absolute percentage error (SMAPE), which is based on relative errors and is defined in Equation 10.

$\mathrm{SMAPE} = \dfrac{1}{m} \sum_{t=1}^{m} \dfrac{|x_t - \hat{x}_t|}{(x_t + \hat{x}_t)/2}.$  (10)

SMAPE is the mean of the absolute difference between $x_t$ and $\hat{x}_t$ divided by their average, and it ranges from 0 to 2. Compared to RMSE, SMAPE is less sensitive to the absolute values of the time series data.
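Both measures are straightforward to compute; a short numpy sketch (illustrative):

import numpy as np

def rmse(x_true, x_pred):
    # Equation 9: root mean squared prediction error.
    return np.sqrt(np.mean((x_true - x_pred) ** 2))

def smape(x_true, x_pred):
    # Equation 10: each absolute error divided by the average of the
    # true and predicted values; the result lies in [0, 2].
    return np.mean(np.abs(x_true - x_pred) / ((x_true + x_pred) / 2.0))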

3.3. Experimental results

As mentioned in Section 2.3, a group of parameters needs to be tuned empirically for a specific prediction horizon n, including the length of the training time series T_length, the length of the input vector p, the parameter k in the k-NN approach, and γ and σ used in LS-SVM with the RBF kernel. Due to the high complexity of the traditional LS-SVM approach (one of the methods used in the performance comparison), the prediction horizon n is set to 20 in the comparative experiments, which means we are predicting the future values 20 steps ahead. Grid searching was done to tune each parameter one by one within a preset value range. The value ranges, listed in Table 1, were set with the time cost taken into consideration.

Table 1. Value ranges for parameter tuning

T_length      p        k          γ            σ
[500,1000]    [3,30]   [50,150]   [0.02,100]   [0.02,50]

The values that gave the lowest RMSE were selected as the preset values for the parameters. The selected parameter values for each time series dataset are shown in Table 2.

Table 2. Preset parameter values

Dataset        T_length   p    k     γ    σ
Mackey-Glass   700        25   80    30   50
NNGC1-1        600        20   70    10   10
NNGC1-2        600        20   60    5    10
NNGC1-3        600        25   110   10   10
NNGC1-4        600        20   70    5    10

The combination of the parameter values varies across the time series datasets due to their distinct characteristics. We compare the proposed framework with both the traditional LS-SVM approach [8][12] and the multi-input multi-output local learning (LL-MIMO) approach [1].



For comparison purposes, the parameters that need to be preset in these two approaches are set to the same values as those used in the k-NN based LS-SVM approach for a specific time series. The traditional LS-SVM approach has parameters T_length, p, γ, and σ; the LL-MIMO approach requires T_length, p, and k to be preset.

Figure 2 shows some of the 20-step-ahead prediction results for time series NNGC1-4 by all three approaches. Due to limited space, only the first 100 predicted data points out of the 1123 testing data are shown in the plot together with the corresponding real values. From the figure, it is easy to see that the predicted values of our proposed k-NN based LS-SVM approach follow the real values quite closely, and it performs better than the traditional LS-SVM and LL-MIMO approaches.

[Figure 2: line plot of the first 100 of the 1123 testing points, comparing the real values with the 20-step-ahead predictions of LL-MIMO, traditional LS-SVM, and the proposed k-NN based LS-SVM.]

Figure 2. Prediction results for NNGC1-4

The LL-MIMO approach failed to predict and preserve the trend of the time series dataset, especially for the last 20 points. Figure 3 is an enlarged plot of the last 40 points in Figure 2 to show the comparison more clearly. Though the traditional LS-SVM approach generated fair prediction results, it consumed a much longer time with a large training dataset, which makes it infeasible for real-time prediction. On average, the traditional LS-SVM approach took 28.76 seconds to execute a 20-step-ahead prediction, while the k-NN based LS-SVM approach required only 0.51 seconds.

The comparative experiments were conducted on all five time series at prediction horizon n = 20. The results are reported in terms of the error measures RMSE and SMAPE in Table 3 and Table 4, respectively. They show that the proposed approach always achieves the lowest prediction error. In Table 3, the RMSE values differ widely across the five time series datasets because their value ranges are diverse. To reduce the influence of the absolute value range of a time series dataset on the error measurement, we also calculate the relative errors, SMAPE, shown in Table 4.

[Figure 3: enlarged view of the last 40 points of Figure 2.]

Figure 3. Prediction results for NNGC1-4

As can be observed from these two tables, the Mackey-Glass time series is a synthetic data series without any noise, and thus its prediction errors are much smaller than the errors on any real-world dataset provided by NNGC1. A column plot of Table 4 is presented in Figure 4 to show the comparison results more intuitively.

Table 3. Performance in terms of RMSE

Dataset        LL-MIMO   LS-SVM   k-NN based LS-SVM
Mackey-Glass   0.0703    0.0087   0.0016
NNGC1-1        6594.2    4039.9   3608.4
NNGC1-2        155.84    113.1    104.52
NNGC1-3        7828.1    4771.5   4446.8
NNGC1-4        2396.6    1683     1491

Table 4. Performance in terms of SMAPE

Dataset        LL-MIMO   LS-SVM   k-NN based LS-SVM
Mackey-Glass   0.0698    0.0077   0.0013
NNGC1-1        0.4323    0.1978   0.1686
NNGC1-2        0.3913    0.2764   0.2188
NNGC1-3        0.3909    0.1677   0.154
NNGC1-4        0.2936    0.1604   0.1347

More experiments have been conducted to study the prediction ability of the proposed framework when the prediction horizon n is large. In the experiments, n ranges from 20 to 180, and the results in terms of SMAPE are shown in Figure 5. For the Mackey-Glass time series, the SMAPE prediction errors increase slightly as n grows.



[Figure 4: column plot of the SMAPE values in Table 4 for the three approaches on the five datasets.]

Figure 4. Performance in terms of SMAPE

[Figure 5: SMAPE of the proposed framework on all five datasets for prediction horizons n = 20 to 180.]

Figure 5. Performance of the k-NN based LS-SVM on long-term time series prediction

For the four time series from NNGC1, which are transportation data collected hourly, the error measures drop when n is around 130. This is because the four time series datasets exhibit a roughly periodic pattern with a period of about 130. The overall prediction errors are steady for all time series.

4. Conclusion

Due to the need for and challenge of long-term time series prediction, in this paper we presented a k-NN based LS-SVM framework for multi-step-ahead prediction. A k-NN component with a newly defined distance measure is used to reduce the size of the training dataset by selecting the instances that are similar to the testing instance. Meanwhile, a Boundary Constraint component is added to post-process the predicted values to ensure that the prediction results have a sound physical meaning. A reduced training dataset leads to a shorter time for training the prediction model, and the experimental results have shown that the proposed framework performs better than the traditional LS-SVM approach and the LL-MIMO approach, with lower prediction errors.

References

[1] G. Bontempi. Long term time series prediction with multi-input multi-output local learning. In Proceedings of the 2nd European Symposium on Time Series Prediction, pages 145–154, February 2008.

[2] S. F. Crone. Artificial neural network & computational intelligence forecasting competition. February 2010. www.neural-forecasting-competition.com/.

[3] L. J. Herrera, H. Pomares, I. Rojas, A. Guillén, A. Prieto, and O. Valenzuela. Recursive prediction for long term time series forecasting using advanced models. Neurocomputing, 70(16-18):2870–2880, May 2007.

[4] P. Liu and J. Yao. Application of least square support vector machine based on particle swarm optimization to chaotic time series prediction. In Proceedings of the IEEE International Conference on Intelligent Computing and Intelligent Systems, 4:458–462, November 2009.

[5] M. Mackey and L. Glass. Oscillation and chaos in physiological control systems. Science, 197(4300):287–289, July 1977.

[6] M. Maralloo, A. Koushki, C. Lucas, and A. Kalhor. Long term electrical load forecasting via a neurofuzzy model. In Proceedings of the 14th International CSI Computer Conference, pages 35–40, October 2009.

[7] K. Meng, Z. Dong, and K. Wong. Self-adaptive radial basis function neural network for short-term electricity price forecasting. IET Generation, Transmission & Distribution, 3(4):325–335, April 2009.

[8] K. Pelckmans, J. A. K. Suykens, T. V. Gestel, J. D. Brabanter, L. Lukas, B. D. Moor, and J. Vandewalle. LS-SVMlab toolbox user's guide. ESAT-SCD-SISTA Technical Report, pages 1–106, February 2003.

[9] W. J. Puma-Villanueva, E. dos Santos, and F. Von Zuben. Long-term time series prediction using wrappers for variable selection and clustering for data partition. In Proceedings of the International Joint Conference on Neural Networks, pages 3068–3073, August 2007.

[10] A. Sfetsos and C. Siriopoulos. Time series forecasting with a hybrid clustering scheme and pattern recognition. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 34(3):399–405, May 2004.

[11] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, and A. Lendasse. Methodology for long-term prediction of time series. Neurocomputing, 70(16-18):2861–2869, May 2007.

[12] J. A. K. Suykens, T. V. Gestel, J. D. Brabanter, B. D. Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, Singapore, 2002.

[13] B. Yegnanarayana. Artificial Neural Networks. Prentice-Hall of India, 2004.
