An intelligent forecasting model based on robust wavelet m-support vector machine
Qi Wu a,b,*, Rob Law b
a Jiangsu Key Laboratory for Design and Manufacture of Micro–Nano Biomedical Instruments, Southeast University, Nanjing 211189, China
b School of Hotel and Tourism Management, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Article info

Keywords:
Support vector machine
Wavelet kernel
Robust loss function
Particle swarm optimization
Forecast
Abstract

Product demand series exhibit small samples, seasonality, nonlinearity, randomness and fuzziness, and the existing support vector kernels cannot approximate the random curves of such demand time series in the L2(R^n) space (the space of square-integrable functions). A robust loss function is also proposed to overcome the shortcomings of the ε-insensitive loss function when handling hybrid noise. A novel robust wavelet ν-support vector machine (RW ν-SVM) is proposed based on wavelet theory and a modified support vector machine. A particle swarm optimization algorithm is designed to select the optimal parameters of the RW ν-SVM model within the permitted constraint ranges. The results of an application to car demand forecasting show that the forecasting approach based on the RW ν-SVM model is effective and feasible. A comparison between the method proposed in this paper and other methods is also given, which shows that it outperforms the standard wavelet ν-SVM and other traditional methods. © 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Applications of time series prediction can be found in the areas of
economic and business planning, inventory and product control,
weather forecasting, signal processing and many other fields
(Box & Jenkins, 1994; Engle, 1984; Hornik, Stinchcombe, & White,
1989; Hill, Connor, & Remus, 1996; Tuan & Lanh, 1981; Tong, 1983;
Tang, Almedia, & Fishwick, 1991; Zhang, 2001). Product demand
forecasting, as an application of time series forecasting, involves a complex
dynamic system whose demand behavior is affected by many fac-
tors. Many of these factors have random, nonlinear, seasonal,
and uncertain characteristics. There is a kind of nonlinear mapping
relationship between the influencing factors and demand series,
and it is difficult to describe the relationship by definite mathemat-
ical models.
For the linear series, Box and Jenkins (1994) developed the
autoregressive integrated moving average (ARIMA) methodology
for forecasting time series events. A basic tenet of the ARIMA mod-
eling approach is the assumption of linearity among the variables.
However, there are many time series events for which the assump-
tion of linearity may not hold. Clearly, ARIMA models cannot be
effectively used to capture and explain nonlinear relationships.
When ARIMA models are applied to processes that are nonlinear,
forecasting errors often increase greatly as the forecasting horizon
becomes longer. To improve forecasting nonlinear time series
events, researchers have developed alternative modeling ap-
proaches, which include nonlinear regression models, the bilinear
model (Tuan & Lanh, 1981), the threshold autoregressive model
(Tong, 1983), and the autoregressive heteroscedastic model
(ARCH) (Engle, 1984). Although these methods exhibit improve-
ments over the linear models in some specific cases, they tend to be
application specific, lack generality and are harder to implement
(Zhang, 2001).
For the nonlinear series, the artificial neural network (ANN) is a
general purpose model that has been used as a universal functional
approximator. For example, it is supposed to be able to model eas-
ily any type of parametric or non-parametric process including
automatically and optimally transforming the input data. These
claims have led to increasing interest in neural networks (Hornik
et al., 1989). Researchers use ANN methodology to forecast a num-
ber of nonlinear time series events (Hill et al., 1996; Tang et al.,
1991; Tang & Fishwick, 1993). The effectiveness of neural network
models and their performance in comparison to traditional fore-
casting methods have also been a subject of many studies (Gorr,
1994; Zhang, Patuwo, & Hu, 1998). Bell, Ribar, and Verchio
(1989) compare back-propagation networks against regression
models in predicting commercial bank failures. The neural network
model performs well in failure prediction and the expected costs
for misclassification by the neural network models are found to
be lower than those of the logistic regression model. Roy and Cosset
(1990) also use neural network and logistic regression models
in predicting country risk ratings from economic and political
indicators. The neural network models have lower mean absolute
error in their predictions and react more evenly to the indicators
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.09.036
* Corresponding author at: Jiangsu Key Laboratory for Design and Manufacture of Micro–Nano Biomedical Instruments, Southeast University, Nanjing 211189, China. Tel.: +86 25 51166581; fax: +86 25 511665260.
E-mail addresses: [email protected], [email protected] (Q. Wu), [email protected] (R. Law).
Expert Systems with Applications 38 (2011) 4851–4859
than their logistic counterparts. Duliba (1991) compares neural
network models with four types of regression models in predicting the
financial performance of transportation companies. She finds
that the neural network model outperforms the random-effects
regression model but not the fixed-effects model. Though neural
networks are more powerful than regression methods for time
series prediction, their drawback is that the design of an efficient
architecture and the choice of the parameters involved require
longer processing time. In fact, learning neural network weights
can be considered as a hard optimization problem for which the
learning time scales exponentially as the problem size grows. To
overcome this disadvantage, a new approach should be explored.
Recently, a novel machine learning technique called the support
vector machine (SVM) has drawn much attention in the fields of
pattern classification and regression forecasting. SVM was first
introduced by Vapnik (1995). It is a supervised learning method
grounded in statistical learning theory. The algorithm derives from
the linear classifier and solves the two-class classification problem;
it was later extended to nonlinear problems, that is, it finds the
optimal (large-margin) hyperplane to classify the sample set. It is
an approximate implementation of the structural risk minimization
(SRM) principle in statistical learning theory, rather than the
empirical risk minimization (ERM) method (Kwok, 1999).
Compared with traditional neural networks, SVM uses structural
risk minimization to avoid problems such as overfitting, the curse
of dimensionality and local minima. For small sample sets, the
algorithm generalizes well. SVM has also been successfully used for
machine learning with large, high-dimensional data sets. These
attractive properties make SVM a promising technique. This is due
to the fact that the generalization property of an SVM does not
depend on the complete training data but only on a subset thereof,
the so-called support vectors. SVM has now been applied in many
fields, such as handwriting recognition, three-dimensional object
recognition, face recognition, text image recognition, voice
recognition and regression analysis (Carbonneau, Laframboise, &
Vahidov, 2008; Trontl, Smuc, & Pevec, 2007; Wohlberg,
Tartakovsky, & Guadagnini, 2006).
For pattern recognition and regression analysis, the nonlinear
ability of SVM is achieved through kernel mapping. The kernel
function must satisfy the condition of Mercer's theorem. The
Gaussian function is a commonly used kernel function and shows
good generalization ability. However, with the kernel functions
used so far, the SVM cannot approximate an arbitrary curve in the
L2(R^n) space (the space of square-integrable functions), because
these kernel functions do not form a complete orthonormal basis.
Consequently, the regression SVM cannot approximate every
function in that space.
Accordingly, we need a new kernel function that can build a
complete basis through horizontal translation and dilation. Such
functions already exist: the wavelet functions. Based on wavelet
decomposition, this paper proposes an admissible support vector
kernel function, named the wavelet kernel function, and proves
that such a kernel function exists. The Morlet and Mexican-hat
wavelet kernel functions are orthonormal bases of the L2(R^n)
space. Based on wavelet analysis and the conditions on support
vector kernel functions, a Morlet or Mexican-hat wavelet kernel for
the support vector regression machine (SVR) is proposed, which is
a kind of approximately orthonormal function. This kernel function
can simulate almost any curve in the space of square-integrable
functions, thus enhancing the generalization ability of the SVR.
Khandoker, Lai, Begg, and Palaniswami (2007) and Widodo and
Yang (2008) research the wavelet ε-support vector machine. Much
research indicates that the performance of ν-SVM is better than
that of ε-SVM. Combining the wavelet kernel function with
regularization theory, a ν-support vector machine with a wavelet
kernel function (Wν-SVM) is proposed in this paper.
However, the standard SVM encounters certain difficulties in
real applications, and improved SVMs have been put forward to
solve concrete problems (Kwok, 1999). Although the standard
SVM with the ε-insensitive loss function has good generalization
capability in some applications, it has difficulty handling Gaussian
noise and the normally distributed noise components of a series.
Therefore, this paper focuses on modeling a new wavelet SVM that
can penalize the Gaussian noise components of a series.
Based on the RW ν-SVM, an intelligent forecasting approach
for car demand series with nonlinear and uncertain characteristics
is proposed in this paper. Section 2 constructs an intelligent
forecasting model based on a new ν-support vector regression
machine with a wavelet kernel function and robust loss function
(RW ν-SVM) and the particle swarm optimization (PSO) algorithm.
Section 3 gives two algorithms to solve the intelligent forecasting
problem. Section 4 gives an application of the intelligent forecasting
system based on the RW ν-SVM model. Section 5 draws the
conclusions.
2. Robust wavelet ν-support vector machine (RW ν-SVM)
2.1. Support vector machine
SVM represents a novel neural-network-like technique, which has
gained ground in classification, forecasting and regression analysis.
One of its key properties is that training an SVM is equivalent to
solving a linearly constrained quadratic programming problem,
whose solution turns out to be unique and globally optimal.
Therefore, unlike other network training techniques, SVM
circumvents the problem of getting stuck at local minima. Another
advantage of SVM is that the solution to the optimization problem
depends only on a subset of the training data points, which are
referred to as the support vectors.

Let us consider a set of data points (x_1, y_1), (x_2, y_2), ..., (x_l, y_l),
which are independently and randomly generated from an
unknown function. Specifically, x_i is a column vector of attributes,
y_i is a scalar representing the dependent variable, and l denotes
the number of data points in the training set. SVM approximates
such an unknown function by mapping x into a higher dimensional
space through a function φ and determining a linear maximum-
margin hyperplane. In particular, the smallest distance to such a
hyperplane is called the margin of separation. The hyperplane is
an optimal separating hyperplane if the margin is maximized. The
data points that are located exactly the margin distance away from
the hyperplane are denominated the support vectors.
Mathematically, SVM utilizes a regression function of the
form f(x) = w·x + b, where the coefficients w and b are estimated
by minimizing a regularized risk function:

min  (1/2)||w||² + C Σ_{i=1}^{l} L_ε(y_i),   (1)

where ||w||² is the regularized term, Σ_{i=1}^{l} L_ε(y_i) is the
empirical error, and C > 0 is an arbitrary penalty parameter called
the regularization constant. Basically, SVM penalizes f(x_i) when it
departs from y_i by means of an ε-insensitive loss function:

L_ε(y_i) = 0 if |f(x_i) − y_i| ≤ ε;  L_ε(y_i) = |f(x_i) − y_i| − ε otherwise,   (2)

where ε determines the width of the insensitive tube around
the hyperplane (the margin of separation). The ε-insensitive loss
function is illustrated in Fig. 1.
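The ε-insensitive loss of Eq. (2) can be sketched in a few lines; a minimal illustration (the function name and default ε are ours, not from the paper):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Epsilon-insensitive loss, Eq. (2): zero inside the eps-tube,
    linear in the residual magnitude outside it."""
    r = np.abs(y_pred - y_true)
    return np.where(r <= eps, 0.0, r - eps)
```

Residuals smaller than ε incur no penalty, which is what makes the solution depend only on the support vectors lying on or outside the tube.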
The minimization of expression (1) is implemented by introducing
the slack variables ξ_i and ξ_i*. Specifically, the ν-support vector
regression (ν-SVM) solves the following quadratic programming
problem:

min_{w, ξ^(*), ε, b}  τ(w, ξ^(*), ε) = (1/2)||w||² + C(νε + (1/l) Σ_{i=1}^{l} (ξ_i + ξ_i*))   (3)

subject to  (w·x_i + b) − y_i ≤ ε + ξ_i,   (4)
            y_i − (w·x_i + b) ≤ ε + ξ_i*,   (5)
            ξ_i^(*) ≥ 0,  ε ≥ 0.   (6)

The solution to this minimization problem is of the form

f(x) = Σ_{i=1}^{l} (α_i − α_i*) K(x_i, x) + b,   (7)

where α_i and α_i* are the Lagrange multipliers associated with the
constraints (w·x_i + b) − y_i ≤ ε + ξ_i and y_i − (w·x_i + b) ≤ ε + ξ_i*,
respectively. The function K(x_i, x_j) = φ(x_i)′φ(x_j) represents a
kernel, which is the inner product of the two images φ(x_i) and φ(x_j).

Well-known kernel functions are K(x_i, x_j) = x_i′x_j (linear),
K(x_i, x_j) = (γ x_i′x_j + r)^d, γ > 0 (polynomial),
K(x_i, x_j) = exp(−γ||x_i − x_j||²), γ > 0 (radial basis function), and
K(x_i, x_j) = tanh(γ x_i′x_j + r) (sigmoid). The radial kernel is a
popular choice in the SVM literature.
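The four textbook kernels listed above can be written directly from their formulas; a small illustrative sketch (function names and default hyperparameters are ours):

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi' xj
    return xi @ xj

def polynomial_kernel(xi, xj, gamma=1.0, r=1.0, d=3):
    # K(xi, xj) = (gamma * xi' xj + r)^d, gamma > 0
    return (gamma * (xi @ xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma > 0
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    # K(xi, xj) = tanh(gamma * xi' xj + r)
    return np.tanh(gamma * (xi @ xj) + r)
```

Note that the RBF kernel always evaluates to 1 when xi = xj, a property shared by the wavelet kernel constructed later in the paper.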
2.2. The conditions of wavelet support vector kernel functions

A support vector kernel function can be not only a dot-product
kernel, K(x, x′) = K(x·x′), but also a translation-invariant
(horizontal floating) kernel, K(x, x′) = K(x − x′). In fact, any
function satisfying Mercer's condition is an admissible support
vector kernel function.

Lemma 1. The symmetric function K(x, x′) is a kernel function of
SVM if and only if, for every function φ ≠ 0 satisfying
∫_{R^d} φ²(ξ) dξ < ∞, the following condition holds:

∫∫ K(x, x′) φ(x) φ(x′) dx dx′ ≥ 0.   (8)

This lemma provides a simple method to build kernel functions.
Since a translation-invariant function can hardly be divided into
two identical factor functions, we give the admissibility condition
for translation-invariant kernels separately.

Lemma 2. A translation-invariant function K(x − x′) is an admissible
support vector kernel function if and only if the Fourier transform
of K(x) satisfies

F[K](ω) = (2π)^{−n/2} ∫_{R^n} exp(−j(ω·x)) K(x) dx ≥ 0.   (9)
If the wavelet function ψ(x) satisfies the conditions
ψ(x) ∈ L²(R) ∩ L¹(R) and ψ̂(0) = 0, where ψ̂ is the Fourier
transform of ψ(x), the wavelet function group can be defined as

ψ_{a,m}(x) = |a|^{−1/2} ψ((x − m)/a),   (10)

where a is the so-called scaling parameter, m is the horizontal
floating (translation) coefficient, and ψ(x) is called the "mother
wavelet". The translation parameter m ∈ R and dilation a > 0 may
be continuous or discrete. For a function f(x) ∈ L²(R), the wavelet
transform of f(x) can be defined as

W(a, m) = |a|^{−1/2} ∫_{−∞}^{+∞} f(x) ψ*((x − m)/a) dx,   (11)

where ψ*(x) stands for the complex conjugate of ψ(x).

The wavelet transform W(a, m) can be considered as a function of
translation m at each scale a. Eq. (11) indicates that wavelet
analysis is a time–frequency analysis, or a time-scale analysis.
Different from the short-time Fourier transform, the wavelet
transform can be used for multi-scale analysis of a signal through
dilation and translation, so it can extract the time–frequency
features of a signal effectively.
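Eq. (11) can be discretized by direct quadrature over a sampled signal; a minimal sketch, assuming a real-valued Morlet mother wavelet (so the conjugate in Eq. (11) is trivial) and uniformly spaced samples — function names and the choice ω0 = 5 are illustrative:

```python
import numpy as np

def morlet(x, w0=5.0):
    """Real-valued Morlet mother wavelet, cos(w0*x)*exp(-x^2/2) (Eq. (17))."""
    return np.cos(w0 * x) * np.exp(-x ** 2 / 2)

def cwt(signal, t, scales, shifts, w0=5.0):
    """Discretized wavelet transform W(a, m) of Eq. (11):
    |a|^(-1/2) * integral of f(x) * psi((x - m)/a) dx, by Riemann sum."""
    dt = t[1] - t[0]
    W = np.empty((len(scales), len(shifts)))
    for i, a in enumerate(scales):
        for j, m in enumerate(shifts):
            W[i, j] = np.abs(a) ** -0.5 * np.sum(signal * morlet((t - m) / a, w0)) * dt
    return W
```

Scanning `scales` against a demand series is exactly the multi-scale matching step used later in Algorithm 2 to pick the best-fitting wavelet and scale range.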
The wavelet transform is also reversible, which provides the
possibility to reconstruct the original signal. A classical inversion
formula for f(x) is

f(x) = C_ψ^{−1} ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} W(a, m) ψ_{a,m}(x) (da/a²) dm,   (12)

where

C_ψ = ∫_{−∞}^{+∞} (|ψ̂(ω)|² / |ω|) dω < ∞.   (13)
We can build the translation-invariant (horizontal floating) kernel
function as follows:

K(x, x′) = Π_{i=1}^{d} ψ((x_i − x_i′)/a_i),   (16)

where a_i is the scaling parameter of the wavelet, a_i > 0. So far,
because the wavelet kernel function must satisfy the conditions of
Lemma 2, few wavelet kernel functions can be expressed by
existing functions. Here we give an existing wavelet kernel
function, the Morlet wavelet kernel function, and prove that it
satisfies the condition of an admissible support vector kernel
function. The Morlet wavelet function is defined as

ψ(x) = cos(ω_0 x) exp(−x²/2).   (17)

Theorem 1. The Morlet wavelet kernel function, defined as

K(x, x′) = Π_{i=1}^{n} cos(ω_0 (x_i − x_i′)/a) exp(−(x_i − x_i′)²/(2a²)),   (18)

is an admissible support vector kernel function.
Proof. According to Lemma 2, we only need to prove

F[K](ω) = (2π)^{−n/2} ∫_{R^n} exp(−j(ω·x)) K(x) dx ≥ 0,   (19)

where K(x) = Π_{i=1}^{n} ψ(x_i/a) = Π_{i=1}^{n} cos(ω_0 x_i/a) exp(−x_i²/(2a²)),
and j denotes the imaginary unit. We have

∫_{R^n} exp(−j ω·x) K(x) dx
  = ∫_{R^n} exp(−j ω·x) Π_{i=1}^{n} cos(ω_0 x_i/a) exp(−x_i²/(2a²)) dx
  = Π_{i=1}^{n} ∫_{−∞}^{+∞} exp(−j ω_i x_i) [exp(j ω_0 x_i/a) + exp(−j ω_0 x_i/a)]/2 · exp(−x_i²/(2a²)) dx_i
  = Π_{i=1}^{n} (|a|√(2π)/2) [exp(−(ω_0 − ω_i a)²/2) + exp(−(ω_0 + ω_i a)²/2)].   (20)

Substituting formula (20) into Eq. (19), we obtain

F[K](ω) = Π_{i=1}^{n} (|a|/2) [exp(−(ω_0 − ω_i a)²/2) + exp(−(ω_0 + ω_i a)²/2)],   (21)

so for a ≠ 0 we have

F[K](ω) ≥ 0.   (22)

If we use the wavelet kernel function as the support vector kernel
function, the regression estimation equation of the Wν-SVM is
defined as

f(x) = Σ_{i=1}^{l} (α_i − α_i*) Π_{j=1}^{d} ψ((x^j − x_i^j)/a) + b.   (23)

For wavelet analysis and theory, see Krantz (1994) and Liu and Di
(1992). □
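Theorem 1 implies every Gram matrix built from the Morlet wavelet kernel of Eq. (18) is positive semidefinite, which can be checked numerically; a small sketch (the sample data, seed, and parameter values a = 0.8, ω0 = 5 are our illustrative choices):

```python
import numpy as np

def morlet_kernel(x, xp, a=1.0, w0=5.0):
    """Morlet wavelet kernel of Eq. (18): a product over input dimensions."""
    d = x - xp
    return np.prod(np.cos(w0 * d / a) * np.exp(-d ** 2 / (2 * a ** 2)))

# Build a Gram matrix on random illustrative data and inspect its spectrum.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = np.array([[morlet_kernel(xi, xj, a=0.8) for xj in X] for xi in X])
eigs = np.linalg.eigvalsh(K)  # all eigenvalues should be >= 0 (up to round-off)
```

The diagonal of K equals 1 (zero displacement gives cos(0)·exp(0) in every dimension), mirroring the behavior of the radial basis kernel.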
2.3. Robust loss function

For the standard wavelet ν-SVM, it is difficult to deal with the
hybrid noise of a time series. To overcome this shortcoming of the
ε-insensitive loss of the standard wavelet ν-SVM, a new hybrid
function composed of the Gaussian function, the Laplace function
and the ε-insensitive loss function is constructed as the loss
function of the ν-SVM, called the robust loss function. The robust
loss function can be defined as follows:

L(ξ) = 0,                      |ξ| ≤ ε;
L(ξ) = (1/2)(|ξ| − ε)²,        ε < |ξ| ≤ ε_μ;
L(ξ) = μ(|ξ| − ε) − μ²/2,      |ξ| > ε_μ,   (24)

where ε_μ = ε + μ and ξ is the slack variable. The middle part of the
robust loss function curve is an error-quadratic curve, which is
used to inhibit (penalize) noise with the features of a Gaussian
distribution. The linear part is used to inhibit (penalize) singular
points and large-magnitude noise in the time series. The curve of
the robust loss function, which is divided into three parts, is
illustrated in Fig. 2. The proposed robust loss function integrates
the advantages of the Gaussian loss function, the Laplace loss
function and the ε-insensitive loss function, and gives the support
vector machine better robustness and good generalization ability.
2.4. Robust wavelet ν-support vector machine

Integrating the wavelet kernel function, the robust loss function
and the ν-support vector machine, a robust wavelet support vector
machine is proposed in this part. The parameter b is taken into
account in the confidence interval of the RW ν-SVM, and the new
optimization problem is reformulated as

min_{w, ξ^(*), ε, b}  (1/2)(||w||² + b²) + C(νε + Σ_{i∈I₁} (1/2)(ξ_i² + ξ_i*²) + (1/l) Σ_{i∈I₂} μ(ξ_i + ξ_i*))   (25)

subject to  (w·x_i + b) − y_i ≤ ε + ξ_i,   (26)
            y_i − (w·x_i + b) ≤ ε + ξ_i*,   (27)
            ξ_i^(*) ≥ 0,  ε ≥ 0.   (28)

Problem (25) is a quadratic programming (QP) problem. By
introducing Lagrangian multipliers, a Lagrangian function can be
defined as follows:

L(w, b, α^(*), β, ξ^(*), ε, η^(*)) = (1/2)||w||² + (1/2)b² + Cνε
  + C Σ_{i∈I₁} (1/2)(ξ_i² + ξ_i*²) + (C/l) Σ_{i∈I₂} μ(ξ_i + ξ_i*)
  − βε − Σ_{i∈I₂} (η_i ξ_i + η_i* ξ_i*)
  − Σ_{i=1}^{l} α_i (ε + ξ_i + w·x_i + b − y_i)
  − Σ_{i=1}^{l} α_i* (ε + ξ_i* − w·x_i − b + y_i),   (29)

Fig. 2. Robust loss function.
where α_i^(*), η_i^(*), β ≥ 0 are Lagrangian multipliers.
Differentiating the Lagrangian function (29) with respect to w, b,
ε and ξ^(*), we have

∂L/∂w = 0 ⇒ w = Σ_{i=1}^{l} (α_i − α_i*) x_i,   (30)
∂L/∂b = 0 ⇒ Σ_{i=1}^{l} (α_i − α_i*) = b,   (31)
∂L/∂ε = 0 ⇒ β = Cν − Σ_{i=1}^{l} (α_i + α_i*),   (32)
∂L/∂ξ^(*) = 0 ⇒ η_i^(*) = Cμ/l − α_i^(*).   (33)

By substituting (30)–(33) into (29), we can obtain the
corresponding dual form of problem (25) as follows:

min_{α, α* ∈ R^l}  (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i − α_i*)(α_j − α_j*)(K(x_i, x_j) + 1)
                   − Σ_{i=1}^{l} y_i (α_i − α_i*) + (1/2C) Σ_{i=1}^{l} (α_i² + α_i*²)
s.t.  e^T(α + α*) ≤ Cν,  0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l).   (34)

Representing formula (34) in matrix form, we have

min_{α, α* ∈ R^l}  (1/2) [α^T, (α*)^T] [Q + E/C, −Q; −Q, Q + E/C] [α; α*] + [−y^T, y^T] [α; α*]
s.t.  e^T(α + α*) ≤ Cν,
      0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l),   (35)

where Q_ij = K(x_i, x_j) + 1, E is the identity matrix,
e = [1, ..., 1]^T, and α and α* are nonnegative Lagrangian
multipliers.

Eq. (35) can be transformed into the compact formulation

min  (1/2) ᾱ^T H ᾱ + ȳ^T ᾱ
s.t.  e^T(α + α*) ≤ Cν,
      0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l),   (36)

where ᾱ = [α; α*], H = [Q + E/C, −Q; −Q, Q + E/C], ȳ = [−y; y].

The output regression function of the RW ν-SVM is as follows:

f(x) = Σ_{i=1}^{l} (α_i − α_i*) (Π_{j=1}^{d} ψ((x^j − x_i^j)/a) + 1).   (37)

It is obvious that the RW ν-SVM (whose constraint conditions
number one fewer than those of the standard Wν-SVM) has a more
concise dual problem. There is no parameter b in the estimation
function Eq. (37), which reduces the complexity of the model.
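Assembling the matrices of the compact dual (36) from a Gram matrix is mechanical; a short sketch under our reading of the reconstructed signs (the function name is ours, and the block signs of H follow the quadratic form in (34), an assumption worth checking against the original typeset paper):

```python
import numpy as np

def assemble_dual(K, y, C):
    """Build H and the linear-term vector of the compact dual (36)
    from a Gram matrix K, targets y, and regularization constant C."""
    l = len(y)
    Q = K + 1.0                      # Q_ij = K(x_i, x_j) + 1, absorbing b
    E = np.eye(l)
    H = np.block([[Q + E / C, -Q],
                  [-Q, Q + E / C]])  # quadratic term for [alpha; alpha*]
    yv = np.concatenate([-y, y])     # linear term for [alpha; alpha*]
    return H, yv
```

The resulting (2l x 2l) matrix H is symmetric, so (36) can be handed directly to any standard QP solver.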
2.5. The optimization algorithm for the unknown parameters of the
RW ν-SVM model

Determining the unknown parameters of the RW ν-SVM is a
complicated process; in fact, it is a multivariable optimization
problem in a continuous space. An appropriate parameter
combination can enhance the degree to which the model
approximates the original series, so it is necessary to select an
intelligent algorithm to obtain the optimal parameters of the
proposed model. The parameters of the RW ν-SVM have a great
effect on its generalization performance, and an appropriate
parameter combination corresponds to high generalization
performance. The PSO algorithm is considered an excellent
technique for combinatorial optimization problems (Krusienski,
2006; Yamaguchi, 2007). The PSO algorithm, introduced by
Kennedy and Eberhart (1995), is used to determine the parameter
combination of the RW ν-SVM.

Similarly to evolutionary computation techniques, PSO uses a
set of particles representing potential solutions to the problem
under consideration. The swarm consists of m particles; each
particle has a position X_i = {x_i1, x_i2, ..., x_in} and a velocity
V_i = {v_i1, v_i2, ..., v_in}, and moves through an n-dimensional
search space. According to the global variant of the PSO algorithm,
each particle moves towards its best previous position and towards
the best particle g in the swarm. Let us denote the best previously
visited position of the ith particle (the one giving the best fitness
value) as p_c_i = {p_c_i1, p_c_i2, ..., p_c_in}, and the best
previously visited position of the swarm as
p_g = {p_g_1, p_g_2, ..., p_g_n}.

The change of position of each particle from one iteration to
another can be computed according to the distance between the
current position and its previous best position and the distance
between the current position and the best position of the swarm.
The updates of velocity and particle position are then obtained by
using the following equations:

v_ij^{k+1} = w v_ij^k + c_1 r_1 (p_c_ij − x_ij^k) + c_2 r_2 (p_g_j − x_ij^k),   (38)
x_ij^{k+1} = x_ij^k + v_ij^{k+1},   (39)
where w is called the inertia weight and is employed to control the
impact of the previous history of velocities on the current one.
Accordingly, the parameter w regulates the trade-off between the
global and local exploration abilities of the swarm. A large inertia
weight facilitates global exploration, while a small one tends to
facilitate local exploration. A suitable value of the inertia weight w
usually provides balance between global and local exploration
abilities and consequently results in a reduction of the number of
iterations required to locate the optimum solution. Here
k = 1, 2, ..., K_max denotes the iteration number, c_1 is the
cognition learning factor, c_2 is the social learning factor, and r_1
and r_2 are random numbers uniformly distributed in [0, 1].
Thus, the particle flies through potential solutions towards p_c_i^k
and p_g^k in a navigated way, while still exploring new areas via
the stochastic mechanism so as to escape from local optima. Since
there is no actual mechanism for controlling the velocity of a
particle, it is necessary to impose a maximum value V_max on it. If
the velocity exceeds this threshold, it is set equal to V_max, which
controls the maximum travel distance at each iteration and
prevents the particle from flying past good solutions.
2.6. The intelligent forecasting system

In forecasting product demand series, two of the key problems
are how to deal with noise and nonstationarity. A potential
solution to these two problems is to use a mixture-of-experts (ME)
architecture, as illustrated in Fig. 3. The ME architecture is
generalized into a two-stage architecture to handle the
nonstationarity in the data. In the first stage, a mixture of experts,
including an evolutionary algorithm, partial least squares and
k-nearest neighbors, compete to optimize the model of the second
stage. To evaluate the forecasting capacity of the model in the
second stage, the fitness function of the ME architecture is
designed as follows:

fitness = (1/l) Σ_{i=1}^{l} ((ŷ_i − y_i)/y_i)²,   (40)

where l is the size of the selected sample, ŷ_i denotes the forecast
value of the selected sample, and y_i is the original data of the
selected sample.
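The fitness of Eq. (40) is a mean relative squared error; a one-function sketch (the function name is ours, and it assumes no original value y_i is zero):

```python
def fitness(y_pred, y_true):
    """Mean relative squared-error fitness of Eq. (40).
    Assumes all y_true values are nonzero."""
    l = len(y_true)
    return sum(((yp - yt) / yt) ** 2 for yp, yt in zip(y_pred, y_true)) / l
```

Dividing each residual by y_i makes the criterion scale-free, so series with different demand magnitudes are comparable during model selection.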
3. Intelligent forecasting method based on RW ν-SVM and PSO

The ME architecture is an intelligent forecasting system that can
handle the noise and nonstationarity of time series and construct
the nonlinear relation in a high-dimensional space effectively.
According to the above idea, the particle swarm optimization
algorithm can be described as follows:
Algorithm 1

Step (1) Data preparation: training and testing sets are represented as Tr and Te, respectively.
Step (2) Particle initialization and PSO parameter setting: generate initial particles. Set the PSO parameters, including the number of particles (n), particle dimension (m), number of maximal iterations (k_max), error limitation of the fitness function, velocity limitation (V_max), and inertia weight for particle velocity (w). Set the iterative variable k = 0, and perform the training process in Steps 3–7.
Step (3) Set the iterative variable k = k + 1.
Step (4) Compute the fitness function value of each particle. Take the current position as the individual extremum point of each particle, and the particle with the minimal fitness value as the global extremum point.
Step (5) Stopping-condition check: if the stopping criteria (the predefined maximum iterations or the error accuracy of the fitness function) are met, go to Step 7; otherwise, go to the next step.
Step (6) Update the particle positions by formulas (38) and (39) to form new particle swarms, then go to Step 3.
Step (7) End the training procedure and output the optimal parameters (C, ν, a).
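The steps of Algorithm 1 can be sketched as a compact PSO loop; a minimal illustration in Python (the paper used Matlab 7.1), where the function name, the constant inertia weight w = 0.7, the V_max of 20% of the search range, and the toy objective are our illustrative choices:

```python
import numpy as np

def pso(fit, lo, hi, n_particles=20, k_max=100, w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimal global-variant PSO following Algorithm 1 and Eqs. (38)-(39),
    with V_max velocity clamping. Minimizes fit over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    dim = len(lo)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # Step 2: init positions
    v = np.zeros_like(x)
    v_max = 0.2 * (hi - lo)                            # illustrative V_max
    p_c = x.copy()                                     # personal bests
    p_c_fit = np.array([fit(p) for p in x])
    g = p_c[p_c_fit.argmin()].copy()                   # global best
    for _ in range(k_max):                             # Steps 3-6
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (p_c - x) + c2 * r2 * (g - x)  # Eq. (38)
        v = np.clip(v, -v_max, v_max)                        # clamp to V_max
        x = np.clip(x + v, lo, hi)                           # Eq. (39)
        f = np.array([fit(p) for p in x])
        better = f < p_c_fit                           # Step 4: update bests
        p_c[better], p_c_fit[better] = x[better], f[better]
        g = p_c[p_c_fit.argmin()].copy()
    return g, p_c_fit.min()                            # Step 7: output optimum
```

In the paper's setting the fitness would be Eq. (40) evaluated after training an RW ν-SVM with the candidate (C, ν, a); here any callable objective over a box works.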
On the basis of the RW ν-SVM model, we can summarize a demand
forecasting algorithm as follows.

Algorithm 2

Step (1) Initialize the original data by normalization and fuzzification, and then form the training and testing sets.
Step (2) Apply the wavelet transform to the demand series at different scales and select the best wavelet function K and scale scope a_i that match the original series well.
Step (3) Compute the wavelet kernel function by (16) and construct the QP problem (34) of the RW ν-SVM.
Step (4) Go to Algorithm 1 to get the optimal parameter combination vector (C, ν, a), then solve the optimization problem (36) and obtain the parameters α^(*).
Step (5) For a new demand task, extract the product characteristics and form a set of input variables x.
Step (6) Compute the forecasting result f(x) by (37).
4. Experiments

To illustrate the proposed intelligent forecasting method, the
forecasting of a car demand series is studied. The car is a type of
consumption product influenced by macroeconomic factors in the
manufacturing system, and its demand behavior is usually driven
by many uncertain factors. Some factors with large influencing
weights are gathered to develop a factor list, as shown in Table 1.
The first four factors are expressed as linguistic information and
the last two factors are expressed as numerical data.

In our experiments, car demand series are selected from past
demand records in a typical company. The detailed characteristic
data and demand series of these cars compose the corresponding
Fig. 3. The intelligent forecasting system based on RW ν-SVM and PSO.
training and testing sample sets. During the car scale series
forecasting process, six influencing factors, viz., brand famous
degree (BF), performance parameter (PP), form beauty (FB), sales
experience (SE), dweller deposit (DD) and oil price (OP), are taken
into account. The first four influencing factors are linguistic
information; the last two are numerical information. All linguistic
information on the influencing factors is processed with fuzzy
logic to form numerical information.
The proposed forecasting model has been implemented in the
Matlab 7.1 programming language. The experiments are made on a
1.80 GHz Core (TM)2 CPU personal computer (PC) with 1.0 GB
memory under Microsoft Windows XP Professional. Some criteria,
such as the mean absolute error (MAE), mean absolute percentage
error (MAPE) and mean square error (MSE), are adopted to evaluate
Table 1
Influencing factors of car demand forecast.

Product characteristics       Unit            Expression               Weight
Brand famous degree (BF)      Dimensionless   Linguistic information   0.9
Performance parameter (PP)    Dimensionless   Linguistic information   0.8
Form beauty (FB)              Dimensionless   Linguistic information   0.8
Sales experience (SE)         Dimensionless   Linguistic information   0.5
Dweller deposit (DD)          Dimensionless   Numerical information    0.8
Oil price (OP)                Dimensionless   Numerical information    0.4
Fig. 4. Mexican-hat wavelet transform of the demand time series over different scales.
Fig. 5. Morlet wavelet transform of the demand time series over different scales.
the performance of the intelligence forecasting system. The initial parameters of the intelligence forecasting system are given as follows: inertia weight w0 = 0.9; positive acceleration constants c1 = c2 = 2; l = 1; the fitness accuracy of the normalized samples is equal to 0.0005.
The wavelet transform of the original scale series on the different scales is obtained by means of Steps 1 and 2 of Algorithm 2. The selected wavelet functions consist of the Morlet, Haar, Mexican and Gaussian wavelets. To reduce the length of this paper, only the representative Morlet and Mexican wavelet transforms on the different scales are given in Figs. 4 and 5. Among all the given wavelet transforms, the Mexican wavelet transform fits the original demand series best over the scale range from 0.01 to 2.
Therefore, the Mexican wavelet can be chosen as the kernel function of the RW m-SVM model, and the three parameters are determined over the following ranges:

v ∈ [0, 1], a ∈ [0.001, 2] and

C ∈ [(max(x_{i,j}) − min(x_{i,j}))/l × 10⁻³, (max(x_{i,j}) − min(x_{i,j}))/l × 10³].
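The kernel formula itself is defined earlier in the paper; assuming the standard product-form wavelet kernel built from the Mexican hat (Ricker) mother wavelet with dilation parameter a, a minimal sketch (in Python for illustration; the paper's experiments used Matlab) is:

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; note psi(0) = 1."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wavelet_kernel(x, z, a):
    """Assumed product-form wavelet kernel: K(x, z) = prod_j psi((x_j - z_j)/a)."""
    d = (np.asarray(x, dtype=float) - np.asarray(z, dtype=float)) / a
    return float(np.prod(mexican_hat(d)))

x = [0.2, 0.5, 0.8]
print(wavelet_kernel(x, x, a=0.27))  # 1.0 at zero lag, since psi(0) = 1
```

Because the mother wavelet equals 1 at the origin, the kernel evaluates to 1 whenever x = z, and it is symmetric in its two arguments, as a kernel must be.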
The optimal combinational parameters are obtained by Algorithm 1, viz., C = 525.57, v = 0.82 and a = 0.27. Fig. 6 illustrates the forecasting result of the original car demand series given by Algorithm 2.
To analyze the forecasting capability of the RW m-SVM model, the comparison models (the wavelet m-support vector machine with Gaussian loss function (Wg-SVM) and the wavelet m-support vector machine (Wm-SVM)) are trained on the original demand series respectively, and the forecasting results of each model for the last 12 months (used as the testing sample) are shown in Table 2. The linear inertia weight of the standard PSO is adopted:

w = w_max − ((w_max − w_min)/k_max) · k,  (41)

where w_max = 0.9 is the maximal inertia weight, w_min = 0.1 is the minimal inertia weight, and k is the iteration number of the control procedure.
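The linear schedule of Eq. (41) can be sketched as follows (Python used for illustration; the paper's implementation is in Matlab):

```python
def inertia_weight(k, k_max, w_max=0.9, w_min=0.1):
    """Linearly decreasing PSO inertia weight, Eq. (41)."""
    return w_max - (w_max - w_min) / k_max * k

# decreases linearly from w_max at k = 0 to w_min at k = k_max
print(inertia_weight(0, 100), inertia_weight(50, 100), inertia_weight(100, 100))
```

A large inertia weight early on favors global exploration of the parameter space, while the small weight near k_max favors local refinement around the best-found (C, v, a) combination.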
To evaluate the forecasting error of these models, the comparison among the different forecasting approaches is shown in Table 3, which gives the error index distribution of the four models. The indexes (MAE, MAPE and MSE) of the Wg-SVM model are better than those of the Wm-SVM model, and the indexes of RW m-SVM are better than those of both Wm-SVM and Wg-SVM. It is obvious that the robust loss function can improve the generalization ability of the support vector machine.

Experimental results show that the regression precision of RW m-SVM is improved by adopting the wavelet kernel and robust loss function, compared with the Wg-SVM and Wm-SVM models and with m-SVM (whose kernel function is the Gaussian function) under the same conditions.
5. Conclusion

In this paper, a new version of WSVR, named RW m-SVM, is proposed to model the nonlinear system of product demand series by integrating wavelet theory, a robust loss function and m-SVM. The new forecasting model based on RW m-SVM and PSO,
Fig. 6. The car demand forecasting results from the RW m-SVM model.
Table 2
Comparison of forecasting results from four different models.

Model       1     2     3     4     5     6     7     8     9     10    11    12
Real value  2967  3268  3300  1891  3489  3544  2708  1513  3411  3672  3483  1523
m-SVM       2971  3240  3269  1964  3439  3448  2754  1661  3489  3587  3433  1669
Wm-SVM      2953  3257  3286  1981  3456  3465  2736  1678  3472  3605  3451  1687
Wg-SVM      2962  3258  3286  1936  3511  3465  2726  1632  3464  3605  3450  1641
RW m-SVM    2967  3257  3286  1922  3519  3472  2734  1611  3467  3610  3453  1620
Table 3
Error statistic of four forecasting models.

Model      MAE      MAPE    MSE
m-SVM      69.5833  0.0309  6662
Wm-SVM     63.1667  0.0303  6673
Wg-SVM     48.5833  0.0223  3822
RW m-SVM   43.9167  0.0194  2910
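The error criteria can be recomputed directly from the Table 2 values; the following sketch (Python for illustration, the paper itself used Matlab) reproduces the RW m-SVM row of Table 3:

```python
# Real values and RW m-SVM forecasts for the 12 testing months (Table 2).
real = [2967, 3268, 3300, 1891, 3489, 3544, 2708, 1513, 3411, 3672, 3483, 1523]
pred = [2967, 3257, 3286, 1922, 3519, 3472, 2734, 1611, 3467, 3610, 3453, 1620]

n = len(real)
errors = [r - p for r, p in zip(real, pred)]
mae = sum(abs(e) for e in errors) / n                          # mean absolute error
mape = sum(abs(e) / r for e, r in zip(errors, real)) / n       # mean absolute percentage error
mse = sum(e**2 for e in errors) / n                            # mean square error

print(f"MAE = {mae:.4f}, MAPE = {mape:.4f}, MSE = {mse:.1f}")
# MAE = 43.9167, MAPE = 0.0194, MSE = 2910.9
```

The computed MAE and MAPE match Table 3 exactly; the MSE comes to 2910.9, which Table 3 reports as 2910 after truncation.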
named PSO RW m-SVM, is presented to approximate an arbitrary demand curve in the L2 space. The simulation results indicate that RW m-SVM can provide better forecasting precision for the product demand series.

The performance of the RW m-SVM is evaluated using car demand data, and the simulation results demonstrate that RW m-SVM is effective in dealing with uncertain data and hybrid noises. Moreover, it is shown that the particle swarm optimization algorithm presented here enables the RW m-SVM to seek the optimal parameters.

Compared to Wm-SVM and Wg-SVM, RW m-SVM has the best indexes (MAE, MAPE and MSE). RW m-SVM can overcome the "curse of dimensionality" and has some other attractive properties, such as strong learning capability for small samples, good generalization performance for hybrid noises, insensitivity to noise or outliers, and automatic selection of the optimal parameters. Moreover, the wavelet transform can reduce noise in the data while preserving its detail and resolution. Therefore, in the process of establishing the forecasting models, much uncertain information in the scale data is not neglected but wholly incorporated into the wavelet kernel function. The forecasting accuracy is improved by means of the wavelet technique.
Acknowledgements
This research was partly supported by the National Natural Science Foundation of China under Grant 60904043, a research grant funded by the Hong Kong Polytechnic University, the China Postdoctoral Science Foundation (20090451152), the Jiangsu Planned Projects for Postdoctoral Research Funds (0901023C) and the Southeast University Planned Projects for Postdoctoral Research Funds.
References
Bell, T., Ribar, G., & Verchio, J. (1989). Neural nets vs logistic regression. Presented at the University of Southern California expert system symposium (Nov.).
Box, G. E. P., & Jenkins, G. M. (1994). Time series analysis: Forecasting and control (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Carbonneau, R., Laframbois, K., & Vahidov, R. (2008). Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184(3), 1140–1154.
Duliba, K. (1991). Contrasting neural nets with regression in predicting performance. In Proceedings of the 24th international conference on system science, Hawaii (Vol. 4, pp. 163–170).
Engle, R. F. (1984). Combining competing forecasts of inflation using a bivariate ARCH model. Journal of Economic Dynamics and Control, 18(2), 151–165.
Gorr, W. L. (1994). Research prospective on neural forecasting. International Journal of Forecasting, 10(1), 1–4.
Hill, T., Connor, M. O., & Remus, W. (1996). Neural network models for time series forecasts. Management Science, 42(7), 1082–1092.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (pp. 1942–1948).
Khandoker, A. H., Lai, D. T. H., Begg, R. K., & Palaniswami, M. (2007). Wavelet-based feature extraction for support vector machines for screening balance impairments in the elderly. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(4), 587–597.
Krantz, S. G. (1994). Wavelet: Mathematics and application. Boca Raton, FL: CRC.
Krusienski, D. J. (2006). A modified particle swarm optimization algorithm for adaptive filtering. In IEEE international symposium on circuits and systems, Kos, Greece (pp. 137–140).
Kwok, J. T. (1999). Moderating the outputs of support vector machine classifiers. IEEE Transactions on Neural Networks, 10(5), 1018–1031.
Liu, G. Z., & Di, S. L. (1992). Wavelet analysis and application. Xi'an, China: Xidian Univ. Press.
Roy, J., & Cosset, J. (1990). Forecasting country risk ratings using a neural network. In Proceedings of the 23rd international conference on system science, Hawaii (Vol. 4, pp. 327–334).
Tang, Z., Almedia, C., & Fishwick, P. A. (1991). Time series forecasting using neural networks vs. Box–Jenkins methodology. Simulation, 57(5), 303–310.
Tang, Z., & Fishwick, P. A. (1993). Feedforward neural nets as models for time series forecasting. ORSA Journal of Computing, 5(4), 374–385.
Tong, H. (1983). Threshold models in non-linear time series analysis. New York: Springer-Verlag.
Trontl, K., Smuc, T., & Pevec, D. (2007). Support vector regression model for the estimation of γ-ray buildup factors for multi-layer shields. Annals of Nuclear Energy, 34(12), 939–952.
Tuan, D. P., & Lanh, T. T. (1981). On the first-order bilinear time series model. Journal of Applied Probability, 18(3), 617–627.
Vapnik, V. (1995). The nature of statistical learning. New York: Springer.
Widodo, A., & Yang, B. S. (2008). Wavelet support vector machine for induction machine fault diagnosis based on transient current signal. Expert Systems with Applications, 35(1–2), 307–316.
Wohlberg, B., Tartakovsky, D. M., & Guadagnini, A. (2006). Subsurface characterization with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 44(1), 47–57.
Yamaguchi, T. (2007). Adaptive particle swarm optimization: Self-coordinating mechanism with updating information. In IEEE international conference on systems, man and cybernetics, Taipei, Taiwan (pp. 2303–2308).
Zhang, G. P. (2001). An investigation of neural networks for linear time-series forecasting. Computers and Operations Research, 28(12), 1183–1202.
Zhang, G., Patuwo, E. B., & Hu, M. Y. (1998). Forecasting with artificial neural network: The state of the art. International Journal of Forecasting, 14(1), 35–62.