An intelligent forecasting model based on robust wavelet m-support vector machine
Qi Wu a,b,*, Rob Law b
a Jiangsu Key Laboratory for Design and Manufacture of Micro–Nano Biomedical Instruments, Southeast University, Nanjing 211189, China
b School of Hotel and Tourism Management, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Article info

Keywords:
Support vector machine
Wavelet kernel
Robust loss function
Particle swarm optimization
Forecast
Abstract

Product demand series exhibit small samples, seasonality, nonlinearity, randomness and fuzziness, and the existing support vector kernels cannot approximate the random curves of such demand time series in the L2(R^n) space (the space of square-integrable functions). A robust loss function is also proposed to overcome the shortcomings of the ε-insensitive loss function when handling hybrid noise. A novel robust wavelet ν-support vector machine (RW ν-SVM) is proposed based on wavelet theory and a modified support vector machine. A particle swarm optimization algorithm is designed to select the optimal parameters of the RW ν-SVM model within the permitted constraint ranges. The results of an application to car demand forecasting show that the forecasting approach based on the RW ν-SVM model is effective and feasible. A comparison between the method proposed in this paper and other methods is also given, which shows that it outperforms the standard wavelet ν-SVM and other traditional methods. © 2010 Elsevier Ltd. All rights reserved.
1. Introduction
Applications of time series prediction can be found in the areas of
economic and business planning, inventory and product control,
weather forecasting, signal processing and many other fields
(Box & Jenkins, 1994; Engle, 1984; Hornik, Stinchcombe, & White,
1989; Hill, Connor, & Remus, 1996; Tuan & Lanh, 1981; Tong, 1983;
Tang, Almedia, & Fishwick, 1991; Zhang, 2001). Product demand
forecasting, as an application of time series forecasting, involves a complex
dynamic system whose demand behavior is affected by many fac-
tors. Many of these factors have random, nonlinear, seasonal,
and uncertain characteristics. There is a kind of nonlinear mapping
relationship between the influencing factors and demand series,
and it is difficult to describe the relationship by definite mathemat-
ical models.
For the linear series, Box and Jenkins (1994) developed the
autoregressive integrated moving average (ARIMA) methodology
for forecasting time series events. A basic tenet of the ARIMA mod-
eling approach is the assumption of linearity among the variables.
However, there are many time series events for which the assump-
tion of linearity may not hold. Clearly, ARIMA models cannot be
effectively used to capture and explain nonlinear relationships.
When ARIMA models are applied to processes that are nonlinear,
forecasting errors often increase greatly as the forecasting horizon
becomes longer. To improve forecasting nonlinear time series
events, researchers have developed alternative modeling ap-
proaches, which include nonlinear regression models, the bilinear
model (Tuan & Lanh, 1981), the threshold autoregressive model
(Tong, 1983), and the autoregressive heteroscedastic model
(ARCH) (Engle, 1984). Although these methods exhibit improve-
ments over the linear models in some specific cases, they tend to be
application specific, lack generality and are harder to implement
(Zhang, 2001).
For the nonlinear series, the artificial neural network (ANN) is a
general purpose model that has been used as a universal functional
approximator. For example, it is supposed to be able to model eas-
ily any type of parametric or non-parametric process including
automatically and optimally transforming the input data. These
claims have led to increasing interest in neural networks (Hornik
et al., 1989). Researchers use ANN methodology to forecast a num-
ber of nonlinear time series events (Hill et al., 1996; Tang et al.,
1991; Tang & Fishwick, 1993). The effectiveness of neural network
models and their performance in comparison to traditional fore-
casting methods have also been a subject of many studies (Gorr,
1994; Zhang, Patuwo, & Hu, 1998). Bell, Ribar, and Verchio
(1989) compare back-propagation networks against regression
models in predicting commercial bank failures. The neural network
model performs well in failure prediction and the expected costs
for misclassification by the neural network models are found to
be lower than those of the logistic regression model. Roy and Cosset
(1990) also use neural network and logistic regression models
in predicting country risk ratings from economic and political
indicators. The neural network models have lower mean absolute
error in their predictions and react more evenly to the indicators
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.09.036
* Corresponding author at: Jiangsu Key Laboratory for Design and Manufacture of Micro–Nano Biomedical Instruments, Southeast University, Nanjing 211189, China. Tel.: +86 25 51166581; fax: +86 25 511665260.
E-mail addresses: [email protected], [email protected] (Q. Wu), [email protected] (R. Law).
Expert Systems with Applications 38 (2011) 4851–4859
than their logistic counterparts. Duliba (1991) compares neural
network models with four types of regression models in predicting the
financial performance of transportation companies. She finds
that the neural network model outperforms the random-effects
regression model but not the fixed-effects model. Though neural
networks are more powerful than regression methods for time
series prediction, their drawback is that the design of an efficient
architecture and the choice of the parameters involved require
longer processing time. In fact, learning neural network weights
can be considered as a hard optimization problem for which the
learning time scales exponentially as the problem size grows. To
overcome this disadvantage, a new approach should be explored.
Recently, a novel machine learning technique called the support
vector machine (SVM) has drawn much attention in the fields of
pattern classification and regression forecasting. SVM was first
introduced by Vapnik (1995). It is a supervised learning method
grounded in statistical learning theory. The algorithm derives from
the linear classifier and solves the two-class classification problem;
it was later extended to nonlinear problems, that is, it finds the
optimal (large-margin) hyperplane to classify the sample set. It is
an approximate implementation of the structural risk minimization
(SRM) principle in statistical learning theory, rather than the
empirical risk minimization (ERM) method (Kwok, 1999).
Compared with traditional neural networks, SVM uses structural
risk minimization to avoid problems such as overfitting, the curse
of dimensionality and local minima. For small sample sets, the
algorithm generalizes well. SVM has also been successfully used for
machine learning with large, high-dimensional data sets. These
attractive properties make SVM a promising technique. This is due
to the fact that the generalization property of an SVM does not
depend on the complete training data but only on a subset thereof,
the so-called support vectors. SVM has now been applied in many
fields, such as handwriting recognition, three-dimensional object
recognition, face recognition, text image recognition, voice
recognition and regression analysis (Carbonneau, Laframboise, &
Vahidov, 2008; Trontl, Smuc, & Pevec, 2007; Wohlberg,
Tartakovsky, & Guadagnini, 2006).
For pattern recognition and regression analysis, the nonlinear
ability of SVM is achieved through kernel mapping. The kernel
function must satisfy the condition of Mercer's theorem. The
Gaussian function is a commonly used kernel function and shows
good generalization ability. However, with the kernel functions
used so far, the SVM cannot approximate an arbitrary curve in the
L2(R^n) space (the space of square-integrable functions), because
these kernel functions do not form a complete orthonormal basis.
Consequently, the regression SVM cannot approximate every
function in that space.
Accordingly, we need a new kernel function that can build a
complete basis through horizontal translation and dilation. Such
functions already exist: the wavelet functions. Based on wavelet
decomposition, this paper proposes an admissible support vector
kernel function, named the wavelet kernel function, and proves
that such a kernel function exists. The Morlet and Mexican-hat
wavelet kernel functions are orthonormal bases of the L2(R^n)
space. Based on wavelet analysis and the conditions on support
vector kernel functions, a Morlet or Mexican-hat wavelet kernel for
the support vector regression machine (SVR) is proposed, which is
a kind of approximately orthonormal function. This kernel function
can simulate almost any curve in the space of square-integrable
functions, thus enhancing the generalization ability of the SVR.
Khandoker, Lai, Begg, and Palaniswami (2007) and Widodo and
Yang (2008) research the wavelet ε-support vector machine. Much
research indicates that the performance of ν-SVM is better than
that of ε-SVM. Combining the wavelet kernel function with
regularization theory, a ν-support vector machine with a wavelet
kernel function (Wν-SVM) is proposed in this paper.
However, the standard SVM encounters certain difficulties in
real applications, and improved SVMs have been put forward to
solve concrete problems (Kwok, 1999). Although the standard
SVM with the ε-insensitive loss function has good generalization
capability in some applications, it has difficulty handling Gaussian
noise and the normally distributed noise components of a series.
Therefore, this paper focuses on modeling a new wavelet SVM that
can penalize the Gaussian noise components of a series.
Based on the RW ν-SVM, an intelligent forecasting approach
for car demand series with nonlinear and uncertain characteristics
is proposed in this paper. Section 2 constructs an intelligent
forecasting model based on a new ν-support vector regression
machine with a wavelet kernel function and robust loss function
(RW ν-SVM) and the particle swarm optimization (PSO) algorithm.
Section 3 gives two algorithms to solve the intelligent forecasting
problem. Section 4 gives an application of the intelligent forecasting
system based on the RW ν-SVM model. Section 5 draws the
conclusions.
2. Robust wavelet ν-support vector machine (RW ν-SVM)
2.1. Support vector machine
SVM represents a novel neural-network-like technique, which has
gained ground in classification, forecasting and regression analysis.
One of its key properties is that training an SVM is equivalent to
solving a linearly constrained quadratic programming problem,
whose solution turns out to be unique and globally optimal.
Therefore, unlike other network training techniques, SVM
circumvents the problem of getting stuck at local minima. Another
advantage of SVM is that the solution to the optimization problem
depends only on a subset of the training data points, which are
referred to as the support vectors.

Let us consider a set of data points (x_1, y_1), (x_2, y_2), ..., (x_l, y_l),
which are independently and randomly generated from an
unknown function. Specifically, x_i is a column vector of attributes,
y_i is a scalar representing the dependent variable, and l denotes
the number of data points in the training set. SVM approximates
such an unknown function by mapping x into a higher dimensional
space through a function φ and determining a linear maximum-
margin hyperplane. In particular, the smallest distance to such a
hyperplane is called the margin of separation. The hyperplane is
an optimal separating hyperplane if the margin is maximized. The
data points that are located exactly the margin distance away from
the hyperplane are denominated the support vectors.
Mathematically, SVM utilizes a regression function of the
form f(x) = w·x + b, where the coefficients w and b are estimated
by minimizing a regularized risk function:

min  (1/2)||w||² + C Σ_{i=1}^{l} L_ε(y_i),   (1)

where ||w||² is the regularized term, Σ_{i=1}^{l} L_ε(y_i) is the
empirical error, and C > 0 is an arbitrary penalty parameter called
the regularization constant. Basically, SVM penalizes f(x_i) when it
departs from y_i by means of an ε-insensitive loss function:

L_ε(y_i) = 0 if |f(x_i) − y_i| ≤ ε;  L_ε(y_i) = |f(x_i) − y_i| − ε otherwise,   (2)

where ε determines the width of the insensitive tube around
the hyperplane (the margin of separation). The ε-insensitive loss
function is illustrated in Fig. 1.
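The ε-insensitive loss of Eq. (2) can be sketched in a few lines; a minimal illustration (the function name and default ε are ours, not from the paper):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Epsilon-insensitive loss, Eq. (2): zero inside the eps-tube,
    linear in the residual magnitude outside it."""
    r = np.abs(y_pred - y_true)
    return np.where(r <= eps, 0.0, r - eps)
```

Residuals smaller than ε incur no penalty, which is what makes the solution depend only on the support vectors lying on or outside the tube.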
The minimization of expression (1) is implemented by introducing
the slack variables ξ_i and ξ_i*. Specifically, the ν-support vector
regression (ν-SVM) solves the following quadratic programming
problem:

min_{w, ξ^(*), ε, b}  τ(w, ξ^(*), ε) = (1/2)||w||² + C(νε + (1/l) Σ_{i=1}^{l} (ξ_i + ξ_i*))   (3)

subject to  (w·x_i + b) − y_i ≤ ε + ξ_i,   (4)
            y_i − (w·x_i + b) ≤ ε + ξ_i*,   (5)
            ξ_i^(*) ≥ 0,  ε ≥ 0.   (6)

The solution to this minimization problem is of the form

f(x) = Σ_{i=1}^{l} (α_i − α_i*) K(x_i, x) + b,   (7)

where α_i and α_i* are the Lagrange multipliers associated with the
constraints (w·x_i + b) − y_i ≤ ε + ξ_i and y_i − (w·x_i + b) ≤ ε + ξ_i*,
respectively. The function K(x_i, x_j) = φ(x_i)′φ(x_j) represents a
kernel, which is the inner product of the two images φ(x_i) and φ(x_j).

Well-known kernel functions are K(x_i, x_j) = x_i′x_j (linear),
K(x_i, x_j) = (γ x_i′x_j + r)^d, γ > 0 (polynomial),
K(x_i, x_j) = exp(−γ||x_i − x_j||²), γ > 0 (radial basis function), and
K(x_i, x_j) = tanh(γ x_i′x_j + r) (sigmoid). The radial kernel is a
popular choice in the SVM literature.
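The four textbook kernels listed above can be written directly from their formulas; a small illustrative sketch (function names and default hyperparameters are ours):

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi' xj
    return xi @ xj

def polynomial_kernel(xi, xj, gamma=1.0, r=1.0, d=3):
    # K(xi, xj) = (gamma * xi' xj + r)^d, gamma > 0
    return (gamma * (xi @ xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2), gamma > 0
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    # K(xi, xj) = tanh(gamma * xi' xj + r)
    return np.tanh(gamma * (xi @ xj) + r)
```

Note that the RBF kernel always evaluates to 1 when xi = xj, a property shared by the wavelet kernel constructed later in the paper.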
2.2. The conditions of wavelet support vector kernel functions

A support vector kernel function can be not only a dot-product
kernel, K(x, x′) = K(x·x′), but also a translation-invariant
(horizontal floating) kernel, K(x, x′) = K(x − x′). In fact, any
function satisfying Mercer's condition is an admissible support
vector kernel function.

Lemma 1. The symmetric function K(x, x′) is a kernel function of
SVM if and only if, for every function φ ≠ 0 satisfying
∫_{R^d} φ²(ξ) dξ < ∞, the following condition holds:

∫∫ K(x, x′) φ(x) φ(x′) dx dx′ ≥ 0.   (8)

This lemma provides a simple method to build kernel functions.
Since a translation-invariant function can hardly be divided into
two identical factor functions, we give the admissibility condition
for translation-invariant kernels separately.

Lemma 2. A translation-invariant function K(x − x′) is an admissible
support vector kernel function if and only if the Fourier transform
of K(x) satisfies

F[K](ω) = (2π)^{−n/2} ∫_{R^n} exp(−j(ω·x)) K(x) dx ≥ 0.   (9)
If the wavelet function ψ(x) satisfies the conditions
ψ(x) ∈ L²(R) ∩ L¹(R) and ψ̂(0) = 0, where ψ̂ is the Fourier
transform of ψ(x), the wavelet function group can be defined as

ψ_{a,m}(x) = |a|^{−1/2} ψ((x − m)/a),   (10)

where a is the so-called scaling parameter, m is the horizontal
floating (translation) coefficient, and ψ(x) is called the "mother
wavelet". The translation parameter m ∈ R and dilation a > 0 may
be continuous or discrete. For a function f(x) ∈ L²(R), the wavelet
transform of f(x) can be defined as

W(a, m) = |a|^{−1/2} ∫_{−∞}^{+∞} f(x) ψ*((x − m)/a) dx,   (11)

where ψ*(x) stands for the complex conjugate of ψ(x).

The wavelet transform W(a, m) can be considered as a function of
translation m at each scale a. Eq. (11) indicates that wavelet
analysis is a time–frequency analysis, or a time-scale analysis.
Different from the short-time Fourier transform, the wavelet
transform can be used for multi-scale analysis of a signal through
dilation and translation, so it can extract the time–frequency
features of a signal effectively.
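Eq. (11) can be discretized by direct quadrature over a sampled signal; a minimal sketch, assuming a real-valued Morlet mother wavelet (so the conjugate in Eq. (11) is trivial) and uniformly spaced samples — function names and the choice ω0 = 5 are illustrative:

```python
import numpy as np

def morlet(x, w0=5.0):
    """Real-valued Morlet mother wavelet, cos(w0*x)*exp(-x^2/2) (Eq. (17))."""
    return np.cos(w0 * x) * np.exp(-x ** 2 / 2)

def cwt(signal, t, scales, shifts, w0=5.0):
    """Discretized wavelet transform W(a, m) of Eq. (11):
    |a|^(-1/2) * integral of f(x) * psi((x - m)/a) dx, by Riemann sum."""
    dt = t[1] - t[0]
    W = np.empty((len(scales), len(shifts)))
    for i, a in enumerate(scales):
        for j, m in enumerate(shifts):
            W[i, j] = np.abs(a) ** -0.5 * np.sum(signal * morlet((t - m) / a, w0)) * dt
    return W
```

Scanning `scales` against a demand series is exactly the multi-scale matching step used later in Algorithm 2 to pick the best-fitting wavelet and scale range.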
The wavelet transform is also reversible, which provides the
possibility to reconstruct the original signal. A classical inversion
formula for f(x) is

f(x) = C_ψ^{−1} ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} W(a, m) ψ_{a,m}(x) (da/a²) dm,   (12)

where

C_ψ = ∫_{−∞}^{+∞} (|ψ̂(ω)|² / |ω|) dω < ∞.   (13)
We can build the translation-invariant (horizontal floating) kernel
function as follows:

K(x, x′) = Π_{i=1}^{d} ψ((x_i − x_i′)/a_i),   (16)

where a_i is the scaling parameter of the wavelet, a_i > 0. So far,
because the wavelet kernel function must satisfy the conditions of
Lemma 2, few wavelet kernel functions can be expressed by
existing functions. Here we give an existing wavelet kernel
function, the Morlet wavelet kernel function, and prove that it
satisfies the condition of an admissible support vector kernel
function. The Morlet wavelet function is defined as

ψ(x) = cos(ω_0 x) exp(−x²/2).   (17)

Theorem 1. The Morlet wavelet kernel function, defined as

K(x, x′) = Π_{i=1}^{n} cos(ω_0 (x_i − x_i′)/a) exp(−(x_i − x_i′)²/(2a²)),   (18)

is an admissible support vector kernel function.
Proof. According to Lemma 2, we only need to prove

F[K](ω) = (2π)^{−n/2} ∫_{R^n} exp(−j(ω·x)) K(x) dx ≥ 0,   (19)

where K(x) = Π_{i=1}^{n} ψ(x_i/a) = Π_{i=1}^{n} cos(ω_0 x_i/a) exp(−x_i²/(2a²)),
and j denotes the imaginary unit. We have

∫_{R^n} exp(−j ω·x) K(x) dx
  = ∫_{R^n} exp(−j ω·x) Π_{i=1}^{n} cos(ω_0 x_i/a) exp(−x_i²/(2a²)) dx
  = Π_{i=1}^{n} ∫_{−∞}^{+∞} exp(−j ω_i x_i) [exp(j ω_0 x_i/a) + exp(−j ω_0 x_i/a)]/2 · exp(−x_i²/(2a²)) dx_i
  = Π_{i=1}^{n} (|a|√(2π)/2) [exp(−(ω_0 − ω_i a)²/2) + exp(−(ω_0 + ω_i a)²/2)].   (20)

Substituting formula (20) into Eq. (19), we obtain

F[K](ω) = Π_{i=1}^{n} (|a|/2) [exp(−(ω_0 − ω_i a)²/2) + exp(−(ω_0 + ω_i a)²/2)],   (21)

so for a ≠ 0 we have

F[K](ω) ≥ 0.   (22)

If we use the wavelet kernel function as the support vector kernel
function, the regression estimation equation of the Wν-SVM is
defined as

f(x) = Σ_{i=1}^{l} (α_i − α_i*) Π_{j=1}^{d} ψ((x^j − x_i^j)/a) + b.   (23)

For wavelet analysis and theory, see Krantz (1994) and Liu and Di
(1992). □
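Theorem 1 implies every Gram matrix built from the Morlet wavelet kernel of Eq. (18) is positive semidefinite, which can be checked numerically; a small sketch (the sample data, seed, and parameter values a = 0.8, ω0 = 5 are our illustrative choices):

```python
import numpy as np

def morlet_kernel(x, xp, a=1.0, w0=5.0):
    """Morlet wavelet kernel of Eq. (18): a product over input dimensions."""
    d = x - xp
    return np.prod(np.cos(w0 * d / a) * np.exp(-d ** 2 / (2 * a ** 2)))

# Build a Gram matrix on random illustrative data and inspect its spectrum.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
K = np.array([[morlet_kernel(xi, xj, a=0.8) for xj in X] for xi in X])
eigs = np.linalg.eigvalsh(K)  # all eigenvalues should be >= 0 (up to round-off)
```

The diagonal of K equals 1 (zero displacement gives cos(0)·exp(0) in every dimension), mirroring the behavior of the radial basis kernel.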
2.3. Robust loss function

For the standard wavelet ν-SVM, it is difficult to deal with the
hybrid noise of a time series. To overcome this shortcoming of the
ε-insensitive loss of the standard wavelet ν-SVM, a new hybrid
function composed of the Gaussian function, the Laplace function
and the ε-insensitive loss function is constructed as the loss
function of the ν-SVM, called the robust loss function. The robust
loss function can be defined as follows:

L(ξ) = 0,                      |ξ| ≤ ε;
L(ξ) = (1/2)(|ξ| − ε)²,        ε < |ξ| ≤ ε_μ;
L(ξ) = μ(|ξ| − ε) − μ²/2,      |ξ| > ε_μ,   (24)

where ε_μ = ε + μ and ξ is the slack variable. The middle part of the
robust loss function curve is an error-quadratic curve, which is
used to inhibit (penalize) noise with the features of a Gaussian
distribution. The linear part is used to inhibit (penalize) singular
points and large-magnitude noise in the time series. The curve of
the robust loss function, which is divided into three parts, is
illustrated in Fig. 2. The proposed robust loss function integrates
the advantages of the Gaussian loss function, the Laplace loss
function and the ε-insensitive loss function, and gives the support
vector machine better robustness and good generalization ability.
2.4. Robust wavelet ν-support vector machine

Integrating the wavelet kernel function, the robust loss function
and the ν-support vector machine, a robust wavelet support vector
machine is proposed in this part. The parameter b is taken into
account in the confidence interval of the RW ν-SVM, and the new
optimization problem is reformulated as

min_{w, ξ^(*), ε, b}  (1/2)(||w||² + b²) + C(νε + Σ_{i∈I₁} (1/2)(ξ_i² + ξ_i*²) + (1/l) Σ_{i∈I₂} μ(ξ_i + ξ_i*))   (25)

subject to  (w·x_i + b) − y_i ≤ ε + ξ_i,   (26)
            y_i − (w·x_i + b) ≤ ε + ξ_i*,   (27)
            ξ_i^(*) ≥ 0,  ε ≥ 0.   (28)

Problem (25) is a quadratic programming (QP) problem. By
introducing Lagrangian multipliers, a Lagrangian function can be
defined as follows:

L(w, b, α^(*), β, ξ^(*), ε, η^(*)) = (1/2)||w||² + (1/2)b² + Cνε
  + C Σ_{i∈I₁} (1/2)(ξ_i² + ξ_i*²) + (C/l) Σ_{i∈I₂} μ(ξ_i + ξ_i*)
  − βε − Σ_{i∈I₂} (η_i ξ_i + η_i* ξ_i*)
  − Σ_{i=1}^{l} α_i (ε + ξ_i + w·x_i + b − y_i)
  − Σ_{i=1}^{l} α_i* (ε + ξ_i* − w·x_i − b + y_i),   (29)

Fig. 2. Robust loss function.
where α_i^(*), η_i^(*), β ≥ 0 are Lagrangian multipliers.
Differentiating the Lagrangian function (29) with respect to w, b,
ε and ξ^(*), we have

∂L/∂w = 0 ⇒ w = Σ_{i=1}^{l} (α_i − α_i*) x_i,   (30)
∂L/∂b = 0 ⇒ Σ_{i=1}^{l} (α_i − α_i*) = b,   (31)
∂L/∂ε = 0 ⇒ β = Cν − Σ_{i=1}^{l} (α_i + α_i*),   (32)
∂L/∂ξ^(*) = 0 ⇒ η_i^(*) = Cμ/l − α_i^(*).   (33)

By substituting (30)–(33) into (29), we can obtain the
corresponding dual form of problem (25) as follows:

min_{α, α* ∈ R^l}  (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (α_i − α_i*)(α_j − α_j*)(K(x_i, x_j) + 1)
                   − Σ_{i=1}^{l} y_i (α_i − α_i*) + (1/2C) Σ_{i=1}^{l} (α_i² + α_i*²)
s.t.  e^T(α + α*) ≤ Cν,  0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l).   (34)

Representing formula (34) in matrix form, we have

min_{α, α* ∈ R^l}  (1/2) [α^T, (α*)^T] [Q + E/C, −Q; −Q, Q + E/C] [α; α*] + [−y^T, y^T] [α; α*]
s.t.  e^T(α + α*) ≤ Cν,
      0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l),   (35)

where Q_ij = K(x_i, x_j) + 1, E is the identity matrix,
e = [1, ..., 1]^T, and α and α* are nonnegative Lagrangian
multipliers.

Eq. (35) can be transformed into the compact formulation

min  (1/2) ᾱ^T H ᾱ + ȳ^T ᾱ
s.t.  e^T(α + α*) ≤ Cν,
      0 ≤ α_i, α_i* ≤ min(Cν, Cμ/l),   (36)

where ᾱ = [α; α*], H = [Q + E/C, −Q; −Q, Q + E/C], ȳ = [−y; y].

The output regression function of the RW ν-SVM is as follows:

f(x) = Σ_{i=1}^{l} (α_i − α_i*) (Π_{j=1}^{d} ψ((x^j − x_i^j)/a) + 1).   (37)

It is obvious that the RW ν-SVM (whose constraint conditions
number one fewer than those of the standard Wν-SVM) has a more
concise dual problem. There is no parameter b in the estimation
function Eq. (37), which reduces the complexity of the model.
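Assembling the matrices of the compact dual (36) from a Gram matrix is mechanical; a short sketch under our reading of the reconstructed signs (the function name is ours, and the block signs of H follow the quadratic form in (34), an assumption worth checking against the original typeset paper):

```python
import numpy as np

def assemble_dual(K, y, C):
    """Build H and the linear-term vector of the compact dual (36)
    from a Gram matrix K, targets y, and regularization constant C."""
    l = len(y)
    Q = K + 1.0                      # Q_ij = K(x_i, x_j) + 1, absorbing b
    E = np.eye(l)
    H = np.block([[Q + E / C, -Q],
                  [-Q, Q + E / C]])  # quadratic term for [alpha; alpha*]
    yv = np.concatenate([-y, y])     # linear term for [alpha; alpha*]
    return H, yv
```

The resulting (2l x 2l) matrix H is symmetric, so (36) can be handed directly to any standard QP solver.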
2.5. The optimization algorithm for the unknown parameters of the
RW ν-SVM model

Determining the unknown parameters of the RW ν-SVM is a
complicated process; in fact, it is a multivariable optimization
problem in a continuous space. An appropriate parameter
combination can enhance the degree to which the model
approximates the original series, so it is necessary to select an
intelligent algorithm to obtain the optimal parameters of the
proposed model. The parameters of the RW ν-SVM have a great
effect on its generalization performance, and an appropriate
parameter combination corresponds to high generalization
performance. The PSO algorithm is considered an excellent
technique for combinatorial optimization problems (Krusienski,
2006; Yamaguchi, 2007). The PSO algorithm, introduced by
Kennedy and Eberhart (1995), is used to determine the parameter
combination of the RW ν-SVM.

Similarly to evolutionary computation techniques, PSO uses a
set of particles representing potential solutions to the problem
under consideration. The swarm consists of m particles; each
particle has a position X_i = {x_i1, x_i2, ..., x_in} and a velocity
V_i = {v_i1, v_i2, ..., v_in}, and moves through an n-dimensional
search space. According to the global variant of the PSO algorithm,
each particle moves towards its best previous position and towards
the best particle g in the swarm. Let us denote the best previously
visited position of the ith particle (the one giving the best fitness
value) as p_c_i = {p_c_i1, p_c_i2, ..., p_c_in}, and the best
previously visited position of the swarm as
p_g = {p_g_1, p_g_2, ..., p_g_n}.

The change of position of each particle from one iteration to
another can be computed according to the distance between the
current position and its previous best position and the distance
between the current position and the best position of the swarm.
The updates of velocity and particle position are then obtained by
using the following equations:

v_ij^{k+1} = w v_ij^k + c_1 r_1 (p_c_ij − x_ij^k) + c_2 r_2 (p_g_j − x_ij^k),   (38)
x_ij^{k+1} = x_ij^k + v_ij^{k+1},   (39)
where w is called the inertia weight and is employed to control the
impact of the previous history of velocities on the current one.
Accordingly, the parameter w regulates the trade-off between the
global and local exploration abilities of the swarm. A large inertia
weight facilitates global exploration, while a small one tends to
facilitate local exploration. A suitable value of the inertia weight w
usually provides balance between global and local exploration
abilities and consequently results in a reduction of the number of
iterations required to locate the optimum solution. Here
k = 1, 2, ..., K_max denotes the iteration number, c_1 is the
cognition learning factor, c_2 is the social learning factor, and r_1
and r_2 are random numbers uniformly distributed in [0, 1].
Thus, the particle flies through potential solutions towards p_c_i^k
and p_g^k in a navigated way, while still exploring new areas via
the stochastic mechanism so as to escape from local optima. Since
there is no actual mechanism for controlling the velocity of a
particle, it is necessary to impose a maximum value V_max on it. If
the velocity exceeds this threshold, it is set equal to V_max, which
controls the maximum travel distance at each iteration and
prevents the particle from flying past good solutions.
2.6. The intelligent forecasting system

In forecasting product demand series, two of the key problems
are how to deal with noise and nonstationarity. A potential
solution to these two problems is to use a mixture-of-experts (ME)
architecture, as illustrated in Fig. 3. The ME architecture is
generalized into a two-stage architecture to handle the
nonstationarity in the data. In the first stage, a mixture of experts,
including an evolutionary algorithm, partial least squares and
k-nearest neighbors, compete to optimize the model of the second
stage. To evaluate the forecasting capacity of the model in the
second stage, the fitness function of the ME architecture is
designed as follows:

fitness = (1/l) Σ_{i=1}^{l} ((ŷ_i − y_i)/y_i)²,   (40)

where l is the size of the selected sample, ŷ_i denotes the forecast
value of the selected sample, and y_i is the original data of the
selected sample.
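The fitness of Eq. (40) is a mean relative squared error; a one-function sketch (the function name is ours, and it assumes no original value y_i is zero):

```python
def fitness(y_pred, y_true):
    """Mean relative squared-error fitness of Eq. (40).
    Assumes all y_true values are nonzero."""
    l = len(y_true)
    return sum(((yp - yt) / yt) ** 2 for yp, yt in zip(y_pred, y_true)) / l
```

Dividing each residual by y_i makes the criterion scale-free, so series with different demand magnitudes are comparable during model selection.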
3. Intelligent forecasting method based on RW ν-SVM and PSO

The ME architecture is an intelligent forecasting system that can
handle the noise and nonstationarity of time series and construct
the nonlinear relation in a high-dimensional space effectively.
According to the above idea, the particle swarm optimization
algorithm can be described as follows:
Algorithm 1

Step (1) Data preparation: training and testing sets are represented as Tr and Te, respectively.
Step (2) Particle initialization and PSO parameter setting: generate initial particles. Set the PSO parameters, including the number of particles (n), particle dimension (m), number of maximal iterations (k_max), error limitation of the fitness function, velocity limitation (V_max), and inertia weight for particle velocity (w). Set the iterative variable k = 0, and perform the training process in Steps 3–7.
Step (3) Set the iterative variable k = k + 1.
Step (4) Compute the fitness function value of each particle. Take the current position as the individual extremum point of each particle, and the particle with the minimal fitness value as the global extremum point.
Step (5) Stopping-condition check: if the stopping criteria (the predefined maximum iterations or the error accuracy of the fitness function) are met, go to Step 7; otherwise, go to the next step.
Step (6) Update the particle positions by formulas (38) and (39) to form new particle swarms, then go to Step 3.
Step (7) End the training procedure and output the optimal parameters (C, ν, a).
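The steps of Algorithm 1 can be sketched as a compact PSO loop; a minimal illustration in Python (the paper used Matlab 7.1), where the function name, the constant inertia weight w = 0.7, the V_max of 20% of the search range, and the toy objective are our illustrative choices:

```python
import numpy as np

def pso(fit, lo, hi, n_particles=20, k_max=100, w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimal global-variant PSO following Algorithm 1 and Eqs. (38)-(39),
    with V_max velocity clamping. Minimizes fit over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    dim = len(lo)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # Step 2: init positions
    v = np.zeros_like(x)
    v_max = 0.2 * (hi - lo)                            # illustrative V_max
    p_c = x.copy()                                     # personal bests
    p_c_fit = np.array([fit(p) for p in x])
    g = p_c[p_c_fit.argmin()].copy()                   # global best
    for _ in range(k_max):                             # Steps 3-6
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (p_c - x) + c2 * r2 * (g - x)  # Eq. (38)
        v = np.clip(v, -v_max, v_max)                        # clamp to V_max
        x = np.clip(x + v, lo, hi)                           # Eq. (39)
        f = np.array([fit(p) for p in x])
        better = f < p_c_fit                           # Step 4: update bests
        p_c[better], p_c_fit[better] = x[better], f[better]
        g = p_c[p_c_fit.argmin()].copy()
    return g, p_c_fit.min()                            # Step 7: output optimum
```

In the paper's setting the fitness would be Eq. (40) evaluated after training an RW ν-SVM with the candidate (C, ν, a); here any callable objective over a box works.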
On the basis of the RW ν-SVM model, we can summarize a demand
forecasting algorithm as follows.

Algorithm 2

Step (1) Initialize the original data by normalization and fuzzification, and then form the training and testing sets.
Step (2) Apply the wavelet transform to the demand series at different scales and select the best wavelet function K and scale scope a_i that match the original series well.
Step (3) Compute the wavelet kernel function by (16) and construct the QP problem (34) of the RW ν-SVM.
Step (4) Go to Algorithm 1 to get the optimal parameter combination vector (C, ν, a), then solve the optimization problem (36) and obtain the parameters α^(*).
Step (5) For a new demand task, extract the product characteristics and form a set of input variables x.
Step (6) Compute the forecasting result f(x) by (37).
4. Experiments

To illustrate the proposed intelligent forecasting method, the
forecasting of a car demand series is studied. The car is a type of
consumption product influenced by macroeconomic factors in the
manufacturing system, and its demand behavior is usually driven
by many uncertain factors. Some factors with large influencing
weights are gathered to develop a factor list, as shown in Table 1.
The first four factors are expressed as linguistic information and
the last two factors are expressed as numerical data.

In our experiments, car demand series are selected from past
demand records in a typical company. The detailed characteristic
data and demand series of these cars compose the corresponding
Fig. 3. The intelligent forecasting system based on RW ν-SVM and PSO.
training and testing sample sets. During the car scale series
forecasting process, six influencing factors, viz., brand famous
degree (BF), performance parameter (PP), form beauty (FB), sales
experience (SE), dweller deposit (DD) and oil price (OP), are taken
into account. The first four influencing factors are linguistic
information; the last two are numerical information. All linguistic
information on the influencing factors is processed with fuzzy
logic to form numerical information.
The proposed forecasting model has been implemented in the
Matlab 7.1 programming language. The experiments are made on a
1.80 GHz Core (TM)2 CPU personal computer (PC) with 1.0 GB
memory under Microsoft Windows XP Professional. Some criteria,
such as the mean absolute error (MAE), mean absolute percentage
error (MAPE) and mean square error (MSE), are adopted to evaluate
Table 1
Influencing factors of car demand forecast.

Product characteristics       Unit            Expression               Weight
Brand famous degree (BF)      Dimensionless   Linguistic information   0.9
Performance parameter (PP)    Dimensionless   Linguistic information   0.8
Form beauty (FB)              Dimensionless   Linguistic information   0.8
Sales experience (SE)         Dimensionless   Linguistic information   0.5
Dweller deposit (DD)          Dimensionless   Numerical information    0.8
Oil price (OP)                Dimensionless   Numerical information    0.4
Fig. 4. Mexican-hat wavelet transform of the demand time series over different scales.
Fig. 5. Morlet wavelet transform of the demand time series over different scales.
the performance of the intelligence forecasting system. The initial parameters of the intelligence forecasting system are given as follows: inertia weight w0 = 0.9; positive acceleration constants c1 = c2 = 2; l = 1; the fitness accuracy of the normalized samples is equal to 0.0005.
The wavelet transform of the original scale series on the different scales is obtained by means of Steps 1 and 2 of Algorithm 2. The selected wavelet functions consist of the Morlet, Haar, Mexican and Gaussian wavelets. To reduce the length of this paper, only the representative Morlet and Mexican wavelet transforms on the different scales are given in Figs. 4 and 5. Among all the given wavelet transforms, the Mexican wavelet transform fits the original demand series best over the scale range from 0.01 to 2.
Therefore, the Mexican wavelet can be chosen as the kernel function of the RW m-SVM model, and the three parameters are determined over the following ranges:

v ∈ [0, 1], a ∈ [0.001, 2] and

C ∈ [(max(x_{i,j}) − min(x_{i,j}))/l × 10⁻³, (max(x_{i,j}) − min(x_{i,j}))/l × 10³].
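The kernel formula itself is defined earlier in the paper; assuming the standard product-form wavelet kernel built from the Mexican hat (Ricker) mother wavelet with dilation parameter a, a minimal sketch (in Python for illustration; the paper's experiments used Matlab) is:

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; note psi(0) = 1."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wavelet_kernel(x, z, a):
    """Assumed product-form wavelet kernel: K(x, z) = prod_j psi((x_j - z_j)/a)."""
    d = (np.asarray(x, dtype=float) - np.asarray(z, dtype=float)) / a
    return float(np.prod(mexican_hat(d)))

x = [0.2, 0.5, 0.8]
print(wavelet_kernel(x, x, a=0.27))  # 1.0 at zero lag, since psi(0) = 1
```

Because the mother wavelet equals 1 at the origin, the kernel evaluates to 1 whenever x = z, and it is symmetric in its two arguments, as a kernel must be.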
The optimal combinational parameters are obtained by Algorithm 1, viz., C = 525.57, v = 0.82 and a = 0.27. Fig. 6 illustrates the forecasting result of the original car demand series given by Algorithm 2.
To analyze the forecasting capability of the RW m-SVM model, the comparison models (the wavelet m-support vector machine with Gaussian loss function (Wg-SVM) and the wavelet m-support vector machine (Wm-SVM)) are trained on the original demand series respectively, and the forecasting results of each model for the last 12 months (used as the testing sample) are shown in Table 2. The linear inertia weight of the standard PSO is adopted:

w = w_max − ((w_max − w_min)/k_max) · k,  (41)

where w_max = 0.9 is the maximal inertia weight, w_min = 0.1 is the minimal inertia weight, and k is the iteration number of the control procedure.
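The linear schedule of Eq. (41) can be sketched as follows (Python used for illustration; the paper's implementation is in Matlab):

```python
def inertia_weight(k, k_max, w_max=0.9, w_min=0.1):
    """Linearly decreasing PSO inertia weight, Eq. (41)."""
    return w_max - (w_max - w_min) / k_max * k

# decreases linearly from w_max at k = 0 to w_min at k = k_max
print(inertia_weight(0, 100), inertia_weight(50, 100), inertia_weight(100, 100))
```

A large inertia weight early on favors global exploration of the parameter space, while the small weight near k_max favors local refinement around the best-found (C, v, a) combination.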
To evaluate the forecasting error of these models, the comparison among the different forecasting approaches is shown in Table 3, which gives the error index distribution of the four models. The indexes (MAE, MAPE and MSE) of the Wg-SVM model are better than those of the Wm-SVM model, and the indexes of RW m-SVM are better than those of both Wm-SVM and Wg-SVM. It is obvious that the robust loss function can improve the generalization ability of the support vector machine.

Experimental results show that the regression precision of RW m-SVM is improved by adopting the wavelet kernel and robust loss function, compared with the Wg-SVM and Wm-SVM models and with m-SVM (whose kernel function is the Gaussian function) under the same conditions.
5. Conclusion

In this paper, a new version of WSVR, named RW m-SVM, is proposed to model the nonlinear system of product demand series by integrating wavelet theory, a robust loss function and m-SVM. The new forecasting model based on RW m-SVM and PSO,
Fig. 6. The car demand forecasting results from the RW m-SVM model.
Table 2
Comparison of forecasting results from four different models.

Model       1     2     3     4     5     6     7     8     9     10    11    12
Real value  2967  3268  3300  1891  3489  3544  2708  1513  3411  3672  3483  1523
m-SVM       2971  3240  3269  1964  3439  3448  2754  1661  3489  3587  3433  1669
Wm-SVM      2953  3257  3286  1981  3456  3465  2736  1678  3472  3605  3451  1687
Wg-SVM      2962  3258  3286  1936  3511  3465  2726  1632  3464  3605  3450  1641
RW m-SVM    2967  3257  3286  1922  3519  3472  2734  1611  3467  3610  3453  1620
Table 3
Error statistic of four forecasting models.

Model      MAE      MAPE    MSE
m-SVM      69.5833  0.0309  6662
Wm-SVM     63.1667  0.0303  6673
Wg-SVM     48.5833  0.0223  3822
RW m-SVM   43.9167  0.0194  2910
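The error criteria can be recomputed directly from the Table 2 values; the following sketch (Python for illustration, the paper itself used Matlab) reproduces the RW m-SVM row of Table 3:

```python
# Real values and RW m-SVM forecasts for the 12 testing months (Table 2).
real = [2967, 3268, 3300, 1891, 3489, 3544, 2708, 1513, 3411, 3672, 3483, 1523]
pred = [2967, 3257, 3286, 1922, 3519, 3472, 2734, 1611, 3467, 3610, 3453, 1620]

n = len(real)
errors = [r - p for r, p in zip(real, pred)]
mae = sum(abs(e) for e in errors) / n                          # mean absolute error
mape = sum(abs(e) / r for e, r in zip(errors, real)) / n       # mean absolute percentage error
mse = sum(e**2 for e in errors) / n                            # mean square error

print(f"MAE = {mae:.4f}, MAPE = {mape:.4f}, MSE = {mse:.1f}")
# MAE = 43.9167, MAPE = 0.0194, MSE = 2910.9
```

The computed MAE and MAPE match Table 3 exactly; the MSE comes to 2910.9, which Table 3 reports as 2910 after truncation.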
named PSO RW m-SVM, is presented to approximate an arbitrary demand curve in the L2 space. The simulation results indicate that RW m-SVM can provide better forecasting precision for the product demand series.

The performance of the RW m-SVM is evaluated using car demand data, and the simulation results demonstrate that RW m-SVM is effective in dealing with uncertain data and hybrid noises. Moreover, it is shown that the particle swarm optimization algorithm presented here enables the RW m-SVM to seek the optimal parameters.

Compared to Wm-SVM and Wg-SVM, RW m-SVM has the best indexes (MAE, MAPE and MSE). RW m-SVM can overcome the "curse of dimensionality" and has some other attractive properties, such as strong learning capability for small samples, good generalization performance for hybrid noises, insensitivity to noise or outliers, and automatic selection of the optimal parameters. Moreover, the wavelet transform can reduce noise in the data while preserving its detail and resolution. Therefore, in the process of establishing the forecasting models, much uncertain information in the scale data is not neglected but wholly incorporated into the wavelet kernel function. The forecasting accuracy is improved by means of the wavelet technique.
Acknowledgements
This research was partly supported by the National Natural Science Foundation of China under Grant 60904043, a research grant funded by the Hong Kong Polytechnic University, the China Postdoctoral Science Foundation (20090451152), the Jiangsu Planned Projects for Postdoctoral Research Funds (0901023C) and the Southeast University Planned Projects for Postdoctoral Research Funds.
References
Bell, T., Ribar, G., & Verchio, J. (1989). Neural nets vs logistic regression. Presented at the University of Southern California expert system symposium (Nov.).
Box, G. E. P., & Jenkins, G. M. (1994). Time series analysis: Forecasting and control (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Carbonneau, R., Laframbois, K., & Vahidov, R. (2008). Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184(3), 1140–1154.
Duliba, K. (1991). Contrasting neural nets with regression in predicting performance. In Proceedings of the 24th international conference on system science, Hawaii (Vol. 4, pp. 163–170).
Engle, R. F. (1984). Combining competing forecasts of inflation using a bivariate ARCH model. Journal of Economic Dynamics and Control, 18(2), 151–165.
Gorr, W. L. (1994). Research prospective on neural forecasting. International Journal of Forecasting, 10(1), 1–4.
Hill, T., Connor, M. O., & Remus, W. (1996). Neural network models for time series forecasts. Management Science, 42(7), 1082–1092.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (pp. 1942–1948).
Khandoker, A. H., Lai, D. T. H., Begg, R. K., & Palaniswami, M. (2007). Wavelet-based feature extraction for support vector machines for screening balance impairments in the elderly. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(4), 587–597.
Krantz, S. G. (1994). Wavelet: Mathematics and application. Boca Raton, FL: CRC.
Krusienski, D. J. (2006). A modified particle swarm optimization algorithm for adaptive filtering. In IEEE international symposium on circuits and systems, Kos, Greece (pp. 137–140).
Kwok, J. T. (1999). Moderating the outputs of support vector machine classifiers. IEEE Transactions on Neural Networks, 10(5), 1018–1031.
Liu, G. Z., & Di, S. L. (1992). Wavelet analysis and application. Xi'an, China: Xidian Univ. Press.
Roy, J., & Cosset, J. (1990). Forecasting country risk ratings using a neural network. In Proceedings of the 23rd international conference on system science, Hawaii (Vol. 4, pp. 327–334).
Tang, Z., Almedia, C., & Fishwick, P. A. (1991). Time series forecasting using neural networks vs. Box–Jenkins methodology. Simulation, 57(5), 303–310.
Tang, Z., & Fishwick, P. A. (1993). Feedforward neural nets as models for time series forecasting. ORSA Journal of Computing, 5(4), 374–385.
Tong, H. (1983). Threshold models in non-linear time series analysis. New York: Springer-Verlag.
Trontl, K., Smuc, T., & Pevec, D. (2007). Support vector regression model for the estimation of γ-ray buildup factors for multi-layer shields. Annals of Nuclear Energy, 34(12), 939–952.
Tuan, D. P., & Lanh, T. T. (1981). On the first-order bilinear time series model. Journal of Applied Probability, 18(3), 617–627.
Vapnik, V. (1995). The nature of statistical learning. New York: Springer.
Widodo, A., & Yang, B. S. (2008). Wavelet support vector machine for induction machine fault diagnosis based on transient current signal. Expert Systems with Applications, 35(1–2), 307–316.
Wohlberg, B., Tartakovsky, D. M., & Guadagnini, A. (2006). Subsurface characterization with support vector machines. IEEE Transactions on Geoscience and Remote Sensing, 44(1), 47–57.
Yamaguchi, T. (2007). Adaptive particle swarm optimization: Self-coordinating mechanism with updating information. In IEEE international conference on systems, man and cybernetics, Taipei, Taiwan (pp. 2303–2308).
Zhang, G. P. (2001). An investigation of neural networks for linear time-series forecasting. Computers and Operations Research, 28(12), 1183–1202.
Zhang, G., Patuwo, E. B., & Hu, M. Y. (1998). Forecasting with artificial neural network: The state of the art. International Journal of Forecasting, 14(1), 35–62.