JUI-CHAN HUANG et al: A NOVEL REVENUE DEVELOPMENT AND FORECASTING MODEL USING …

DOI 10.5013/IJSSST.a.17.48.39 39.1 ISSN: 1473-804x online, 1473-8031 print

A Novel Revenue Development and Forecasting Model using Machine Learning

Approaches for Cosmetics Enterprises

Jui-Chan Huang1

Yango College Fuzhou, 350015

China

Ming-Hung Shu2

Department of Industrial & Mechanical Engineering National Kaohsiung University of Applied Sciences

Kaohsiung 80087, Taiwan

Bi-Min Hsu3

Department of Industrial Engineering & Management Cheng Shiu University

Kaohsiung 83343, Taiwan

Tzu-Jung Wu4

Institute of Human Resource Management National Sun Yat-Sen University

Kaohsiung 80424, Taiwan

Abstract - In the contemporary information society, constructing an effective sales prediction model is challenging due to the sizeable amount of purchasing information obtained from diverse consumer preferences. Many empirical cases in the existing literature argue that traditional forecasting methods, such as exponential smoothing, moving average, and time series models, have lost their dominance in prediction accuracy when compared with modern forecasting approaches such as neural network (NN) and support vector machine (SVM) models. To verify these findings, this paper uses Taiwanese cosmetics sales data to examine three forecasting models: i) the back propagation neural network (BPNN), ii) the least-square support vector machine (LS-SVM), and iii) the auto regressive (AR) model. The results show that the LS-SVM has the smallest mean absolute percent error (MAPE) and the largest Pearson correlation coefficient (R²) between actual and predicted values.

Keywords - Machine Learning, Back Propagation Neural Network, Least Square Support Vector Machine, Autoregressive Model, Model Performance Comparison.

I. INTRODUCTION

A. Research Motives and Background

Machine learning plays an indispensable role in data mining and provides many methods for forecasting; the selection of a proper method has therefore become a hot topic in the forecasting field in recent years.

Yeh (1998) investigated high-performance concrete strength forecasting with a combination of neural network and regression forecasting. Ghaffari et al. (2006) combined a back propagation neural network based on a genetic algorithm with error measurement methods for experimental design and performance comparison in drug delivery, and finally used the Response Surface Method (RSM) to discuss the model error. Pani and Mohanta (2015) applied Least Square Support Vector Regression (LS-SVR) and the back propagation neural network (BPNN) to the monitoring of a cutting manufacturing process and found that LS-SVR has the smallest Mean Absolute Error (MAE). Based on the model performance evaluations in the above literature, this paper explores the Back Propagation Neural Network (BPNN), the Least Square Support Vector Machine (LS-SVM), and the Auto Regression (AR) component of the Autoregressive Integrated Moving Average Model (ARIMA), compares model error and model performance, and then determines the model most suitable for cosmetics sales forecasting.

B. Research Purposes

This research discusses the two machine learning forecasting approaches and the auto regression forecasting approach adopted in this paper, comparing model error and performance so as to find the forecasting model most appropriate for cosmetics sales forecasting.

1) Forecast results of the BPNN, LS-SVM and AR models.

2) Comparison results of model error and performance.

II. LITERATURE REVIEW

A. Literature Review of Machine Learning

Machine learning technology dates back to the 1980s; since the machine learning workshop held at Carnegie Mellon University in the United States, research into machine learning has developed rapidly, and many data sets have been accumulated for testing the performance of algorithms.


1) Definition of Machine Learning

TABLE 2.1 DEFINITIONS OF MACHINE LEARNING

Scholar          Definition
Langley (1995)   Machine learning is an artificial science whose main research subject is artificial intelligence; in particular, it concerns algorithms that improve concrete performance through learning from experience.
Mitchell (1997)  Machine learning is the study of computer algorithms that improve automatically through experience.
Alpaydin (2004)  Machine learning is programming computers to use sample data or past experience to optimize a performance criterion.
Sewell (2006)    Machine learning is a field of artificial intelligence focusing on computer algorithms that improve automatically through experience. Generally speaking, it involves optimizing a performance criterion through data analysis.
Lantz (2013)     The development process of machine learning is to enable computers to become perceptive and learn how to instruct themselves.

Source: Langley (1995), Mitchell (1997), Alpaydin (2004), Sewell (2006), Lantz (2013).

2) Methods and Types of Machine Learning

In machine learning there are many different algorithmic methods, including classification, clustering, and forecasting.

TABLE 2.2 METHODS, TYPES AND USES OF MACHINE LEARNING

Method                      Use
k-Nearest Neighbor          Classification
Naïve Bayes                 Classification
Decision Tree               Classification
Regression Methods          Classification
Neural Network              Classification
Support Vector Machine      Classification
Hierarchical Clustering     Clustering
k-Means                     Clustering
Fuzzy c-Means               Clustering
Expectation-Maximization    Clustering
Self-Organizing Maps        Clustering

Source: Lantz (2013).

B. Literature Review of Neural Network Model

McCulloch and Pitts (1943) were the first to propose the concept of the Neural Network (NN). A neural network imitates a biological neural processing system by computer: it is a computing system able to imitate a biological neural network using many simple, interconnected artificial neurons.

1) Types of Neural Network

Fukuda and Shibata (1992) describe two structural patterns of neural networks, the Feedback Network and the Feedforward Network, as shown in Figure 2.1 and Figure 2.2.

TABLE 2.3 CLASSIFICATION AND TYPES OF NEURAL NETWORKS

Classification by Learning Strategy    Classification by Network Structure
Supervised Learning Network            Feedforward Network
Unsupervised Learning Network          Feedback Network
Hybrid Learning Network
Associative Learning Network
Optimization Learning Network

Source: Compiled by this study.

Fig. 2.1 Feedback Network Architecture Diagram. Source: Fukuda and Shibata (1992).

Figure 2.1 is a Feedback Network architecture with several nodes; the artificial neurons in this network are connected with each other, so that each neuron's output is connected to all the other neurons and its input comes from the outputs of all the other neurons. In the diagram, V_1, V_2, …, V_{N−1}, V_N represent the states of the neurons; X_1, X_2, …, X_{N−1}, X_N represent the node input values, indicating the initial states of the neurons; and X'_1, X'_2, …, X'_{N−1}, X'_N are the node outputs, indicating the output values after convergence.


Fig. 2.2. Feedforward Network Architecture Diagram Source: Fukuda and Shibata (1992).

Figure 2.2 shows a Feedforward Network architecture based on the error back propagation algorithm. As the diagram shows, there are two nodes each at the input end and the output end, and in this algorithm the error calculated at the output end is propagated back to the input end or to other nodes in the hidden layer.

2) Back Propagation Neural Network Model

Werbos (1990) combined the error back propagation method with the artificial neural network to propose the Back Propagation Neural Network, which effectively solved the problem of rapidly training multi-layer neural networks. Fang and Fang (2013) illustrated the traditional BPNN structure (see Figure 2.3) and its equations, as shown in Eqs. (2.1) and (2.2):

Y_j = f(net_j)    Eq. (2.1)

net_j = Σ_i W_ij X_i − θ_j    Eq. (2.2)

where Y_j is the output variable, the output signal of the neuron; f is the transfer function applied to the weighted sum of the input values from the other processing units; W_ij is the connection weight, indicating the strength of the connection between neurons; X_i is the input variable, the input signal of the neuron; and θ_j is the threshold value of the neuron.
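As a concrete illustration, Eqs. (2.1) and (2.2) can be sketched in a few lines of Python. The logistic transfer function chosen here is an assumption for the sketch; the equations above do not fix a particular f at this point:

```python
import math

def neuron_output(x, w, theta):
    """Compute Y_j = f(net_j) with net_j = sum_i W_ij * X_i - theta_j (Eqs. 2.1-2.2)."""
    net_j = sum(w_i * x_i for w_i, x_i in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-net_j))  # logistic transfer function f

# Example: three inputs feeding one neuron (illustrative weights and threshold).
y = neuron_output(x=[0.5, 0.2, 0.9], w=[0.4, -0.3, 0.8], theta=0.1)
print(0.0 < y < 1.0)  # logistic output always lies in (0, 1)
```

With a zero net input the logistic function returns exactly 0.5, which is a quick sanity check on the implementation.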

Fig. 2.3 BPNN Structure Diagram Source: Fang and Fang (2013).

Figure 2.3 shows the basic BPNN structure. Suppose the input-layer node is x_i, the hidden-layer node is y_j, the output-layer node is o_k, the connection weight between the input layer and the hidden layer is W_ij, and the connection weight between the hidden layer and the output layer is T_jk. The calculation functions for the output nodes are shown in Eqs. (2.3) to (2.5):

The output node in the hidden layer:

y_j = f( Σ_i W_ij x_i − θ_j ) = f(net_j)    Eq. (2.3)

where in Eq. (2.3), net_j = Σ_i W_ij x_i − θ_j.

a) The output node in the output layer:

O_k = f( Σ_j T_jk y_j − θ_k ) = f(net_k)    Eq. (2.4)

where in Eq. (2.4), net_k = Σ_j T_jk y_j − θ_k.

b) The output error:

E = (1/2) Σ_k ( t_k − o_k )²
  = (1/2) Σ_k [ t_k − f( Σ_j T_jk y_j − θ_k ) ]²
  = (1/2) Σ_k [ t_k − f( Σ_j T_jk f( Σ_i W_ij x_i − θ_j ) − θ_k ) ]²    Eq. (2.5)

where y_j is the hidden-layer node, W_ij is the connection weight between the input layer and the hidden layer, x_i is the input-layer node, t_k is the expected (target) output, T_jk is the connection weight between the hidden layer and the output layer, and o_k is the output-layer node; E is the error between the network output and the expected output.
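The error in Eq. (2.5) is the quantity that back propagation minimizes by gradient descent on the weights. A minimal sketch follows, for a single logistic neuron rather than the full multi-layer network of the paper; the learning rate, starting weights, and training pair are illustrative assumptions:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, target, w, theta, lr=0.5):
    """One back propagation update for a single logistic neuron.

    Minimizes E = 0.5 * (t - o)^2 using dE/dw_i = -(t - o) * o * (1 - o) * x_i.
    """
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - theta)
    delta = (target - o) * o * (1.0 - o)   # error signal propagated back
    w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
    theta = theta - lr * delta             # threshold enters net with a minus sign
    return w, theta, 0.5 * (target - o) ** 2

# Repeated updates shrink the squared error on a single training pair.
w, theta = [0.1, -0.2], 0.0
errs = []
for _ in range(500):
    w, theta, e = train_step([1.0, 0.5], 0.9, w, theta)
    errs.append(e)
print(errs[-1] < errs[0])  # True: the error decreases over training
```

The same delta-rule logic, applied layer by layer via the chain rule, gives the multi-layer update that Eq. (2.5) implies.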

C. Literature Review of Support Vector Machine Model

Vapnik and Cortes (1995) proposed the concept of the Support Vector Machine (SVM), which is often used for data classification.

1) Definition of Support Vector Machine Model

Lantz (2013) pointed out that SVM uses a hyperplane as a linear boundary to separate similar points in the data and to mark the classification boundary. A hyperplane can be two-dimensional, three-dimensional, or higher-dimensional. Because high-dimensional graphs are difficult to visualize, the diagrams of the SVM linear model display only the two-dimensional and three-dimensional cases, as shown in Figure 2.4.

Fig. 2.4 Space Diagrams of the Linear Support Vector Machine (left: two-dimensional plane; right: three-dimensional space). Source: Lantz (2013).

Figure 2.4 shows two SVM space diagrams based on the linear model. In the left, two-dimensional diagram, the data are separated and classified into two groups (squares and circles); in the right, three-dimensional diagram, the data are classified into two groups in three dimensions. If a non-linear function is involved, a non-linear model must be adopted. To solve non-linear SVM problems it is necessary to use Slack Variables, so that some points falling on the incorrect side of the classification boundary are allowed. The diagram shows two such points falling on the wrong side, together with the corresponding Slack Variables of the non-linear model, as shown in Figure 2.5.

Fig.2.5 Space Diagram of Non-linear Support Vector Machine

Source: Lantz (2013)

2) Method of Least Squares

Ghaedi et al. (2016) applied least-square support vector regression and linear multiple regression to chemical structure modeling. The calculation process of support vector regression is based on least squares, as shown in Eqs. (2.6)–(2.7):

y(x) = wᵀφ(x) + b    Eq. (2.6)

where φ(x) is the non-linear mapping function from the raw feature space to a high-dimensional feature space, y is the value obtained from input data x, w is the coefficient vector, and b is the bias term whose minimized upper limit is reached through the standard error. Based on the principle of regularized risk minimization, the least-square support vector regression function can be defined as:

min J(w, ξ) = (1/2) wᵀw + (γ/2) Σ_{i=1}^{l} ξ_i²    Eq. (2.7)

where γ is the penalty parameter controlling the trade-off between the least estimated error and the smoothness of the function, and the non-negative variable ξ_i is the Slack Variable.

3) Least Square Support Vector Machine Model

Combining the method of least squares with the support vector machine, Suykens and Vandewalle (1999) proposed the Least Square Support Vector Machine (LS-SVM). In much of the literature comparing forecast errors among SVM variants, models forecast with LS-SVM appear to perform better than the traditional SVM, and LS-SVM is more convenient for non-linear regression forecasting. In research on circuit breaker maintenance and schedule optimization in transformer substations using LS-SVM, Liu and Tang (2015) define LS-SVM regression simply: LS-SVM maps the input data to a high-dimensional feature space and constructs a linear regression function there. Therefore, LS-SVM can also be applied to forecasting time series data. Cheng and Lu (2012) compared classification problems in the analysis of CIELab parameter nucleation degree with LS-SVM and found that LS-SVM classification has a simpler structure and shorter training time than traditional SVM classification.

D. Literature Review of Time Series Analysis Method

Time series analysis methods can be divided into four categories according to time series stationarity: the Autoregressive Conditional Heteroskedasticity Model (ARCH), the Generalized Autoregressive Conditional Heteroskedasticity Model (GARCH), the Vector Auto Regression Model (VAR), and the Autoregressive Integrated Moving Average Model (ARIMA).

Engle (1982) proposed the ARCH model, in which the conditional variance is allowed to be a function of past residual terms, so that the conditional variance changes over time. Building on the ARCH model, Bollerslev (1986) added the effect of the conditional variance on itself to propose the GARCH model; this model modifies the linear lag structure of the ARCH model and has better flexibility and explanatory power. Pokhilchuk and Savek'ev (2016) applied the GARCH(1,1) model to the dynamic simulation of real stock market shares and investigated the significance of selecting different mathematical methods for GARCH analysis. They suggested estimating the GARCH parameters of a time series through Maximum Likelihood Estimation (MLE) or the Auto Correlation Function (ACF).

Sims (1980) proposed the VAR model, in which n variables over the same sample period are expressed as linear functions of their own past values. At present, this model is widely applied to time series data in economics. Feldkircher and Huber (2016) used a VAR model to explain the effect of international economic shocks on the American economy.

Box and Jenkins (1970) proposed the ARIMA model, whose main purpose is to apply stationarity processing to non-stationary time series data.

Applying the ARIMA model to Chinese non-renewable energy consumption forecasting, Yuan et al. (2016) defined it as follows: according to the degree of time series stationarity, the ARIMA model can be broadly divided into Auto Regression (AR), Moving Average (MA), and Auto Regression and Moving Average (ARMA); moreover, for non-stationary time series, integrated differencing steps can make the original series more stationary, which gives the ARIMA model. In research on time series forecasting of carbon prices, Zhu and Wei (2012) compared ANN, LS-SVM, ARIMA, and other models, and combined the ARIMA and LS-SVM models into a new hybrid model; their research makes a considerable contribution to hybrid models in which time series data are processed with ARIMA. Hassan (2014) forecasted the daily and monthly definition index using an ARIMA model and a regression model.

III. RESEARCH METHOD

A. Back Propagation Neural Network Model

1) Structure of the Back Propagation Neural Network Model

The NN model in this research adopts the error back propagation learning algorithm; the proposed BPNN model is shown in Figure 3.1. During function transfer, an S-type function is used as the transfer function for the weighted input sum, defined in Eq. (3.1):

y_i = tanh(v_i)  and  y_i = 1 / (1 + e^(−v_i))    Eq. (3.1)

where the former is the hyperbolic tangent function, ranging from −1 to 1, and the latter is the logistic function, similar in shape but ranging from 0 to 1; y_i is the output of the i-th neuron, and v_i is the weighted sum of its inputs.
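A quick numerical sketch of the two transfer functions in Eq. (3.1) and their output ranges:

```python
import math

def tanh_transfer(v):
    # Hyperbolic tangent: output lies in (-1, 1)
    return math.tanh(v)

def logistic_transfer(v):
    # Logistic function: output lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

# Both functions are S-shaped; only their output ranges differ.
for v in (-5.0, 0.0, 5.0):
    print(round(tanh_transfer(v), 3), round(logistic_transfer(v), 3))
```

At v = 0 the two functions return 0 and 0.5 respectively, the midpoints of their ranges.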


Fig. 3.1 Diagram of BPNN Model in this Research.

This model contains a hidden layer with 2 layers of 6 nodes each; an input layer with 3 nodes, where the historical sales of emulsion, soap, and cleansing cosmetics are used as input variables; and an output layer with a single node used as the sales forecast output. Among the nodes, x_i is the input node, y_j and y_k are the hidden-layer nodes, O is the output-layer node, w_{x_i y_j} is the weight between the input layer and the hidden layer, w_{y_j y_k} is the weight between the hidden layers, and w_{y_k O} is the weight between the hidden layer and the output layer. The size of the hidden layer depends on the number of input and output nodes. Many studies show that the number of hidden layers is often determined by trial and error, so this research carries out trial and error in R software to select the number of hidden layers; according to the results, the model performs better with a hidden layer of 2 layers of 6 nodes, so this configuration is adopted for the BPNN model.

B. Least Square Support Vector Machine Model

The forecasting conducted in this research belongs to LS-SVM regression. If T is the training data set and n is the sample size, the training set T is expressed as in Eq. (3.2):

T = { (x_1, y_1), (x_2, y_2), …, (x_n, y_n) }    Eq. (3.2)

where x_i ∈ R^n is the input vector and y_i ∈ R is the output variable corresponding to x_i; moreover, y_i must fall within the range −1 to 1, namely y_i ∈ [−1, 1]. Therefore, prior to running the LS-SVM model, the training data must first be standardized.
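The standardization step described above (mapping the series into [−1, 1] before training) can be sketched as a min-max transform; the sales figures in the example are made up for illustration:

```python
def scale_to_unit_interval(values):
    """Min-max scale a series into [-1, 1], as required before LS-SVM training."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

sales = [120.0, 95.0, 150.0, 130.0]   # illustrative monthly sales figures
scaled = scale_to_unit_interval(sales)
print(scaled)  # every value lies in [-1, 1]; the min maps to -1 and the max to +1
```

To forecast on the original scale, the inverse transform must be applied to the model output afterwards.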

1) Mathematical Function of the Least Square Support Vector Machine Model

Previous literature suggests that the SVM model is often used together with the NN model, perhaps because both machine learning methods are computed by minimizing an error function; the minimization problem of LS-SVM is shown in Eq. (3.3):

min J(w, b, e) = (1/2) wᵀw + (γ/2) Σ_{i=1}^{N} e_{c,i}²    Eq. (3.3)

Eq. (3.3) is subject to the equality constraints in Eq. (3.4):

y_i [ wᵀφ(x_i) + b ] = 1 − e_{c,i},  i = 1, 2, …, N.    Eq. (3.4)


If the binary targets of LS-SVM satisfy y_i = ±1, then y_i² = 1 can be used to obtain the following equation:

γ Σ_{i=1}^{N} e_{c,i}² = γ Σ_{i=1}^{N} ( y_i e_{c,i} )² = γ Σ_{i=1}^{N} ( y_i − (wᵀφ(x_i) + b) )²    Eq. (3.5)

where e_i = y_i − (wᵀφ(x_i) + b) is the model error; this error also makes the corresponding least-squares data more meaningful, so that the final result is identical to that obtained by adopting a regression model. Therefore, the classification formulation of LS-SVM is identical to the following Eq. (3.6):

J(w, b, e) = ζ E_W + γ E_D    Eq. (3.6)

where E_W = (1/2) wᵀw and E_D = (1/2) Σ_{i=1}^{N} e_i² = (1/2) Σ_{i=1}^{N} ( y_i − (wᵀφ(x_i) + b) )²;

E_D is the sum of squared errors (SSE) whose regularization weight is adjusted through the hyper-parameters. The feasible solution depends only on the ratio γ/ζ, so only γ is used as the adjustment parameter in the original formulation, while ζ and γ give LS-SVM a traditional Bayesian probabilistic interpretation. After construction of the Lagrangian function, the feasible solution of the LS-SVM regression can be obtained, as shown in Eq. (3.7):

L(w, b, e, α) = J(w, e) − Σ_{i=1}^{N} α_i { wᵀφ(x_i) + b + e_i − y_i }
  = (1/2) wᵀw + (γ/2) Σ_{i=1}^{N} e_i² − Σ_{i=1}^{N} α_i { wᵀφ(x_i) + b + e_i − y_i }    Eq. (3.7)

where α_i is the Lagrange multiplier; the optimality conditions are shown in Eq. (3.8):

∂L/∂w = 0  →  w = Σ_{i=1}^{N} α_i φ(x_i),
∂L/∂b = 0  →  Σ_{i=1}^{N} α_i = 0,
∂L/∂e_i = 0  →  α_i = γ e_i,  i = 1, 2, …, N,
∂L/∂α_i = 0  →  wᵀφ(x_i) + b + e_i − y_i = 0,  i = 1, 2, …, N.    Eq. (3.8)

As the constraint of the traditional SVM is an inequality, the computation becomes more complex and much larger storage space is required. LS-SVM instead adopts the quadratic least-squares loss function in place of the ε-insensitive loss function of the traditional SVM, and the constraints become equalities, so the solution process is transformed into a system of linear equations that can be solved rapidly and efficiently with relatively little storage, which is more appropriate for large-scale problems. Therefore, it is widely applied in function estimation. The LS-SVM formula used in this research is shown in Eq. (3.9):

f(x) = Σ_{i=1}^{N} α_i y_i K(x, x_i) + b    Eq. (3.9)
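The linear-system view above can be made concrete. The following is a minimal sketch of the standard LS-SVM dual system of Suykens and Vandewalle for a binary classifier with a Gaussian RBF kernel; the toy XOR data and the hyper-parameter values C and gamma are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), cf. Eq. (3.16)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C=100.0, gamma=1.0):
    """Solve the LS-SVM dual linear system for labels y in {-1, +1}."""
    n = len(y)
    Omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    # Bordered KKT system: [[0, y^T], [y, Omega + I/C]] [b; alpha] = [0; 1]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)       # one linear solve, no quadratic program
    return sol[1:], sol[0]              # alpha, b

def lssvm_predict(Xnew, X, y, alpha, b, gamma=1.0):
    # Decision rule from Eq. (3.9): sign( sum_i alpha_i y_i K(x, x_i) + b )
    return np.sign(rbf_kernel(Xnew, X, gamma) @ (alpha * y) + b)

# Toy check on a non-linearly-separable set (XOR pattern).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
alpha, b = lssvm_fit(X, y)
print(lssvm_predict(X, X, y, alpha, b))  # recovers the training labels
```

Because the equality constraints replace the inequalities, a single call to a dense linear solver replaces the quadratic programming step of the traditional SVM, which is exactly the efficiency argument made in the text.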

2) Kernel Function of the Least Square Support Vector Machine Model

Before executing the LS-SVM model, a Kernel Function must be chosen. The Kernel Functions of LS-SVM can be broadly divided into 5 categories: the general Kernel Function, the Linear Kernel Function, the Polynomial Kernel Function, the Sigmoid Function, and the Gaussian RBF Kernel. The general Kernel Function takes the form shown in Eq. (3.10):

K(x_i, x_j) = φ(x_i) · φ(x_j)    Eq. (3.10)

where the mapping is indicated with the Greek symbol φ, and φ(x) is the mapping of the data into another space. With this form, Kernel Functions can be applied to data in many different fields; other commonly used Kernel Function types are listed in Eqs. (3.11)–(3.14):

a) Linear Kernel Function:

K(x_i, x_j) = x_iᵀ x_j    Eq. (3.11)

b) Polynomial Kernel Function:

K(x_i, x_j) = ( ⟨x_i, x_j⟩ + 1 )^d    Eq. (3.12)

c) Sigmoid Function:

K(x_i, x_j) = tanh( k⟨x_i, x_j⟩ + c )    Eq. (3.13)

d) Gaussian RBF Kernel:

K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )    Eq. (3.14)

where x_i and x_j are the feature vectors of the input space, obtained from training samples or test samples. In the Gaussian RBF Kernel, ‖x_i − x_j‖² can be regarded as the squared Euclidean distance between the two feature vectors; e is the exponential, and σ is a free parameter. A simpler, equivalent definition sets up a new parameter γ, whose expression is shown in Eq. (3.15):


γ = 1 / (2σ²)    Eq. (3.15)

Substituting Eq. (3.15) into the original formula of Eq. (3.14) gives:

K(x_i, x_j) = exp( −γ ‖x_i − x_j‖² )    Eq. (3.16)
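The equivalence between the σ form of Eq. (3.14) and the γ form of Eq. (3.16) is easy to verify numerically:

```python
import math

def rbf_sigma(xi, xj, sigma):
    # Eq. (3.14): K = exp(-||xi - xj||^2 / (2 * sigma^2))
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def rbf_gamma(xi, xj, gamma):
    # Eq. (3.16): K = exp(-gamma * ||xi - xj||^2)
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * d2)

# With gamma = 1 / (2 * sigma^2), the two forms agree (Eq. 3.15).
a, b = [1.0, 2.0], [2.0, 0.5]
sigma = 0.5
gamma = 1.0 / (2.0 * sigma ** 2)
print(abs(rbf_sigma(a, b, sigma) - rbf_gamma(a, b, gamma)) < 1e-12)  # True
```

The γ form is also the convention that LS-SVM and SVM software libraries typically expose as the kernel parameter.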

As the value of the RBF Kernel Function decreases with distance and lies between 0 (in the limit) and 1 (when the two feature vectors are equal), it is a kind of similarity measure. The feature space of this Kernel Function has infinitely many dimensions. For instance, supposing γ = 1, its expansion is as shown below:

exp( −‖x_i − x_j‖² ) = Σ_{k=0}^{∞} [ (2 x_iᵀ x_j)^k / k! ] exp( −‖x_i‖² ) exp( −‖x_j‖² )    Eq. (3.17)

where e is the exponential and k is the index of the expansion terms; the kernel attains the value 1 exactly when the two feature vectors are equal, namely x_i = x_j.

The selection of the Kernel Function has a far-reaching effect on LS-SVM performance: a proper Kernel Function can greatly reduce the computational load of LS-SVM and handle high-dimensional inputs. Moreover, in the non-linear LS-SVM model it is unnecessary to know the form and parameters of the non-linear transformation function, which shows how important the Kernel Function is to LS-SVM. Confirming a Kernel Function is not difficult; generally speaking, any function satisfying Mercer's theorem can be used as a Kernel Function.

C. Auto Regressive Model

1) Mathematical Function of the Auto Regressive Model

The Auto Regressive model uses past values of the time series as independent variables. If only the value of the previous period is used as the independent variable, the model is called a first-order autoregressive model, namely the AR(1) model, defined in Eq. (3.18):

Y_t = φ_0 + φ_1 Y_{t−1} + ε_t    Eq. (3.18)

where ε_t ~ WN(0, σ²) and E(ε_t Y_{t−j}) = 0 for j > 0; φ_1 is called the first-order autoregressive coefficient. With repeated iteration, Eq. (3.19) is obtained:

Y_t = φ_0 + φ_1( φ_0 + φ_1 Y_{t−2} + ε_{t−1} ) + ε_t
  = φ_0(1 + φ_1) + φ_1² Y_{t−2} + ε_t + φ_1 ε_{t−1}
  = φ_0(1 + φ_1 + … + φ_1^{k−1}) + φ_1^k Y_{t−k} + Σ_{j=0}^{k−1} φ_1^j ε_{t−j}
  = φ_0 Σ_{j=0}^{k−1} φ_1^j + φ_1^k Y_{t−k} + Σ_{j=0}^{k−1} φ_1^j ε_{t−j}    Eq. (3.19)

When |φ_1| < 1, lim_{k→∞} φ_1^k Y_{t−k} = 0, and Eq. (3.20) is obtained:

Y_t = φ_0 Σ_{j=0}^{∞} φ_1^j + Σ_{j=0}^{∞} φ_1^j ε_{t−j}
  = φ_0 / (1 − φ_1) + Σ_{j=0}^{∞} φ_1^j ε_{t−j}    Eq. (3.20)

Therefore, its expected value can be obtained as follows:

E(Y_t) = E(Y_{t−j}) = φ_0 / (1 − φ_1)    Eq. (3.21)

And its variance is as follows:

Var(Y_t) = σ²( 1 + φ_1² + φ_1⁴ + … ) = σ² / (1 − φ_1²),  if |φ_1| < 1.    Eq. (3.22)

And its covariance is:

$$\begin{aligned}
Cov(Y_t, Y_{t-j}) &= E\!\left[ (Y_t - \mu)(Y_{t-j} - \mu) \right] \\
&= E\!\left[ \left( \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_1^2 \varepsilon_{t-2} + \cdots \right) \left( \varepsilon_{t-j} + \phi_1 \varepsilon_{t-j-1} + \cdots \right) \right] \\
&= \phi_1^j \sigma^2 \left( 1 + \phi_1^2 + \phi_1^4 + \cdots \right) \\
&= \frac{\phi_1^j \sigma^2}{1 - \phi_1^2}
\end{aligned} \quad \text{Eq. (3.23)}$$

It can be found that the condition for AR(1) to be stationary is $\left| \phi_1 \right| < 1$. As for the AR(1) model, in addition to characterizing it with the functions above, $\phi_1$ can also be estimated by least squares, and the least-squares estimator is shown in Eq. (3.24):

$$\hat{\phi}_1 = \frac{\sum_{t=2}^{T} \left( Y_t - \bar{Y} \right) \left( Y_{t-1} - \bar{Y} \right)}{\sum_{t=2}^{T} \left( Y_{t-1} - \bar{Y} \right)^2} \quad \text{Eq. (3.24)}$$
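The least-squares estimator of Eq. (3.24) can be verified with a short simulation; this is a sketch in which the series length, the true coefficient 0.6 and the random seed are arbitrary choices of ours:

```python
import random

def fit_ar1(y):
    """Least-squares estimate of phi_1 in an AR(1) model (Eq. 3.24)."""
    y_bar = sum(y) / len(y)
    num = sum((y[t] - y_bar) * (y[t - 1] - y_bar) for t in range(1, len(y)))
    den = sum((y[t - 1] - y_bar) ** 2 for t in range(1, len(y)))
    return num / den

# Simulate Y_t = 0.6 * Y_{t-1} + eps_t and recover phi_1 from the data.
random.seed(0)
y = [0.0]
for _ in range(5000):
    y.append(0.6 * y[-1] + random.gauss(0.0, 1.0))
print(round(fit_ar1(y), 2))
```

With 5000 observations the estimate lands close to the true coefficient 0.6.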


2) 3.3.2 Forecasting Method of Auto Regressive Model

If the value $Y_{t+k}$, $k$ periods ahead, is to be forecast in period $t$, $Y_{t+k}$ can be rewritten into the form of Eq. (3.25) by the recursive method (taking $\phi_0 = 0$ for simplicity):

$$\begin{aligned}
Y_{t+k} &= \phi_1 Y_{t+k-1} + \varepsilon_{t+k} \\
&= \phi_1 \left( \phi_1 Y_{t+k-2} + \varepsilon_{t+k-1} \right) + \varepsilon_{t+k} \\
&= \phi_1^2 Y_{t+k-2} + \phi_1 \varepsilon_{t+k-1} + \varepsilon_{t+k} \\
&\;\;\vdots \\
&= \phi_1^k Y_t + \phi_1^{k-1} \varepsilon_{t+1} + \cdots + \phi_1 \varepsilon_{t+k-1} + \varepsilon_{t+k}
\end{aligned} \quad \text{Eq. (3.25)}$$

Under the evaluation criterion where the loss function is the mean squared error (MSE), the information set $\Omega_t = \left\{ Y_t, Y_{t-1}, \ldots \right\}$ can be used to form the conditional expected value, as shown in Eq. (3.26):

$$\hat{Y}_{t+1} = E\!\left( Y_{t+1} \mid \Omega_t \right) \quad \text{Eq. (3.26)}$$

Eq. (3.26) is the optimal forecast of $Y_{t+1}$. Generally speaking, the conditional expected value is not necessarily a linear function of the elements in the information set. However, if the conditional expected value is a linear projection, then the projection of $Y_{t+1}$ on the linear span of $\Omega_t$ is used as the forecast of $Y_{t+1}$, as shown in Eq. (3.27):

$$\hat{E}\!\left( Y_{t+1} \mid H_t \right) = \sum_{j=0}^{\infty} \alpha_j Y_{t-j} \quad \text{Eq. (3.27)}$$

where $H_t = Sp\!\left\{ Y_t, Y_{t-1}, \ldots \right\}$ is the linear span; if the joint probability distribution of $Y_t$ is a normal distribution, the conditional expected value equals the linear projection. $Y_t$ belongs to $\Omega_t$, so $E(Y_t \mid \Omega_t) = Y_t$. Besides, $\varepsilon_t \sim WN(0, \sigma^2)$ and $E(Y_{t-j} \varepsilon_t) = 0$, and the conditional expected value is the linear projection, so:

$$E\!\left( \varepsilon_{t+1} \mid \Omega_t \right) = 0 \quad \text{Eq. (3.28)}$$

In brief, when the future exogenous shock ($\varepsilon_{t+k}$) is projected onto the information set of period $t$, 0 is obtained. Therefore, given Eq. (3.29):

$$Y_{t+k} = \phi_1^k Y_t + \phi_1^{k-1} \varepsilon_{t+1} + \cdots + \phi_1 \varepsilon_{t+k-1} + \varepsilon_{t+k} \quad \text{Eq. (3.29)}$$

it follows that:

$$E\!\left( Y_{t+k} \mid \Omega_t \right) = \phi_1^k Y_t \quad \text{Eq. (3.30)}$$

Therefore, if the time series data $\left\{ Y_t \right\}_{t=1}^{T}$ are considered, namely the series observed up to period $T$, the historical data are $Y_1, Y_2, \ldots, Y_T$. Supposing the model is AR(1), then according to the description above, the forecast for period $T+k$ is shown in Eq. (3.31):

$$E\!\left( Y_{T+k} \mid \Omega_T \right) = \phi_1^k Y_T \quad \text{Eq. (3.31)}$$

However, $\phi_1$ is an unknown parameter, so the historical data $Y_1, Y_2, \ldots, Y_T$ must first be used to obtain the estimate $\hat{\phi}_1$ and construct the forecast equation of $Y_{T+k}$, as shown in Eq. (3.32):

$$\hat{E}\!\left( Y_{T+k} \mid \Omega_T \right) = \hat{\phi}_1^k Y_T \quad \text{Eq. (3.32)}$$

where $\hat{E}\!\left( Y_{T+k} \mid \Omega_T \right)$ is the forecast equation constructed from the random sample, and the indices $T+k$ and $T$ indicate that the known information in period $T$ is used to forecast the value in period $T+k$.
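Eq. (3.32) reduces the k-step-ahead forecast to a one-liner; the following sketch assumes the zero-intercept form used above, and the helper name and example numbers are our own:

```python
def ar1_forecast(phi1_hat, y_T, k):
    """k-step-ahead AR(1) forecast (Eq. 3.32): phi1_hat**k * Y_T, zero intercept."""
    return (phi1_hat ** k) * y_T

# With |phi1| < 1 the forecast decays geometrically toward the mean (zero here).
print(ar1_forecast(0.7, 100.0, 1))
print(ar1_forecast(0.7, 100.0, 3))
```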

D. Method for Model Measurement

1) 3.4.1 Error Calculation Method of Forecast Model

The following methods for measuring forecast error are often used to calculate the model error: the maximum error (ME), the mean absolute percent error (MAPE), and the mean squared error (MSE). Their definitions are shown in Eqs. (3.33) ~ (3.35); moreover, the evaluation criteria for the accuracy of MAPE are shown in Table 3.1:

$$ME = \max_{1 \le k \le N} \left| y_f - y_r \right| \quad \text{Eq. (3.33)}$$

$$MAPE(\%) = \frac{100}{N} \sum_{k=1}^{N} \left| \frac{y_r - y_f}{y_r} \right| \quad \text{Eq. (3.34)}$$

$$MSE = \frac{1}{N} \sum_{k=1}^{N} \left( y_r - y_f \right)^2 \quad \text{Eq. (3.35)}$$

where $y_r$ is the $k$-th actual value, $y_f$ is the $k$-th predicted value, and $N$ is the sample size.
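The three error measures (ME, MAPE, MSE) can be written directly from their definitions; this is a sketch, and the function names and example data are our own:

```python
def max_error(actual, predicted):
    """ME: largest absolute deviation between actual and predicted values."""
    return max(abs(r - f) for r, f in zip(actual, predicted))

def mape(actual, predicted):
    """MAPE (%): mean absolute percent error."""
    n = len(actual)
    return 100.0 / n * sum(abs((r - f) / r) for r, f in zip(actual, predicted))

def mse(actual, predicted):
    """MSE: mean squared error."""
    n = len(actual)
    return sum((r - f) ** 2 for r, f in zip(actual, predicted)) / n

actual = [100.0, 200.0]
predicted = [50.0, 100.0]
print(max_error(actual, predicted))  # 100.0
print(mape(actual, predicted))       # 50.0
print(mse(actual, predicted))        # 6250.0
```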


TABLE 3.1 TABLE OF EVALUATION CRITERIA OF MAPE

MAPE (%)      Description
< 10%         Precise forecast
10% ~ 20%     Excellent forecast
20% ~ 50%     Reasonable forecast
> 50%         Imprecise forecast
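The bands of Table 3.1 can be encoded as a small helper for labelling results; this is a sketch, and the function name and the assignment of values falling exactly on a boundary are our own choices:

```python
def mape_grade(mape_pct):
    """Classify a MAPE value (%) according to the Table 3.1 criteria."""
    if mape_pct < 10:
        return "precise forecast"
    if mape_pct < 20:
        return "excellent forecast"
    if mape_pct <= 50:
        return "reasonable forecast"
    return "imprecise forecast"

print(mape_grade(9.21))   # precise forecast
print(mape_grade(17.13))  # excellent forecast
```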

2) 3.4.2 R² Calculation Method of Forecast Model

In addition to these methods for measuring forecast error, this research also uses the square of the Pearson correlation coefficient (R²), which is often used to measure the performance of regression-based models. The BPNN, LS-SVM and AR(1) models discussed in this research are all regression-based, so R² allows the performance of each model to be evaluated and compared. Besides, R software provides a very simple way to obtain it: select the variables to be compared, compute their correlation with the "cor" function, and square the result.

R² is an important indicator of linear explanatory power in regression analysis. A simple definition of R² is shown in Eq. (3.36):

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \quad \text{Eq. (3.36)}$$

where SSR is the regression sum of squares, SST is the total sum of squares, and SSE is the error sum of squares; R² and SSE are often used to evaluate the predictive ability of a model.
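The 1 − SSE/SST definition above can be computed directly; note that this regression form coincides with the squared Pearson correlation only for a least-squares linear fit. The function name and example data below are our own:

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE/SST, with SST taken about the mean of the actual values."""
    mean_a = sum(actual) / len(actual)
    sst = sum((a - mean_a) ** 2 for a in actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - sse / sst

print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0 (perfect prediction)
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```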

Moreover, the evaluation indicators of R² are summarized in Table 3.2:

TABLE 3.2 TABLE OF EVALUATION INDICATOR OF R²

Performance    Negative Correlation    Positive Correlation
None           -0.09 ~ 0.0             0.0 ~ 0.09
Weak           -0.3 ~ -0.1             0.1 ~ 0.3
Medium         -0.5 ~ -0.3             0.3 ~ 0.5
Strong         -1.0 ~ -0.5             0.5 ~ 1.0

IV. RESULTS AND DISCUSSION

A. Industrial Data Set

This paper adopts the sales data of the cosmetic industry accumulated by the Department of Statistics of the Ministry of Economic Affairs in Taiwan. The data run from January 1982 to December 2014, stretching over 33 years for a total of 396 monthly records, which serve as the basis for discussing the development history and forecasting tendency of the cosmetic industry.

B. Forecast Results

1) 4.2.1 Forecast Results of Back Propagation Neural Network Model

In the forecast process of the BPNN model, emulsion cosmetics, soap cosmetics and cleaning cosmetics are used as the input variables of the network, and the predicted sales value is the output variable. The network adopts two hidden layers with six nodes each. This forecasting process uses the historical data from 1982 to 2014 to forecast the sales in 2015, and the forecast results are shown in Figure 4.1:

Fig. 4.1 Forecast Results of BPNN

This paper adopts the BPNN model to forecast the sales of cosmetics in 2015, and the predicted values fluctuate considerably. In January, the sales amount performs noticeably well, but in February it hits rock bottom at 230,000 (thousand NTD); the yearly historical data suggest this may be directly caused by the lunar new year. In March there is a significant increase; from April to June the series fluctuates; after June a sharp recovery appears, reaching its peak of 390,000 (thousand NTD) in August; in September a sharp decline occurs; in October there is a slight recovery, while in November a sharp decline happens again; in December the series levels off.
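The paper's BPNN uses two hidden layers of six nodes; as a minimal illustration of the back-propagation mechanics themselves, a single-hidden-layer regressor can be sketched in pure Python. The layer size, learning rate, seed and toy data below are our own choices, not the paper's configuration:

```python
import math
import random

def train_bpnn(X, y, hidden=6, lr=0.1, epochs=2000, seed=1):
    """Train a one-hidden-layer back-propagation network for regression."""
    random.seed(seed)
    n_in = len(X[0])
    W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [random.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        for x, target in zip(X, y):
            # Forward pass: sigmoid hidden layer, linear output.
            h = [sigmoid(sum(w * v for w, v in zip(W1[j], x)) + b1[j])
                 for j in range(hidden)]
            out = sum(w * v for w, v in zip(W2, h)) + b2
            err = out - target  # gradient of 0.5*(out - target)^2 w.r.t. out
            # Backward pass: propagate the error through both layers.
            for j in range(hidden):
                grad_h = err * W2[j] * h[j] * (1.0 - h[j])
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err

    def predict(x):
        h = [sigmoid(sum(w * v for w, v in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        return sum(w * v for w, v in zip(W2, h)) + b2

    return predict

# Toy example: learn y = (x1 + x2) / 2 on four points.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 0.5, 0.5, 1.0]
predict = train_bpnn(X, y)
```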

2) 4.2.2 Forecast Results of Least-square Support Vector Machine Model

In the forecast process of the LS-SVM model, emulsion cosmetics, soap cosmetics and cleaning cosmetics are used as the input variables of the model, and the predicted sales value is the output variable; the Gaussian RBF kernel is adopted as the kernel function. This forecasting process uses the historical data from 1982 to 2014 to forecast the sales amount in 2015, and the forecast results are shown in Figure 4.2:


Fig. 4.2 Forecast Results of LS-SVM

This paper adopts the LS-SVM model to forecast the sales amount of cosmetics in 2015, and the predicted values are less stable than those obtained with the neural network. In January, the sales amount of cosmetics performs ordinarily, but in February it also hits rock bottom at 260,000 (thousand NTD); the yearly historical data suggest this may be directly caused by the lunar new year. In March there is a significant increase; from April to May a gentle decline; in June the decline in sales becomes increasingly serious; after July there is extremely significant growth, reaching its peak of 395,000 (thousand NTD); in August a sharp decline occurs, while in September the sales amount increases considerably; it then declines sharply until November; finally, it rises again at the end of the year.
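An LS-SVM regressor of the kind used here can be sketched in miniature by solving the standard LS-SVM linear system [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y]. The kernel width, regularization constant γ and toy data below are our own choices, not the paper's settings:

```python
import math

def rbf(u, v, width=0.5):
    """Gaussian RBF kernel exp(-width * ||u - v||^2)."""
    return math.exp(-width * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=1000.0, width=0.5):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    K = [[rbf(X[i], X[j], width) for j in range(n)] for i in range(n)]
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [K[i][j] + (1.0 / gamma if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]
    return lambda x: sum(a * rbf(x, xi, width) for a, xi in zip(alpha, X)) + b

# Toy example: fit y = x^2 at four points and query a training input.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
f = lssvm_fit(X, y)
```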

3) 4.2.3 Forecast Results of Auto Regressive Model

Prior to discussing the forecast results of the AR(1) model, the auto-correlogram (Figure 4.3) and partial auto-correlogram (Figure 4.4) are first used to judge the stationarity of the data.

Fig. 4.3 ACF Diagram: Auto-correlogram

Fig. 4.4 PACF Diagram: Partial Auto-correlogram.

Figure 4.3 shows that the data belong to a stationary time series and that the lag-1 autocorrelation coefficient is 0.7. According to the model-identification guidelines of Box and Jenkins (1970) for ARIMA models, when the autocorrelation coefficient is greater than 0.5, autoregression yields a better forecast effect; on the contrary, when the autocorrelation coefficient is less than 0.5, the forecast becomes inaccurate. This paper therefore adopts the AR(1) model to forecast the cosmetics sales data in 2015.
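The lag-1 autocorrelation coefficient read off an ACF diagram can be computed directly from the series; `autocorr` and the trend example below are our own illustrative choices:

```python
def autocorr(y, lag=1):
    """Lag-j sample autocorrelation coefficient, as read off an ACF diagram."""
    n = len(y)
    mean = sum(y) / n
    c0 = sum((v - mean) ** 2 for v in y)
    cj = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return cj / c0

# A strongly trending series has a high lag-1 autocorrelation,
# the situation in which Box-Jenkins favours an autoregressive model.
trend = [float(t) for t in range(20)]
print(round(autocorr(trend, 1), 2))  # 0.85
```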

Prior to adopting the AR(1) model, in order to examine the time-series relationship of the cosmetics sales data within the AR(1) model, the sales data are first fitted to the AR(1) model to obtain the autoregression scatter diagram comparing $Y_t$ and $Y_{t-1}$, as shown in Figure 4.5.

Fig.4.5 Scatter Diagram of AR(1) Model: Auto regression scatter diagram

In the scatter diagram of the time series data adopted in this research, $Y_t$ on the Y axis represents the data point of the series in the current period and serves as the dependent variable; $Y_{t-1}$ on the X axis represents the data of the series in the previous period and serves as the independent variable. Simply speaking, AR(1) is the model that uses only the data of the previous period as the independent variable, using $Y_{t-1}$ to forecast $Y_t$.

The forecast process of AR(1) model in this research is to use the historical data from 1982 to 2014 to forecast the sales amount in 2015, and its forecast results are shown in Figure 4.6:


Fig. 4.6 Forecast Results of AR(1) Model

Based on the forecast results of cosmetics sales in 2015 with the AR(1) model, the predicted values fluctuate with large rises and falls. In January the cosmetics sales perform ordinarily, but in February they also hit rock bottom at 280,000 (thousand NTD); the yearly historical data suggest this may be directly caused by the lunar new year. In March there is explosive growth; in April a sharp decline, which slows in May; from June to July the cosmetics sales pick up significantly; after August a significant decline occurs, while September sees a significant increase; in October there is a gentle decline, which becomes obvious in November; however, explosive growth occurs in December, reaching a peak of 476,000.

C. Measurement and Comparison of Models

1) 4.3.1 Comparison of Forecast Results

In order to analyze the difference between the forecast results and the actual values of each model, the comparison line graph is shown in Figure 4.7.

Fig. 4.7 Comparison between Forecast Results and Actual Values of Each Model.

Comparing the three sets of predicted values with the actual values in the line graph, it can be seen that the line of the AR(1) model (upper left) deviates most from the line of actual values, while the distance between the line of the LS-SVM model (lower left) and the line of actual values is the shortest, in spite of occasional inconsistency between the two. Therefore, the preliminary conclusion can be drawn that the LS-SVM model may give the better results.

2) 4.3.2 Error Comparison of Forecasting Models

The error comparison for the overall forecasting models is shown in Table 4.1:

TABLE 4.1 TABLE OF ERROR COMPARISON OF DEVELOPMENT MODELS

Performance measurement indicator    BPNN Model                     LS-SVM Model                   AR(1) Model
                                     Training set    Test set       Training set    Test set       Overall model
ME                                   28475           26261          19468           17132          56259
MSE                                  9.33×10^8       3.17×10^8      4.38×10^8       2.94×10^8      6.11×10^9
MAPE                                 21.78%          12.56%         12.81%          9.21%          17.13%

The table shows the performance measurement indicators of the models proposed in this research. Although the models have broadly comparable ME values, the LS-SVM model has the smallest MSE; in terms of MAPE over the development models, LS-SVM also has the smallest MAPE, i.e., the LS-SVM model developed in this research is the most accurate in the error comparison.

3) 4.3.3 Comparison of R² of Forecasting Models

For the forecasting models, the R² between training set and predicted values, the R² between test set and predicted values, and the overall R² of the AR(1) model are shown in Table 4.2.

TABLE 4.2 TABLE OF PERFORMANCE COMPARISON OF DEVELOPMENT MODELS

Square of correlation coefficient    BPNN Model                     LS-SVM Model                   AR(1) Model
                                     Training set    Test set       Training set    Test set       Overall model
R²                                   0.7837          0.8628         0.8512          0.9084         0.8076

It can be seen that the developed models all achieve a considerable degree of agreement with the predicted values. However, the AR(1) figure is the R² of the overall model, and it is inferior to the values achieved by LS-SVM on its data sets. Therefore, the conclusion can be drawn that LS-SVM is more suitable for cosmetics sales forecasting than the other models.


V. CONCLUSION AND SUGGESTIONS

A. Conclusion

1) In terms of sales forecasting for the cosmetic industry, the error comparison of the forecasting models shows that LS-SVM has the smallest model error, i.e., for forecasting cosmetics sales, adopting LS-SVM yields the smallest error and the better results.

2) In terms of sales forecasting for the cosmetic industry, the performance comparison of the forecasting models shows that the test set and predicted values have the largest R² under the LS-SVM model, i.e., for forecasting cosmetics sales, adopting LS-SVM yields the better performance and results.

Based on these two conclusions, LS-SVM is more suitable for cosmetics sales forecasting than the other models.

B. Future Research Directions and Suggestions

This paper has not yet discussed the outliers in the data. Therefore, the outliers in the sales of the recent ten years are shown in Figure 5.1 through a box plot:

Fig. 5.1 Box Plot of Outlier.

The box plot is made from the yearly sales from 2005 to 2015, with each month's sales divided by the number of days in that month. This research does not consider the outliers in forecasting, so it is suggested that future research take outliers into account to obtain better results.

REFERENCES

[1] E. Alpaydin, Introduction to Machine Learning, The MIT Press, Cambridge, MA, USA, 2004.
[2] T. Bollerslev, Generalized Autoregressive Conditional Heteroskedasticity, Journal of Econometrics. 31 (3) (1986), 307–327.
[3] T. Bollerslev, R.Y. Chou, and K.F. Kroner, ARCH Modeling in Finance, Journal of Econometrics. 52 (1992), 5–59.
[4] G. Box, and G. Jenkins, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA, USA, 1970.
[5] R. Engle, Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation, Econometrica. 50 (4) (1982), 987–1007.
[6] F.P. Fang, and X.F. Fang, Multivariate Forecasting Mode of Guangdong Province Port Throughput with Genetic Algorithms and Back Propagation Neural Network, Procedia - Social and Behavioral Sciences. 96 (2013), 1165–1174.
[7] M. Feldkircher, and F. Huber, The International Transmission of US Shocks—Evidence from Bayesian Global Vector Auto Regressions, European Economic Review. 81 (2016), 167–188.
[8] T. Fukuda, and T. Shibata, Theory and Applications of Neural Networks for Industrial Control Systems, IEEE Transactions on Industrial Electronics. 39 (6) (1992), 472–489.
[9] M. Ghaedi, M.R. Rahimi, A.M. Ghaedi, I. Tyagi, S. Agarwal, and V.K. Gupta, Application of Least Squares Support Vector Regression and Linear Multiple Regression for Modeling Removal of Methyl Orange onto Tin Oxide Nanoparticles Loaded on Activated Carbon and Activated Carbon Prepared from Pistacia Atlantica Wood, Journal of Colloid and Interface Science. 461 (2016), 425–434.
[10] A. Ghaffari, H. Abdollahi, M.R. Khoshayand, I. Soltani Bozchalooi, A. Dadgar, and M. Rafiee Tehrani, Performance Comparison of Neural Network Training Algorithms in Modeling of Bimodal Drug Delivery, International Journal of Pharmaceutics. 327 (2006), 126–138.
[11] J. Hassan, ARIMA and Regression Models for Prediction of Daily and Monthly Clearness Index, Renewable Energy. 68 (2014), 421–427.
[12] K. Kandananond, Applying 2k Factorial Design to Assess the Performance of ANN and SVM Methods for Forecasting Stationary and Non-stationary Time Series, Procedia Computer Science. 22 (2013), 60–69.
[13] P. Langley, Elements of Machine Learning, Morgan Kaufmann, San Francisco, CA, USA, 1996.
[14] B. Lantz, Machine Learning with R, Packt Publishing, Birmingham, UK, 2013.
[15] J. Liu, and T.Y. Tang, LS-SVM Based Substation Circuit Breakers Maintenance Scheduling Optimization, International Journal of Electrical Power and Energy Systems. 64 (2015), 1251–1258.
[16] W.S. McCulloch, and W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity, Bulletin of Mathematical Biophysics. 5 (1943), 115–133.
[17] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, USA, 1997.
[18] A.K. Pani, and H.K. Mohanta, Online Monitoring and Control of Particle Size in the Grinding Process Using Least Square Support Vector Regression and Resilient Back Propagation Neural Network, ISA Transactions. 56 (2015), 206–221.
[19] K.A. Pokhilchuk, and S.E. Savel'ev, On the Choice of GARCH Parameters for Efficient Modelling of Real Stock Price Dynamics, Physica A: Statistical Mechanics and its Applications. 448 (2016), 248–253.
[20] C.A. Sims, Macroeconomics and Reality, Econometrica. 48 (1) (1980), 1–48.
[21] J.A.K. Suykens, and J. Vandewalle, Least Squares Support Vector Machine Classifiers, Neural Processing Letters. 9 (3) (1999), 293–300.
[22] C. Cortes, and V. Vapnik, Support-Vector Networks, Machine Learning. 20 (1995), 273–297.
[23] P.J. Werbos, Backpropagation through Time: What It Does and How to Do It, Proceedings of the IEEE. 78 (10) (1990), 1550–1560.
[24] I.C. Yeh, Modeling of Strength of High Performance Concrete Using Artificial Neural Networks, Cement and Concrete Research. 28 (12) (1998), 1797–1808.
[25] C.Q. Yuan, S.F. Liu, and Z.G. Fang, Comparison of China's Primary Energy Consumption Forecasting by Using ARIMA (the Autoregressive Integrated Moving Average) Model and GM(1,1) Model, Energy. 100 (1) (2016), 384–390.
[26] H. Zheng, and H.F. Lu, A Least-Squares Support Vector Machine (LS-SVM) Based on Fractal Analysis and CIELab Parameters for the Detection of Browning Degree on Mango, Computers and Electronics in Agriculture. 83 (2012), 47–51.
[27] B.Z. Zhu, and Y.M. Wei, Carbon Price Forecasting with a Novel Hybrid ARIMA and Least Squares Support Vector Machines Methodology, Omega. 41 (3) (2013), 517–524.