Applied Mathematical Sciences, Vol. 8, 2014, no. 62, 3051 - 3062
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2014.44270
A Hybrid GMDH and Box-Jenkins Models in
Time Series Forecasting
Ani Shabri
Department of Mathematics
Faculty of Science
University Technology of Malaysia
81310 Skudai, Johor, Malaysia
Ruhaidah Samsudin
Department of Software Engineering
Faculty of Computing
University Technology of Malaysia
81310 Skudai, Johor, Malaysia
Copyright © 2014 Ani Shabri and Ruhaidah Samsudin. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Abstract
The group method of data handling (GMDH) technique and the Box-Jenkins method are two well-known
mathematical approaches to time series forecasting. In this paper, we introduce a hybrid
model which combines the GMDH method with the Box-Jenkins method to model time series
data. The Box-Jenkins method is used to determine the useful input variables of the GMDH
method, and the GMDH method then serves as the time series forecaster. The lynx series,
which contains the number of lynx trapped per year, is used in this study to demonstrate the
effectiveness of the forecasting model. The results found by the proposed GMDH model were
compared with the results of the Box-Jenkins and artificial neural network (ANN) models. The
comparison of modeling results shows that the GMDH model performs better than the two other
models in terms of mean absolute error (MAE) and root mean square error (RMSE). It
also indicates that GMDH provides a promising technique for time series forecasting.
Keywords: Group method of data handling technique, time series forecasting, neural networks,
Box-Jenkins method
1. Introduction
The most comprehensive, popular and widely known statistical models that have been
utilized over the last four decades for time series forecasting are the Box-Jenkins models. However,
the Box-Jenkins model is only a class of linear models and thus can only capture the linear features
of a time series [1], whereas many time series are nonlinear and chaotic.
More advanced nonlinear methods such as neural networks have been frequently applied in
nonlinear time series modeling and chaotic time series modeling in recent years [2, 3, 4, 5, 6, 7].
ANNs provide an attractive alternative tool for forecasting researchers and practitioners and have shown their
nonlinear modeling capability in time series forecasting.
One sub-model of ANN is the group method of data handling (GMDH) algorithm, first
developed by [8]. This model has been successfully used to deal with uncertain, linear or
nonlinear systems in a wide range of disciplines such as engineering, science, economics,
medical diagnostics, signal processing and control systems [9, 10, 11].
Improving forecasting accuracy, especially time series forecasting accuracy, is an important yet often
difficult task facing many decision makers in a wide range of areas. Combining several models
or using hybrid models can be an effective way to improve forecasting performance. Several
studies have suggested hybrid models, such as combining the ARIMA and ANN models
[1, 12, 13, 14], the GMDH and ANN models [14], GMDH and differential evolution [15], and GMDH
and LSSVM [16]. More recently, a new class of neural network combining the ANN model with the
Box-Jenkins (BJ) approach was explored for modeling time series [17, 18, 19, 20]. The BJ
approach was used to determine the most important variables as input nodes in the input layer,
and ANNs were then used to model the time series data. Their results showed that the hybrid
model can be an effective way to improve predictions when the variables of the input
layer of the ANN are chosen based on the BJ approach rather than on traditional methods.
In this paper, a new hybrid GMDH-type algorithm is proposed by combining the GMDH
model with the BJ approach to model time series data. The BJ approach is used to generate the
most useful variables as input nodes in the input layer from the time series under study.
Then, a GMDH model is used to model the data generated by the Box-Jenkins model and to predict
the future of the time series. To verify the application of this approach, the Canadian lynx data set
is used in this study.
2. Forecasting Methodology
This section presents the ARIMA, ANN and GMDH models used for modeling time series.
These models were chosen because they have been widely and successfully used in time series forecasting.
2.1 Box-Jenkins Approach
The Box-Jenkins method, introduced by [21], has been one of the most popular
approaches to time series analysis and prediction. The general ARIMA model is
represented in the following way:
φ_p(B)(1 - B)^d y_t = θ_q(B) a_t  (1)

where φ_p(B) and θ_q(B) are polynomials of order p and q, respectively, and d is the number of regular
differences. The random errors a_t are assumed to be independently and identically distributed with
a mean of zero and a constant variance σ². The Box-Jenkins methodology is basically divided into four steps: identification, estimation, diagnostic checking and forecasting. In the
identification step, transformation is often needed to make time series stationary. The next step is
choosing a tentative model by matching both the autocorrelation (ACF) and partial
autocorrelation function (PACF) of the stationary series. Once a tentative model is identified, the
parameters of the model are estimated. The last step of model building is the diagnostic checking
of model adequacy, basically to check whether the model assumptions about the errors a_t are satisfied.
The process is repeated several times until a satisfactory model is finally selected. The
forecasting model was then used to compute the fitted and forecast values.
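The estimation step can be sketched numerically. The fragment below is a minimal illustration assuming NumPy is available (the paper itself uses an R package); `fit_ar` is our hypothetical helper, not from the paper. It fits an AR(2) model by ordinary least squares to a simulated series and checks that the known coefficients are approximately recovered.

```python
import numpy as np

def fit_ar(y, p):
    """Estimate AR(p) coefficients [c, phi_1, ..., phi_p] by least squares."""
    n = len(y)
    # Regressor matrix: intercept plus the p lagged values of the series
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - i:n - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

# Simulate a stationary AR(2) process with known coefficients
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 1.3 * y[t - 1] - 0.7 * y[t - 2] + rng.normal()

coef = fit_ar(y, 2)
print(coef.round(2))  # roughly [0.0, 1.3, -0.7]
```

A full Box-Jenkins cycle would precede this with ACF/PACF-based identification and follow it with residual diagnostics, as described above.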
2.2 The Neural Network Forecasting Model
The artificial neural networks (ANN) that serve as flexible computational frameworks have
been extensively studied and have gained much popularity in many areas of application, including
science, psychology and engineering. The single-hidden-layer feed-forward ANN is
the most widely used and suitable for modeling and forecasting in time series. In a feed-forward
ANN, the neurons are usually arranged in layers. The first layer is the input layer where the data
are introduced to the network, the second layer is the hidden layer where data are processed and
the last layer is the output layer where the results of given input are produced. The structure of a
feed-forward ANN is shown in Fig. 1.
Fig. 1 Architecture of three layers feed-forward back-propagation ANN
The relationship between the input observations (y_{t-1}, y_{t-2}, …, y_{t-p}) and the output value y_t,
assuming a linear output neuron, is given by

y_t = g( b_0 + Σ_{j=1}^{q} w_{0j} f( b_j + Σ_{i=1}^{p} w_{ij} y_{t-i} ) )  (2)

where b_j (j = 0, 1, 2, …, q) is a bias on the jth unit, and w_{ij} (i = 0, 1, 2, …, p; j = 0, 1, 2, …, q)
are connection weights; f and g are the hidden and output layer activation functions, respectively
[22]. Several optimization algorithms can be used to train the ANN. Among the several training
algorithms available, back-propagation has been the most popular and most widely used [23]. In
a back-propagation network, the weights and bias values are initially chosen as random numbers
and then adjusted by the results of a training process. The goal of the training algorithm is to minimize
the global error.
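As a minimal sketch of back-propagation training (assuming NumPy; the toy target and network size are ours, not the paper's), the loop below trains a 1-5-1 network with a tangent sigmoid hidden layer and a linear output by full-batch gradient descent:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(np.pi * X)                      # toy nonlinear target

# 1-5-1 network: tanh (tangent sigmoid) hidden layer, linear output
W1 = rng.normal(0.0, 0.5, (1, 5)); b1 = np.zeros(5)
W2 = rng.normal(0.0, 0.5, (5, 1)); b2 = np.zeros(1)

lr = 0.1
for _ in range(4000):
    H = np.tanh(X @ W1 + b1)               # hidden activations f(.)
    out = H @ W2 + b2                      # linear output g(.)
    err = out - y
    # Back-propagate the error to obtain weight and bias gradients
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)     # tanh'(u) = 1 - tanh(u)^2
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2).mean())
print(mse)    # should be much lower than always predicting the mean (~0.5)
```

The weights start as random numbers and are repeatedly adjusted against the global error, exactly the training cycle described in the text.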
2.3 The Group Method of Data Handling
The GMDH method was originally formulated to fit higher-order regression polynomials,
especially for solving modeling and classification problems. The general connection between the input
and output variables can be expressed by a complicated polynomial series in the form of the
Volterra series, known as the Kolmogorov-Gabor polynomial [8]:
y = a_0 + Σ_{i=1}^{M} a_i x_i + Σ_{i=1}^{M} Σ_{j=1}^{M} a_{ij} x_i x_j + Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{k=1}^{M} a_{ijk} x_i x_j x_k + …  (3)

where (x_i, x_j, x_k, …) is the vector of input variables, M is the number of inputs and (a_0, a_i, a_{ij}, a_{ijk}, …)
is the vector of summand coefficients. However, for most applications the quadratic form,
called a partial description (PD), in only two variables is used in the form
y = G(x_i, x_j) = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i² + a_5 x_j²  (4)

to predict the output. The input variables are set to (x_1, x_2, x_3, …, x_M) and the output is set to {y}. The
coefficients a_i for i = 0, 1, …, 5 are determined using the least squares method. The framework of
the design procedure of the GMDH consists of the following steps.
Step 1: Select the input variables X = {x_1, x_2, …, x_M}, where M is the total number of inputs. The data
are separated into training and testing data sets. The training data set is used to construct
a GMDH model and the testing data set is used to evaluate the estimated GMDH model.
Step 2: Construct L = M(M - 1)/2 new variables Z = {z_1, z_2, …, z_L} in the training data set for
all pairs of independent variables and choose a PD of the GMDH. Conventional GMDH is
developed using the polynomial

z_l = G(x_i, x_j) = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i² + a_5 x_j²  for l = 1, 2, …, L  (5)
as the PD. In this study, a PD structure based on a radial basis function (RBF) built from the
polynomial is proposed for constructing the GMDH. The radial basis function
(RBF) model is used in the form

z_l = e^{-g_l²},  where  g_l = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i² + a_5 x_j²  (6)
Step 3: Estimate the coefficient of the PD. The vectors of coefficients of the PDs are determined
using the least squares method.
Step 4: Determine new input variables for the next layer. There are several selection
criteria for identifying the input variables for the next layer. In our study, we used two
criteria. In the first criterion, the single best neuron out of these L neurons, z′, is identified according to the value of the mean square error (MSE) on the testing data set, such that

MSE_k = (1/n_T) Σ_{i=1}^{n_T} (y_i - z_{k,i})²  for k = 1, 2, …, L  (7)

where n_T is the number of observations in the testing data set. If the smallest MSE of z′ is less than a
threshold, the process terminates; otherwise the new input variables are set to (x_1, x_2, x_3, …, x_M, z′).
In the second criterion, the least effective variables are eliminated: the columns of
X = {x_1, x_2, …, x_k} are replaced by those columns of Z = {z_1, z_2, …, z_k} that best estimate the dependent
variable y in the testing data set. This is captured by the expression

x_1 = z_1, x_2 = z_2, … , x_k = z_k  (8)

where k is the total number of retained new input variables.
Step 5: Check the stopping criterion. The lowest value of the selection criterion obtained by the GMDH model
at the current layer is compared with the smallest value obtained
at the previous one. If an improvement is achieved, steps 1 to 5 are repeated;
otherwise the iterations terminate and a realization of the network is
complete. Once the final layer has been determined, only the one node characterized by
the best performance is selected as the output node; the remaining nodes in that layer
are discarded. Finally, the GMDH model is obtained.
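A simplified, single-layer version of Steps 2-4 can be sketched as follows (assuming NumPy; the function names and the synthetic data are ours, not the paper's): every pair of inputs gets a quadratic PD fitted by least squares on the training set, and the candidate neurons are ranked by test-set MSE.

```python
import numpy as np
from itertools import combinations

def pd_features(xi, xj):
    """Design matrix of the quadratic partial description of Eq. (5)."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def gmdh_layer(X_tr, y_tr, X_te, y_te, keep=3):
    """Fit a PD for every input pair (Step 2), estimate coefficients by
    least squares (Step 3), and rank candidates by test MSE (Step 4)."""
    results = []
    for i, j in combinations(range(X_tr.shape[1]), 2):
        A = pd_features(X_tr[:, i], X_tr[:, j])
        a, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
        z_te = pd_features(X_te[:, i], X_te[:, j]) @ a
        results.append((float(((y_te - z_te) ** 2).mean()), (i, j), a))
    results.sort(key=lambda r: r[0])
    return results[:keep]          # the surviving neurons of this layer

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))
y = 2.0 + X[:, 0] * X[:, 1] + 0.1 * X[:, 2] + 0.05 * rng.normal(size=120)
best = gmdh_layer(X[:80], y[:80], X[80:], y[80:])
print(best[0][1])   # the pair (0, 1) carries the dominant signal
```

Feeding the retained neurons back in as new inputs and repeating until the test MSE stops improving yields the multi-layer network of Step 5.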
3. Time series prediction by ANN and GMDH
In the classical method for time series forecasting problems, the number of input nodes of a nonlinear
model such as the ANN or GMDH model is equal to the number of lagged variables (y_{t-1}, y_{t-2}, …, y_{t-p}),
where p is the number of chosen lags. The output y_t, the predicted value of the time series,
is defined as

y_t = f(y_{t-1}, y_{t-2}, …, y_{t-p})  (9)
However, there is currently no systematic way to determine the optimum number of
lags p. The other method to determine the number of nodes in the input layer is based on the
Box-Jenkins model. Unlike the previous method, where the number of lags p is chosen either
on an ad hoc basis or by traditional methods, the lagged variables obtained from the Box-Jenkins
analysis are the most important variables to be used as input nodes in the input layer of
the ANN or GMDH model. In our proposed model, a time series model based on the Box-Jenkins
methodology is considered as a nonlinear function of several past observations and random errors
as follows:
y_t = f[(y_{t-1}, y_{t-2}, …, y_{t-p}), (a_{t-1}, a_{t-2}, …, a_{t-q})]  (10)
where f is a nonlinear function determined by the ANN and GMDH.
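Assembling the input matrix of Eq. (10) from the lags flagged by the Box-Jenkins analysis can be sketched as follows (assuming NumPy; `hybrid_inputs` and the toy data are illustrative only, not from the paper):

```python
import numpy as np

def hybrid_inputs(y, resid, ar_lags, ma_lags):
    """Stack the lagged observations and lagged residuals selected by the
    Box-Jenkins analysis into the input matrix of Eq. (10)."""
    start = max(ar_lags + ma_lags)
    X = [[y[t - k] for k in ar_lags] + [resid[t - k] for k in ma_lags]
         for t in range(start, len(y))]
    return np.array(X), y[start:]

# Toy series and (zero) residuals, with lags 1, 2 of y and lag 1 of a_t
y = np.arange(10.0)
resid = np.zeros(10)
X, target = hybrid_inputs(y, resid, ar_lags=[1, 2], ma_lags=[1])
print(X.shape, target[0])   # (8, 3) 2.0
```

Each row of X then feeds the input layer of the ANN or GMDH model, with the matching element of `target` as the value to predict.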
4. Empirical Results
In this section, we illustrate the hybrid GMDH-type algorithm and show its performance for
forecasting the Canadian lynx data. The lynx series contains the number of lynx trapped per year
in the Mackenzie River district of Northern Canada. The data set has 114 observations,
corresponding to the period of 1821 to 1934. It has also been extensively analyzed in the time
series literature with a focus on nonlinear modeling. The lynx data are among the most
frequently used time series. The data are plotted in Fig. 2, which shows a periodicity of
approximately 10 years. It indicates that the series is stationary in the mean but may not be stationary
in variance. The lynx series has been studied by many researchers; the first time series analysis
was carried out by [24] and, more recently, [25] fitted an AR(2) model
to the logged data. [26], [1], and [20] found that the best-fitted model is an AR(12) model.
Fig.2 Canadian Lynx data series (1821-1934)
The authors used the R package to implement the Box-Jenkins technique. Applying the Box-Jenkins
technique to the lynx time series, two models, AR(2) and AR(12), were considered, and their
statistical results are compared in Table 1 based on the mean squared error (MSE), the
Akaike Information Criterion (AIC) and the Schwarz Information Criterion (SIC). We used the logarithm
with base 10, which makes the lynx data more symmetrical.
Table 1: Comparison of ARIMA models' statistical results

Model    MSE     AIC      SIC
AR(2)    0.0179  -1.7072  -1.6550
AR(12)   0.0238  -1.3834  -1.0708

Note: Boldface indicates the best statistical results.
Table 1 shows that the lowest MSE, AIC and SIC statistics of 0.0179, -1.7072 and -1.6550,
respectively were observed for AR(2). Hence, according to their performances indices, AR(2) is
selected as the appropriate ARIMA model for the lynx series. The AR(2) model that we identified takes the
following form:

y_t = 1.0652 + 1.3693 y_{t-1} - 0.7385 y_{t-2} + a_t  (11)
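Taking the coefficients of Eq. (11) at face value, a one-step forecast on the log10 scale can be computed as below (a sketch; the two lagged values are hypothetical, not taken from the lynx data):

```python
# One-step forecast from the AR(2) of Eq. (11), which operates on
# log10-transformed lynx counts.
def ar2_forecast(y_lag1, y_lag2):
    return 1.0652 + 1.3693 * y_lag1 - 0.7385 * y_lag2

pred_log = ar2_forecast(3.20, 3.00)   # two hypothetical log10 counts
pred_count = 10 ** pred_log           # back-transform to animal numbers
print(round(pred_log, 4))             # 3.2315
```

Since the model is fitted on the base-10 logged series, forecasts must be back-transformed before being read as trapped-lynx counts.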
In designing the ANN and GMDH models, one must determine the following variables: the
number of input nodes and the number of layers. The selection of the number of inputs,
which corresponds to the number of variables, plays an important role in many successful applications of
ANN and GMDH models.
To make the ANN and GMDH models simple and to reduce the computational burden, only the
lagged variables obtained from the Box-Jenkins analysis are used as the input layer. In this study, based on the
Box-Jenkins linear model of Eq. (11), the time series is considered as a
nonlinear function of several past observations as follows:

y_t = f(y_{t-1}, y_{t-2})  (12)

where f is a nonlinear function determined by the ANN and GMDH models. The nodes in the input
layer consist of the lagged variables y_{t-1} and y_{t-2} obtained from the Box-Jenkins analysis.
The ANN model was implemented with the neural network toolbox in MATLAB. The hidden
nodes use the hyperbolic tangent sigmoid transfer function and the output layer uses the linear
function, because the prediction performance is best when these transfer functions are used. The
network was trained for 5000 epochs using the conjugate gradient descent back-propagation
algorithm with a learning rate of 0.001 and a momentum coefficient of 0.9. The optimal number of
neurons in the hidden layer was identified using several practical guidelines. These include using
I/2 [27], 2I [28] and 2I+1 [29], where I is the number of inputs. In this study, a trial-and-error
method was performed to optimize the number of neurons in the hidden layer. The best neural
network architecture found for the lynx series consists of 2 input, 5 hidden and 1 output neurons (2x5x1).
The GMDH model works by building successive layers with complex connections that are
created by using second-order polynomials and exponential functions. The first layer is
made by computing regressions of the input variables. The second layer is created by computing
regressions of the outputs of the first layer. Only the best neurons are chosen at each layer, and this
process continues until a pre-specified selection criterion is met. The optimal number of neurons in the
hidden layers of the GMDH model was identified using a trial-and-error procedure by varying the
number of hidden neurons from 1 to 10. The best-fit model structure is determined according to
the performance index (MSE).
Table 2 shows the performance results of the ARIMA, ANN and GMDH approach based on
mean squared error (MSE), mean absolute error (MAE) and correlation coefficients (R2).
Table 2: Comparison of ARIMA, ANN and GMDH
Model MSE MAE R2
AR(2) 0.0179 0.1166 0.8959
ANN (2x5x1) 0.0107 0.0684 0.9351
GMDH 0.0074 0.0624 0.9589
Note: Boldface indicates the best statistical results.
From Table 2, considering the MSE, MAE and R2 as performance
indicators, the experimental results clearly demonstrate that the GMDH model outperforms the other
models. Fig. 3 shows the actual and forecasted values.
Table 3 shows the performance of our proposed model and other models studied in the
previous literature. To measure forecasting performance, MSE and MAE are employed as
performance indicators. The experimental results show that our proposed model offers encouraging
advantages and has good performance.
Table 3: Comparison of the performance of the proposed model with those of other forecasting
models
Model MSE MAE
Zhang’s ARIMA model 0.02049 0.1123
Zhang’s ANN model 0.02046 0.1121
Zhang’s Hybrid model 0.01723 0.10397
Khashei &amp; Bijari's ANN model 0.01361 0.08963
Kajitani’s SETAR model 0.01400 -
Kajitani’s FNN model 0.0090 -
Aladag’s Hybrid model 0.0090 -
Proposed model 0.0074 0.0624
Note: Boldface indicates the best statistical results.
Fig. 3: Comparison between observed and predicted for GMDH, ARIMA and ANN models for
Lynx time series (testing phase)
5. Conclusion
One of the major developments in ANN research over the last decade is model combining, or hybrid
models. In this paper we proposed a hybrid GMDH model that combines the Box-Jenkins
time series model and the GMDH model to forecast time series data. The GMDH model in
conjunction with the Box-Jenkins approach has been demonstrated on the lynx data. The Box-Jenkins
approach is applied to propose a new hybrid method for improving the performance of
the GMDH in time series forecasting. The empirical results indicate that the proposed method yields
better results than the other methods. This approach presents a superior and reliable alternative to the
ANN, ARIMA and hybrid models studied by other researchers.
Acknowledgements. This research has been funded by the Ministry of Science, Technology and
Innovation (MOSTI), Malaysia under Vot 4F399. Lastly, thanks are given to the Universiti
Teknologi Malaysia.
References
[1] G.P. Zhang, Time Series Forecasting Using a Hybrid ARIMA and Neural Network Model, Neurocomputing, 50 (2003), 159 - 175.
[2] D.S.K. Karunasinghe and S.Y. Liong, Chaotic time series prediction with a global model: Artificial neural network, Journal of Hydrology, 323 (2006), 92 - 105.
[3] I. Rojas, O. Valenzuela, F. Rojas, A. Guillen, L. J. Herrera, H. Pomares, L. Marquez, M. Pasadas, Soft-computing techniques and ARMA model for time series prediction,
Neurocomputing, 71(4 – 6) (2008), 519 - 537.
[4] F. Camastra and A. Colla, Neural short-term prediction based on dynamics reconstruction, ACM, 9(1) (1999), 45-52.
[5] M. Han and M. Wang, Analysis and modeling of multivariate chaotic time series based on neural network, Expert Systems with Applications, 2(36) (2009), 1280-1290.
[6] A. Abraham and B. Nath, A neuro-fuzzy approach for modeling electricity demand in Victoria, Applied Soft Computing, 1(2) (2001), 127–138.
[7] K.A. Oliveira, I. Vannucci, E.C. Silva, Using artificial neural networks to forecast chaotic time series, Physica A: Statistical Mechanics and its Applications, 284(1-4) (2000), 393 -
404.
[8] A.G. Ivakhnenko, Polynomial theory of complex systems, IEEE Trans. Syst., Man Cybern., SMC-1, 1 (1971), 364-378.
[9] H. Tamura and T. Kondo, Heuristics free group method of data handling algorithm of generating optimal partial polynomials with application to air pollution prediction,
International Journal of Systems Science, 11 (1980), 1095–1111.
[10] A.G. Ivakhnenko and G.A. Ivakhnenko, A review of problems solved by algorithms of the GMDH, Pattern Recognition and Image Analysis, 5(4) (1995), 527-535.
[11] M.S. Voss and X. Feng, A new methodology for emergent system identification using particle swarm optimization (PSO) and the group method data handling (GMDH). GECCO
2002, (2002), 1227-1232.
[12] A. Jain and A. Kumar, An evaluation of artificial neural network technique for the determination of infiltration model parameters, Applied Soft Computing, 6 (2006), 272–282.
[13] C.T. Su, L.I. Tong, C.M. Leou, Combination of time series and neural network for reliability forecasting modeling, Journal of Chinese Industrial Engineering, 14 (1997), 419
– 429.
[14] W. Wang, P.V. Gelder, J.K. Vrijling, Improving daily stream flow forecasts by combining ARMA and ANN models, International Conference on Innovation Advances and
Implementation of Flood Forecasting Technology, 2005.
[15] G.C. Onwubolu, Design of hybrid differential evolution and group method of data handling networks for modeling and prediction, Information Sciences, 178 (2008), 3616-3634.
[16] R. Samsudin, P. Saad, A. Shabri, A hybrid GMDH and least squares support vector machines in time series forecasting, Neural Network World, 21(3) (2011), 251-268.
[17] K.Y. Chen and C.H. Wang, A hybrid ARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan, Expert Systems with
Applications, 32 (2007), 254 - 264.
[18] S. BuHamra, N. Smaoui, M. Gabr, The Box–Jenkins analysis and neural networks: prediction and time series modelling, Applied Mathematical Modelling, 27 (2003), 805-815.
[19] C. Hamzaçebi, Improving artificial neural networks' performance in seasonal time series forecasting, Information Sciences, 178(23) (2008), 4550-4559.
[20] M. Khashei and M. Bijari, An artificial neural network (p, d, q) model for timeseries forecasting, Expert Systems with Applications, 37(1) (2010), 479-489.
[21] G.E.P. Box and G. Jenkins, Time Series Analysis. Forecasting and Control, Holden-Day, San Francisco, CA, 1970.
[22] K.K. Lai, L. Yu, S. Wang, W. Huang, Hybridizing Exponential Smoothing and Neural Network for Financial Time Series Predication. ICCS 2006, Part IV, LNCS 3994 (2006),
493-500.
[23] H.F. Zou, G.P. Xia, F.T. Yang, H.Y. Wang, An investigation and comparison of artificial neural network and time series models for chinese food grain price forecasting,
Neurocomputing, 70 (2007), 2913-2923.
[24] P.A.P. Moran, The statistical analysis of the sunspot and lynx cycles. Journal of Animal Ecology, 18 (1953), 115–116.
[25] Y. Kajitani, K.W. Hipel, A.I. McLeod, Forecasting nonlinear time series with feedforward neural networks: A case study of Canadian lynx data, Journal of Forecasting, 24 (2005),
105–117.
[26] T. Subba Rao and M.M. Gabr, An Introduction to Bispectral Analysis and Bilinear Time Series Models, Springer, Berlin, 1984.
[27] S. Kang, An Investigation of the Use of Feed forward Neural Network for Forecasting. Ph.D. Thesis, Kent State University, 1991.
[28] F.S. Wong, Time series forecasting using back propagation neural network, Neurocomputing, 2 (1991), 147 - 159.
[29] R.P. Lippmann, An introduction to computing with neural nets. IEEE ASSP Magazine, April, 4-22, 1987.
Received: April 14, 2014