Research Article
A CEEMDAN and XGBOOST-Based Approach to Forecast Crude Oil Prices

Yingrui Zhou,1 Taiyong Li,1,2 Jiayi Shi,1 and Zijie Qian1

1 School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu 611130, China
2 Sichuan Province Key Laboratory of Financial Intelligence and Financial Engineering, Southwestern University of Finance and Economics, Chengdu 611130, China

Correspondence should be addressed to Taiyong Li; litaiyong@gmail.com

Received 13 October 2018; Accepted 13 January 2019; Published 3 February 2019

Guest Editor: Marisol B. Correia

Copyright © 2019 Yingrui Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Crude oil is one of the most important types of energy for the global economy, and hence it is very attractive to understand the movement of crude oil prices. However, the sequences of crude oil prices usually show some characteristics of nonstationarity and nonlinearity, making it very challenging to forecast crude oil prices accurately. To cope with this issue, in this paper, we propose a novel approach that integrates complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and extreme gradient boosting (XGBOOST), so-called CEEMDAN-XGBOOST, for forecasting crude oil prices. Firstly, we use CEEMDAN to decompose the nonstationary and nonlinear sequences of crude oil prices into several intrinsic mode functions (IMFs) and one residue. Secondly, XGBOOST is used to predict each IMF and the residue individually. Finally, the corresponding prediction results of each IMF and the residue are aggregated as the final forecasting results. To demonstrate the performance of the proposed approach, we conduct extensive experiments on the West Texas Intermediate (WTI) crude oil prices. The experimental results show that the proposed CEEMDAN-XGBOOST outperforms some state-of-the-art models in terms of several evaluation metrics.

1. Introduction

As one of the most important types of energy that powers the global economy, crude oil has great impacts on every country, every enterprise, and even every person. Therefore, it is a crucial task for governors, investors, and researchers to forecast crude oil prices accurately. However, the existing research has shown that crude oil prices are affected by many factors, such as supply and demand, interest rates, exchange rates, speculation activities, international and political events, climate, and so on [1, 2]. Therefore, the movement of crude oil prices is irregular. For example, starting from about 11 USD/barrel in December 1998, the WTI crude oil prices gradually reached the peak of 145.31 USD/barrel in July 2008, and then the prices drastically declined to 30.28 USD/barrel in the next five months because of the subprime mortgage crisis. After that, the prices climbed to more than 113 USD/barrel in April 2011, and, once again, they sharply dropped to about 26 USD/barrel in February 2016. The movement of crude oil prices in the last decades has shown that the forecasting task is very challenging, due to the characteristics of high nonlinearity and nonstationarity of crude oil prices.

Many scholars have devoted efforts to forecasting crude oil prices accurately. The most widely used approaches to forecasting crude oil prices can be roughly divided into two groups: statistical approaches and artificial intelligence (AI) approaches. Recently, Miao et al. explored the factors affecting crude oil prices based on the least absolute shrinkage and selection operator (LASSO) model [1]. Ye et al. proposed an approach integrating the ratchet effect for linear prediction of crude oil prices [3]. Morana put forward a semiparametric generalized autoregressive conditional heteroskedasticity (GARCH) model to predict crude oil prices at different lag periods, even without the conditional average of historical crude oil prices [4]. Naser found that using dynamic model averaging (DMA) with empirical evidence is better than linear models such as the autoregressive (AR) model and its variants [5]. Gong and Lin proposed several new



heterogeneous autoregressive (HAR) models to forecast the good and bad uncertainties in crude oil prices [6]. Wen et al. also used HAR models with structural breaks to forecast the volatility of crude oil futures [7].

Although the statistical approaches improve the accuracy of forecasting crude oil prices to some extent, the assumption of linearity of crude oil prices cannot be met according to some recent research, and hence it limits the accuracy. Therefore, a variety of AI approaches have been proposed to capture the nonlinearity and nonstationarity of crude oil prices in the last decades [8–11]. Chiroma et al. reviewed the existing research associated with forecasting crude oil prices and found that AI methodologies are attracting unprecedented interest from scholars in the domain of crude oil price forecasting [8]. Wang et al. proposed an AI system framework that integrated artificial neural networks (ANN) and a rule-based expert system with text mining to forecast crude oil prices, and it was shown that the proposed approach was significantly effective and practically feasible [9]. Baruník and Malinská used neural networks to forecast the term structure in crude oil futures prices [10]. Most recently, Chen et al. have studied forecasting crude oil prices using a deep learning framework and have found that the random walk deep belief networks (RW-DBN) model outperforms the long short term memory (LSTM) and the random walk LSTM (RW-LSTM) models in terms of forecasting accuracy [11]. Other AI methodologies, such as genetic algorithms [12], compressive sensing [13], least square support vector regression (LSSVR) [14], and cluster support vector machine (ClusterSVM) [15], were also applied to forecasting crude oil prices. Due to the extreme nonlinearity and nonstationarity, it is hard to achieve satisfactory results by forecasting the original time series directly. An ideal approach is to divide the tough task of forecasting the original time series into several subtasks, each of which forecasts a relatively simpler subsequence, and then to accumulate the results of all subtasks as the final result. Based on this idea, a "decomposition and ensemble" framework was proposed and widely applied to the analysis of time series, such as energy forecasting [16, 17], fault diagnosis [18–20], and biosignal analysis [21–23]. This framework consists of three stages. In the first stage, the original time series is decomposed into several components. Typical decomposition methodologies include wavelet decomposition (WD), independent component analysis (ICA) [24], variational mode decomposition (VMD) [25], empirical mode decomposition (EMD) [2, 26] and its extensions ensemble EMD (EEMD) [27, 28] and complementary EEMD (CEEMD) [29]. In the second stage, some statistical or AI-based methodologies are applied to forecast each decomposed component individually. In theory, any regression method can be used to forecast the results of each component. In the last stage, the predicted results from all components are aggregated as the final results. Recently, various researchers have devoted efforts to forecasting crude oil prices following the framework of "decomposition and ensemble". Fan et al. put forward a novel approach that integrates independent component analysis (ICA) and support vector regression (SVR) to forecast crude oil prices, and the experimental results verified the

effectiveness of the proposed approach [24]. Yu et al. used EMD to decompose the sequences of the crude oil prices into several intrinsic mode functions (IMFs) at first, and then used a three-layer feed-forward neural network (FNN) model for predicting each IMF. Finally, the authors used an adaptive linear neural network (ALNN) to combine all the results of the IMFs as the final forecasting output [2]. Yu et al. also used EEMD and extended extreme learning machine (EELM) to forecast crude oil prices following the framework of "decomposition and ensemble". The empirical results demonstrated the effectiveness and efficiency of the proposed approach [28]. Tang et al. further proposed an improved approach integrating CEEMD and EELM for forecasting crude oil prices, and the experimental results demonstrated that the proposed approach outperformed all the listed state-of-the-art benchmarks [29]. Li et al. used EEMD to decompose raw crude oil prices into several components and then used kernel and nonkernel sparse Bayesian learning (SBL) to forecast each component, respectively [30, 31].

From the perspective of decomposition, although EMD and EEMD are capable of improving the accuracy of forecasting crude oil prices, they still suffer from "mode mixing" and from introducing new noise into the reconstructed signals, respectively. To overcome these shortcomings, an extension of EEMD, so-called complete EEMD with adaptive noise (CEEMDAN), was proposed by Torres et al. [32]. Later, the authors put forward an improved version of CEEMDAN to obtain decomposed components with less noise and more physical meaning [33]. CEEMDAN has succeeded in wind speed forecasting [34], electricity load forecasting [35], and fault diagnosis [36–38]. Therefore, CEEMDAN may have the potential to forecast crude oil prices. As pointed out above, any regression method can be used to forecast each decomposed component. A recently proposed machine learning algorithm, extreme gradient boosting (XGBOOST), can be used for both classification and regression [39]. The existing research has demonstrated the advantages of XGBOOST in forecasting time series [40–42].

With the potential of CEEMDAN in decomposition and XGBOOST in regression, in this paper, we aim at proposing a novel approach that integrates CEEMDAN and XGBOOST, namely, CEEMDAN-XGBOOST, to improve the accuracy of forecasting crude oil prices following the "decomposition and ensemble" framework. Specifically, we firstly decompose the raw crude oil price series into several components with CEEMDAN. And then, for each component, XGBOOST is applied to building a specific model to forecast the component. Finally, all the predicted results from every component are aggregated as the final forecasting results. The main contributions of this paper are threefold: (1) we propose a novel approach, so-called CEEMDAN-XGBOOST, for forecasting crude oil prices following the "decomposition and ensemble" framework; (2) extensive experiments are conducted on the publicly-accessed West Texas Intermediate (WTI) to demonstrate the effectiveness of the proposed approach in terms of several evaluation metrics; (3) we further study the impacts of several parameter settings with the proposed approach.


Figure 1: An illustration of EMD. (a) The original sequence, local mean, upper envelopes, and lower envelopes (amplitude versus time). (b) The first IMF decomposed by EMD from the above original sequence.

The remainder of this paper is organized as follows. Section 2 describes CEEMDAN and XGBOOST. Section 3 formulates the proposed CEEMDAN-XGBOOST approach in detail. Experimental results are reported and analyzed in Section 4; we also discuss the impacts of parameter settings in this section. Finally, Section 5 concludes this paper.

2. Preliminaries

2.1. EMD, EEMD, and CEEMDAN. EMD was proposed by Huang et al. in 1998, and it has been developed and applied in many disciplines of science and engineering [26]. The key feature of EMD is to decompose a nonlinear, nonstationary sequence into intrinsic mode functions (IMFs) in the spirit of the Fourier series. In contrast to the Fourier series, they are not simply sine or cosine functions, but rather functions that represent the characteristics of the local oscillation frequency of the original data. These IMFs need to satisfy two conditions: (1) the number of local extrema and the number of zero crossings must be equal or differ at most by one, and (2) at any point, the "local mean", i.e., the mean of the upper and lower envelopes, is zero.

At first, EMD finds out the upper and the lower envelopes, which are computed by finding the local extrema of the original sequence. Then, the local maxima (minima) are linked by cubic splines to construct the upper (lower) envelopes, respectively. The mean of these envelopes is considered as the "local mean". Meanwhile, the curve of this "local mean" is defined as the first residue, and the difference between the original sequence and the "local mean" is defined as the first IMF. An illustration of EMD is shown in Figure 1.

After the first IMF is decomposed by EMD, there is still a residue (the local mean, i.e., the yellow dotted line in Figure 1(a)) between the IMF and the original data. Obviously, extrema and high-frequency oscillations also exist in the residue, and EMD decomposes the residue into another IMF and a new residue. If the variance of the new residue is not small enough to satisfy the Cauchy criterion, EMD will repeatedly decompose the new residue into another IMF and a new residue. Finally, EMD decomposes the original sequence into several IMFs and one residue. The relation between the IMFs and the residues is defined as

$r_k[t] = r_{k-1}[t] - IMF_k[t], \quad k = 2, \ldots, K$ (1)

where $r_k[t]$ is the k-th residue at time t, and K is the total number of IMFs and residues.

Subsequently, Huang et al. found that EMD could not completely extract the local features from the mixed features of the original sequence. One of the reasons for this is the frequent appearance of mode mixing. Mode mixing can be defined as the situation where similar pieces of oscillations exist at the same corresponding positions in different IMFs, which causes a single IMF to lose its physical meaning. What is more, if one IMF has this problem, the following IMFs cannot avoid it either. To solve this problem, Wu and Huang extended EMD to a new version, namely, EEMD, which adds white noise to the original time series and performs EMD many times [27]. Given a time series and the corresponding noise, the new time series can be expressed as

$x^i[t] = x[t] + w^i[t]$ (2)

where $x[t]$ stands for the original data and $w^i[t]$ is the i-th white noise (i = 1, 2, ..., N, and N is the number of times EMD is performed).

Then, EEMD decomposes every $x^i[t]$ into $IMF_k^i[t]$. In order to get the real k-th IMF $IMF_k$, EEMD calculates the average of the $IMF_k^i[t]$. In theory, because the mean of the white noise is zero, the effect of the white noise would be eliminated by computing the average of $IMF_k^i[t]$, as shown in

$IMF_k = \frac{1}{N} \sum_{i=1}^{N} IMF_k^i[t]$ (3)


However, Torres et al. found that, due to the limited number of $x^i[t]$ in empirical research, EEMD could not completely eliminate the influence of the white noise in the end. For this situation, Torres et al. put forward a new decomposition technology, CEEMDAN, on the basis of EEMD [32].

CEEMDAN decomposes the original sequence into the first IMF and residue in the same way as EMD. Then, CEEMDAN gets the second IMF and residue, as shown in

$IMF_2 = \frac{1}{N} \sum_{i=1}^{N} E_1(r_1[t] + \varepsilon_1 E_1(w^i[t]))$ (4)

$r_2[t] = r_1[t] - IMF_2$ (5)

where $E_1(\cdot)$ stands for the first IMF decomposed from the sequence, and $\varepsilon_i$ is used to set the signal-to-noise ratio (SNR) at each stage.

In the same way, the k-th IMF and residue can be calculated as

$IMF_k = \frac{1}{N} \sum_{i=1}^{N} E_1(r_{k-1}[t] + \varepsilon_{k-1} E_{k-1}(w^i[t]))$ (6)

$r_k[t] = r_{k-1}[t] - IMF_k$ (7)

Finally, CEEMDAN gets several IMFs and computes the residue as shown in

$R[t] = x[t] - \sum_{k=1}^{K} IMF_k$ (8)

The sequences decomposed by EMD, EEMD, and CEEMDAN all satisfy (8). Although CEEMDAN can solve the problems that EEMD leaves, it still has two limitations: (1) the residual noise that the models contain and (2) the existence of spurious modes. Aiming at dealing with these issues, Torres et al. proposed a new algorithm to improve CEEMDAN [33].

Compared with the original CEEMDAN, the improved version obtains the residues by calculating the local means. For example, in order to get the first residue shown in (9), it computes the local means of N realizations $x^i[t] = x[t] + \varepsilon_0 E_1(w^i[t])$ (i = 1, 2, ..., N):

$r_1[t] = \frac{1}{N} \sum_{i=1}^{N} M(x^i[t])$ (9)

where $M(\cdot)$ is the local mean of the sequence. Then, it can get the first IMF, shown in

$IMF_1 = x[t] - r_1[t]$ (10)

For the k-th residue and IMF, they can be computed as (11) and (12), respectively:

$r_k[t] = \frac{1}{N} \sum_{i=1}^{N} M(r_{k-1}[t] + \varepsilon_{k-1} E_k(w^i[t]))$ (11)

$IMF_k = r_{k-1}[t] - r_k[t]$ (12)

The authors have demonstrated that the improved CEEMDAN outperformed the original CEEMDAN in signal decomposition [33]. In what follows, we will refer to the improved version of CEEMDAN as CEEMDAN unless otherwise stated. With CEEMDAN, the original sequence can be decomposed into several IMFs and one residue; that is, the tough task of forecasting the raw time series can be divided into several simpler forecasting subtasks.
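To make the decomposition step concrete, the following is a minimal sketch of CEEMDAN-based decomposition using the open-source PyEMD package (the `EMD-signal` distribution on PyPI); the package choice, the `trials` and `epsilon` parameter names, and the synthetic signal are our assumptions for illustration, not the authors' original code.

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

# Synthetic stand-in for a price series (assumption for illustration).
t = np.linspace(0, 1, 512)
signal = np.sin(8 * np.pi * t) + 0.5 * t + 0.1 * np.random.randn(t.size)

# trials ~ the number of noise realizations N; epsilon ~ the noise strength.
ceemdan = CEEMDAN(trials=500, epsilon=0.05)
imfs = ceemdan(signal)               # shape: (number of IMFs, len(signal))
residue = signal - imfs.sum(axis=0)  # the residue R[t], per equation (8)
print(imfs.shape, residue.shape)
```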

2.2. XGBOOST. Boosting is an ensemble method that combines several weak learners into a strong learner, as shown in

$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i)$ (13)

where $f_k(\cdot)$ is a weak learner and K is the number of weak learners.

When it comes to tree boosting, its learners are decision trees, which can be used for both regression and classification.

To a certain degree, XGBOOST can be considered as tree boosting, and its core is Newton boosting instead of gradient boosting, which finds the optimal parameters by minimizing the loss function $L(\phi)$, as shown in

$L(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)$ (14)

$\Omega(f_k) = \gamma T + \frac{1}{2} \alpha \|\omega\|^2$ (15)

where $\Omega(f_k)$ is the complexity of the k-th tree model, n is the sample size, T is the number of leaf nodes of the decision trees, $\omega$ is the weight of the leaf nodes, $\gamma$ controls the extent of the complexity penalty for the tree structure on T, and $\alpha$ controls the degree of the regularization of $f_k$.

Since it is difficult for the tree ensemble model to minimize the loss function in (14) and (15) with traditional methods in Euclidean space, the model is trained in an additive manner [43]. It adds the $f_t$ that most improves the model and forms the new loss function as

$L^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t)$ (16)

where $\hat{y}_i^{(t)}$ is the prediction of the i-th instance at the t-th iteration and $f_t$ is the weak learner at the t-th iteration.

Then, Newton boosting performs a second-order Taylor expansion on the loss function $l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i))$ to obtain $g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)$, because the second-order approximation helps to minimize the loss function conveniently and quickly [43]. The equations of $g_i$, $h_i$, and the new loss function are defined, respectively, as

$g_i = \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$ (17)

$h_i = \partial^2_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})$ (18)

$L^{(t)} \simeq \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$ (19)

Assume that the sample set $I_j$ of leaf node j is defined as $I_j = \{i \mid q(x_i) = j\}$, where $q(x_i)$ represents the tree structure from the root to leaf node j in the decision tree. Then (19) can be transformed into the following formula, as shown in

$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2} \alpha \sum_{j=1}^{T} \omega_j^2 = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \alpha \right) \omega_j^2 \right] + \gamma T$ (20)

The formula for estimating the weight of each leaf in the decision tree is formulated as

$\omega_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \alpha}$ (21)

According to (21), for the tree structure q, the loss function can be rewritten as

$\tilde{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \alpha} + \gamma T$ (22)

Therefore, the equation of the information gain after branching can be defined as

$Gain = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_{jL}} g_i \right)^2}{\sum_{i \in I_{jL}} h_i + \alpha} + \frac{\left( \sum_{i \in I_{jR}} g_i \right)^2}{\sum_{i \in I_{jR}} h_i + \alpha} - \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \alpha} \right] - \gamma$ (23)

where $I_{jL}$ and $I_{jR}$ are the sample sets of the left and right leaf nodes, respectively, after splitting leaf node j.

XGBOOST branches each leaf node and constructs basic learners by the criterion of maximizing the information gain.

With the help of Newton boosting, XGBOOST can deal with missing values by adaptive learning. To a certain extent, XGBOOST is based on the multiple additive regression tree (MART), but it can get a better tree structure by learning with Newton boosting. In addition, XGBOOST can also subsample among columns, which reduces the relevance of each weak learner [39].
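As a concrete illustration of using XGBOOST for regression, the following is a minimal sketch with the xgboost Python package; the synthetic data and the specific hyperparameter values are assumptions for illustration only.

```python
import numpy as np
from xgboost import XGBRegressor  # pip install xgboost

# Synthetic regression data (assumption for illustration).
rng = np.random.RandomState(0)
X = rng.rand(200, 6)
y = X.sum(axis=1) + 0.1 * rng.randn(200)

# Hyperparameters mirror those tuned in Section 4.3 (values are illustrative).
model = XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.1,
                     gamma=0.01, subsample=0.8, colsample_bytree=0.8,
                     reg_alpha=0.01, reg_lambda=0.01)
model.fit(X, y)
print(model.predict(X[:5]))
```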

3. The Proposed CEEMDAN-XGBOOST Approach

From the existing literature, we can see that CEEMDAN has advantages in time series decomposition, while XGBOOST does well in regression. Therefore, in this paper, we integrate these two methods and propose a novel approach, so-called CEEMDAN-XGBOOST, for forecasting crude oil prices. The proposed CEEMDAN-XGBOOST includes three stages: decomposition, individual forecasting, and ensemble. In the first stage, CEEMDAN is used to decompose the raw series of crude oil prices into k+1 components, including k IMFs and one residue. Among the components, some show high-frequency characteristics of the raw series while the others show low-frequency ones. In the second stage, for each component, a forecasting model is built using XGBOOST, and then the built model is applied to forecast that component and get an individual result. Finally, all the results from the components are aggregated as the final result. Although there exist a lot of methods to aggregate the forecasting results from components, in the proposed approach we use the simplest way, i.e., addition, to summarize the results of all components. The flowchart of the CEEMDAN-XGBOOST is shown in Figure 2.

From Figure 2, it can be seen that the proposed CEEMDAN-XGBOOST, based on the framework of "decomposition and ensemble", is also a typical strategy of "divide and conquer"; that is, the tough task of forecasting crude oil prices from the raw series is divided into several subtasks of forecasting from simpler components. Since the raw series is extremely nonlinear and nonstationary, while each decomposed component has a relatively simple form for forecasting, the CEEMDAN-XGBOOST has the ability to achieve higher accuracy of forecasting crude oil prices. In short, the advantages of the proposed CEEMDAN-XGBOOST are threefold: (1) the challenging task of forecasting crude oil prices is decomposed into several relatively simple subtasks; (2) for forecasting each component, XGBOOST can build models with different parameters according to the characteristics of the component; and (3) a simple operation, addition, is used to aggregate the results from the subtasks as the final result.
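For illustration, the following minimal sketch wires the three stages together in Python, consistent with the tools mentioned in Section 4.3; the PyEMD package, the lagged-window construction, the train/test split, and the hyperparameter values are our assumptions rather than the exact implementation used in the experiments.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from PyEMD import CEEMDAN
from xgboost import XGBRegressor

def ceemdan_xgboost(prices, lag=6, horizon=1, train_ratio=0.8):
    # Stage 1: decompose the raw series into IMFs and a residue (equation (8)).
    imfs = CEEMDAN(trials=500, epsilon=0.05)(prices)
    components = np.vstack([imfs, prices - imfs.sum(axis=0)])

    total = 0.0
    for comp in components:
        # Lagged inputs x[t-lag+1..t] predicting x[t+horizon] (cf. equation (24)).
        windows = sliding_window_view(comp, lag)
        X, y = windows[:-horizon], comp[lag - 1 + horizon:]
        split = int(train_ratio * len(X))
        # Stage 2: a dedicated XGBOOST model per component (illustrative params).
        model = XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.1)
        model.fit(X[:split], y[:split])
        # Stage 3: ensemble the component forecasts by simple addition.
        total = total + model.predict(X[split:])
    return total
```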

4. Experiments and Analysis

4.1. Data Description. To demonstrate the performance of the proposed CEEMDAN-XGBOOST, we use the crude oil prices from the West Texas Intermediate (WTI) as experimental data (the data can be downloaded from https://www.eia.gov/dnav/pet/hist/RWTCD.htm). We use the daily closing prices covering the period from January 2, 1986 to March 19, 2018, with 8123 observations in total, for empirical studies. Among the observations, the first 6498 ones, from January 2, 1986 to September 21, 2011, accounting for 80% of the total observations, are used as training samples, while the remaining 20% are for testing. The original crude oil prices are shown in Figure 3.

We perform multi-step-ahead forecasting in this paper. For a given time series $x_t$ (t = 1, 2, ..., T), the m-step-ahead forecasting for $x_{t+m}$ can be formulated as

$\hat{x}_{t+m} = f(x_{t-(l-1)}, x_{t-(l-2)}, \ldots, x_{t-1}, x_t)$ (24)

where $\hat{x}_{t+m}$ is the m-step-ahead predicted result at time t, f is the forecasting model, $x_i$ is the true value at time i, and l is the lag order.
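Equation (24) amounts to building a supervised sample from each window of l lagged values and its m-step-ahead target, as in the following sketch (the function name and the toy series are ours):

```python
import numpy as np

def make_supervised(series, lag=6, horizon=1):
    # Build (X, y) pairs per equation (24): predict x[t+m] from l lagged values.
    X, y = [], []
    for t in range(lag - 1, len(series) - horizon):
        X.append(series[t - lag + 1 : t + 1])   # x[t-(l-1)], ..., x[t]
        y.append(series[t + horizon])           # x[t+m]
    return np.array(X), np.array(y)

# Example: 3-step-ahead samples from a toy series of length 20.
X, y = make_supervised(np.arange(20, dtype=float), lag=6, horizon=3)
print(X.shape, y.shape)  # (12, 6) (12,)
```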


Figure 2: The flowchart of the proposed CEEMDAN-XGBOOST: the raw series x[n] is decomposed by CEEMDAN into IMF1, IMF2, ..., IMFk and a residue R[n]; each component is forecast individually with XGBOOST; and the component forecasts are ensembled as $\sum_{i=1}^{k} IMF_i + R[n]$.

Figure 3: The original crude oil prices of WTI (daily closing prices in USD/barrel, January 2, 1986 to March 19, 2018).

For SVR and FNN, we normalize each decomposed component before building the model to forecast the component individually. In detail, the normalization process can be defined as

$x'_t = \frac{x_t - \mu}{\sigma}$ (25)

where $x'_t$ is the normalized series of the crude oil price series, $x_t$ is the data before normalization, $\mu$ is the mean of $x_t$, and $\sigma$ is the standard deviation of $x_t$. Meanwhile, since normalization is not necessary for XGBOOST and ARIMA, for models with these two algorithms we build forecasting models from each of the decomposed components directly.
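A minimal sketch of this z-score normalization, applied per component before SVR or FNN modeling (the function name is ours):

```python
import numpy as np

def zscore(x):
    # Equation (25): standardize a component; keep mu and sigma to invert later.
    mu, sigma = np.mean(x), np.std(x)
    return (x - mu) / sigma, mu, sigma
```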


4.2. Evaluation Criteria. When we evaluate the accuracy of the models, we focus on not only the numerical accuracy but also the accuracy of the forecasting direction. Therefore, we select the root-mean-square error (RMSE) and the mean absolute error (MAE) to evaluate the numerical accuracy of the models. Besides, we use the directional statistic (Dstat) as the criterion for evaluating the accuracy of the forecasting direction. The RMSE, MAE, and Dstat are defined as

$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_t - \hat{y}_t)^2}$ (26)

$MAE = \frac{1}{N} \sum_{i=1}^{N} |y_t - \hat{y}_t|$ (27)

$Dstat = \frac{\sum_{i=2}^{N} d_i}{N-1}, \quad d_i = \begin{cases} 1, & (y_t - y_{t-1})(\hat{y}_t - \hat{y}_{t-1}) > 0 \\ 0, & (y_t - y_{t-1})(\hat{y}_t - \hat{y}_{t-1}) < 0 \end{cases}$ (28)

where $y_t$ is the actual crude oil price at time t, $\hat{y}_t$ is the prediction, and N is the size of the test set.
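A direct translation of (26)-(28) into code (a sketch; the function names are ours):

```python
import numpy as np

def rmse(y, yhat):
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))  # eq. (26)

def mae(y, yhat):
    return np.mean(np.abs(np.asarray(y) - np.asarray(yhat)))          # eq. (27)

def dstat(y, yhat):
    # eq. (28): share of steps whose predicted direction matches the actual one.
    return np.mean(np.diff(y) * np.diff(yhat) > 0)
```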

In addition, we use the Wilcoxon signed rank test (WSRT) for proving the significant differences between the forecasts of the selected models [44]. The WSRT is a nonparametric statistical hypothesis test that can be used to evaluate whether the population mean ranks of two predictions from different models on the same sample differ. Meanwhile, it is a paired difference test which can be used as an alternative to the paired Student's t-test. The null hypothesis of the WSRT is that the median of the loss differential series $d(t) = g(e_a(t)) - g(e_b(t))$ is equal to zero, where $e_a(t)$ and $e_b(t)$ are the error series of model a and model b, respectively, and $g(\cdot)$ is a loss function. If the p-value of a pair of models is below 0.05, the test rejects the null hypothesis (there is a significant difference between the forecasts of this pair of models) under the confidence level of 95%. In this way, we can prove that there is a significant difference between the optimal model and the others.
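In practice the WSRT can be run with SciPy, as in the following sketch (the squared-error loss choice and the toy error series are our assumptions):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.RandomState(0)
e_a = rng.randn(200) * 0.5   # error series of model a (toy data)
e_b = rng.randn(200) * 0.7   # error series of model b (toy data)

# Test whether the median of d(t) = g(e_a(t)) - g(e_b(t)) is zero, with g(e) = e^2.
stat, p = wilcoxon(e_a ** 2, e_b ** 2)
print(p < 0.05)  # True -> significant difference at the 95% confidence level
```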

However, the criteria defined above are global. If some singular points exist, the optimal model chosen by these criteria may not be the best one. Thus, we also apply the model confidence set (MCS) [31, 45] in order to choose the optimal model convincingly.

In order to calculate the p-values of the statistics accurately, the MCS performs bootstrap on the prediction series, which can soften the impact of the singular points. For the j-th model, suppose that the size of a bootstrapped sample is T; then the t-th bootstrapped sample has the loss function defined as

$L_{j,t} = \frac{1}{T} \sum_{t=h+1}^{h+T} |y_t - \hat{y}_t|$ (29)

Suppose that a set $M^0 = \{m_i, i = 1, 2, 3, \ldots, n\}$ contains n models; for any two models j and k, the relative values of the loss between these two models can be defined as

$d_{jk,t} = L_{j,t} - L_{k,t}$ (30)

According to the above definitions, the set of superior models can be defined as

$M^* \equiv \{ m_j \in M^0 : E(d_{jk,t}) \le 0 \ \forall m_k \in M^0 \}$ (31)

where $E(\cdot)$ represents the average value. The MCS repeatedly performs the significance test in $M^0$; at each iteration, the worst prediction model in the set is eliminated. In the test, the hypothesis is the null hypothesis of equal predictive ability (EPA), defined as

$H_0: E(d_{jk,t}) = 0 \ \forall m_j, m_k \in M \subset M^0$ (32)

The MCS mainly depends on an equivalence test and an elimination criterion. The specific process is as follows.

Step 1. Assuming $M = M^0$, at the level of significance $\alpha$, use the equivalence test to test the null hypothesis $H_{0,M}$.

Step 2. If the null hypothesis is accepted, define $M^*_{1-\alpha} = M$; otherwise, eliminate the model that rejects the null hypothesis from M according to the elimination criterion. The elimination does not stop until there are no models that reject the null hypothesis in the set M. In the end, the models in $M^*_{1-\alpha}$ are regarded as the surviving models.

Meanwhile, the MCS has two kinds of statistics, which can be defined as

$T_R = \max_{j,k \in M} |t_{jk}|$ (33)

$T_{SQ} = \max_{j,k \in M} t_{jk}^2$ (34)

$t_{jk} = \frac{\bar{d}_{jk}}{\sqrt{\mathrm{var}(\bar{d}_{jk})}}$ (35)

$\bar{d}_{jk} = \frac{1}{T} \sum_{t=h+1}^{h+T} d_{jk,t}$ (36)

where $T_R$ and $T_{SQ}$ stand for the range statistic and the semiquadratic statistic, respectively, and both statistics are based on the t-statistics as shown in (35)-(36). These two statistics ($T_R$ and $T_{SQ}$) are mainly used to remove the models whose p-values are less than the significance level $\alpha$. When the p-value is greater than the significance level $\alpha$, the model can survive. The larger the p-value, the more accurate the prediction of the model. When the p-value is equal to 1, it indicates that the model is the optimal forecasting model.
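The core statistics (33)-(36) can be computed as in the following sketch (a simplified illustration: the variance of $\bar{d}_{jk}$ is estimated naively rather than via the bootstrap the MCS procedure actually uses, and all names are ours):

```python
import numpy as np

def mcs_statistics(losses):
    # losses: (n_models, T) array of per-period losses L_{j,t}.
    n, T = losses.shape
    d = losses[:, None, :] - losses[None, :, :]    # d_{jk,t}, eq. (30)
    d_bar = d.mean(axis=2)                         # eq. (36)
    var = d.var(axis=2, ddof=1) / T                # naive variance of the mean
    t = d_bar / np.sqrt(var + np.eye(n))           # eq. (35); eye avoids 0/0 on diagonal
    return np.abs(t).max(), (t ** 2).max()         # T_R (33) and T_SQ (34)
```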

4.3. Parameter Settings. To test the performance of XGBOOST and CEEMDAN-XGBOOST, we conduct two groups of experiments: single models that forecast crude oil prices with the original sequence, and ensemble models that forecast crude oil prices based on the "decomposition and ensemble" framework.


Table 1: The ranges of the parameters for XGBOOST by grid search.

Parameter | Description | Range
Booster | Booster to use | {gblinear, gbtree}
N estimators | Number of boosted trees | {100, 200, 300, 400, 500}
Max depth | Maximum tree depth for base learners | {3, 4, 5, 6, 7, 8}
Min child weight | Maximum delta step we allow each tree's weight estimation to be | {1, 2, 3, 4, 5, 6}
Gamma | Minimum loss reduction required to make a further partition on a leaf node of the tree | {0.01, 0.05, 0.1, 0.2, 0.3}
Subsample | Subsample ratio of the training instances | {0.6, 0.7, 0.8, 0.9, 1}
Colsample | Subsample ratio of columns when constructing each tree | {0.6, 0.7, 0.8, 0.9, 1}
Reg alpha | L1 regularization term on weights | {0, 0.01, 0.05, 0.1}
Reg lambda | L2 regularization term on weights | {0, 0.01, 0.05, 0.1}
Learning rate | Boosting learning rate | {0.01, 0.05, 0.07, 0.1, 0.2}

For single models, we compare XGBOOST with one statistical model, ARIMA, and two widely used AI models, SVR and FNN. Since the existing research has shown that EEMD significantly outperforms EMD in forecasting crude oil prices [24, 31], in the experiments we only compare CEEMDAN with EEMD. Therefore, we compare the proposed CEEMDAN-XGBOOST with EEMD-SVR, EEMD-FNN, EEMD-XGBOOST, CEEMDAN-SVR, and CEEMDAN-FNN.

For ARIMA, we use the Akaike information criterion (AIC) [46] to select the parameters (p, d, q). For SVR, we use RBF as the kernel function and use grid search to optimize C and gamma in the ranges of $\{2^0, 2^1, \ldots, 2^8\}$ and $\{2^{-9}, 2^{-8}, \ldots, 2^0\}$, respectively. We use one hidden layer with 20 nodes for FNN. We use a grid search to optimize the parameters for XGBOOST; the search ranges of the optimized parameters are shown in Table 1.
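A sketch of such a grid search with scikit-learn over part of the Table 1 ranges (the cross-validation scheme and the synthetic data are our assumptions; the paper does not specify them):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

rng = np.random.RandomState(0)
X, y = rng.rand(500, 6), rng.rand(500)   # stand-in for lagged training samples

param_grid = {                            # a subset of the Table 1 ranges
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 8],
    "learning_rate": [0.01, 0.1, 0.2],
}
search = GridSearchCV(XGBRegressor(), param_grid,
                      scoring="neg_mean_squared_error",
                      cv=TimeSeriesSplit(n_splits=3))
search.fit(X, y)
print(search.best_params_)
```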

We set 0.02 and 0.05 as the standard deviations of the added white noise and set 250 and 500 as the numbers of realizations of EEMD and CEEMDAN, respectively. The decomposition results of the original crude oil prices by EEMD and CEEMDAN are shown in Figures 4 and 5, respectively.

It can be seen from Figure 4 that, among the components decomposed by EEMD, the first six IMFs show characteristics of high frequency, while the remaining six components show characteristics of low frequency. However, regarding the components by CEEMDAN, the first seven show clear high frequency and the last four show low frequency, as shown in Figure 5.

The experiments were conducted with Python 2.7 and MATLAB 8.6 on a 64-bit Windows 7 machine with a 3.4 GHz i7 CPU and 32 GB memory. Specifically, we ran FNN and MCS with MATLAB, and for the remaining work we used Python. Regarding XGBOOST, we used a widely used Python package (https://xgboost.readthedocs.io/en/latest/python) in the experiments.

Table 2: The RMSE, MAE, and Dstat by single models with horizon = 1, 3, and 6.

Horizon | Model | RMSE | MAE | Dstat
1 | XGBOOST | 1.2640 | 0.9481 | 0.4827
1 | SVR | 1.2899 | 0.9651 | 0.4826
1 | FNN | 1.3439 | 0.9994 | 0.4837
1 | ARIMA | 1.2692 | 0.9520 | 0.4883
3 | XGBOOST | 2.0963 | 1.6159 | 0.4839
3 | SVR | 2.2444 | 1.7258 | 0.5080
3 | FNN | 2.1503 | 1.6512 | 0.4837
3 | ARIMA | 2.1056 | 1.6177 | 0.4901
6 | XGBOOST | 2.9269 | 2.2945 | 0.5158
6 | SVR | 3.1048 | 2.4308 | 0.5183
6 | FNN | 3.0803 | 2.4008 | 0.5028
6 | ARIMA | 2.9320 | 2.2912 | 0.5151

4.4. Experimental Results. In this subsection, we use a fixed value 6 as the lag order, and we forecast crude oil prices with 1-step-ahead, 3-step-ahead, and 6-step-ahead forecasting; that is to say, the horizons for these three forecasting tasks are 1, 3, and 6, respectively.

4.4.1. Experimental Results of Single Models. For single models, we compare XGBOOST with the state-of-the-art SVR, FNN, and ARIMA, and the results are shown in Table 2.

It can be seen from Table 2 that XGBOOST outperforms the other models in terms of RMSE and MAE with horizons 1 and 3. For horizon 6, XGBOOST achieves the best RMSE and the second best MAE, which is slightly worse than that by ARIMA. For horizon 1, FNN achieves the worst results among the four models; however, for horizons 3 and 6, SVR achieves the worst results. Regarding Dstat, none of the models can always outperform the others, and the best result of Dstat is achieved by SVR with horizon 6. It can be found that the RMSE and MAE values gradually increase with the increase of the horizon. However, the Dstat values do not show such a pattern. All the values of Dstat are around 0.5, i.e., from 0.4826 to 0.5183, which is very similar to the results of random guesses, showing that it is very difficult to accurately forecast the direction of the movement of crude oil prices with the raw crude oil prices directly.

Figure 4: The IMFs and residue of WTI crude oil prices by EEMD (IMF1–IMF11 and the residue, January 2, 1986 to March 19, 2018).

To further verify the advantages of XGBOOST over the other models, we report the results by WSRT and MCS, as shown in Tables 3 and 4, respectively. As for the WSRT, the p-value between XGBOOST and the other models except ARIMA is below 0.05, which means that there is a significant difference among the forecasting results of XGBOOST, SVR, and FNN in population mean ranks. Besides, the results of MCS show that the p-values of $T_R$ and $T_{SQ}$ of XGBOOST are always equal to 1.000, proving that XGBOOST is the optimal model among all the models in terms of global errors and most local errors of different samples obtained by the bootstrap methods in MCS. According to MCS, the p-values of $T_R$ and $T_{SQ}$ of SVR on the horizon of 3 are greater than 0.2, so SVR becomes a surviving model and the second best model on this horizon. When it comes to ARIMA, it performs almost as well as XGBOOST in terms of the evaluation criteria of global errors, but it does not pass the MCS. This indicates that ARIMA does not perform better than the other models on most local errors of different samples.

Table 3: Wilcoxon signed rank test between XGBOOST, SVR, FNN, and ARIMA.

 | XGBOOST | SVR | FNN | ARIMA
XGBOOST | 1 | 4.0378e-06 | 2.2539e-35 | 5.7146e-01
SVR | 4.0378e-06 | 1 | 4.6786e-33 | 0.7006
FNN | 2.2539e-35 | 4.6786e-33 | 1 | 6.9095e-02
ARIMA | 5.7146e-01 | 0.7006 | 6.9095e-02 | 1

Figure 5: The IMFs and residue of WTI crude oil prices by CEEMDAN (IMF1–IMF10 and the residue, January 2, 1986 to March 19, 2018).

Table 4: MCS between XGBOOST, SVR, FNN, and ARIMA.

 | HORIZON=1 T_R | HORIZON=1 T_SQ | HORIZON=3 T_R | HORIZON=3 T_SQ | HORIZON=6 T_R | HORIZON=6 T_SQ
XGBOOST | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
SVR | 0.0004 | 0.0004 | 0.4132 | 0.4132 | 0.0200 | 0.0200
FNN | 0.0002 | 0.0002 | 0.0248 | 0.0538 | 0.0016 | 0.0022
ARIMA | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000

4.4.2. Experimental Results of Ensemble Models. With EEMD or CEEMDAN, the results of forecasting the crude oil prices by XGBOOST, SVR, and FNN with horizons 1, 3, and 6 are shown in Table 5.

It can be seen from Table 5 that the RMSE and MAE values of the CEEMDAN-XGBOOST are the lowest among those by all methods with each horizon.

Table 5: The RMSE, MAE, and Dstat by the models with EEMD or CEEMDAN.

Horizon | Model | RMSE | MAE | Dstat
1 | CEEMDAN-XGBOOST | 0.4151 | 0.3023 | 0.8783
1 | EEMD-XGBOOST | 0.9941 | 0.7685 | 0.7109
1 | CEEMDAN-SVR | 0.8477 | 0.7594 | 0.9054
1 | EEMD-SVR | 1.1796 | 0.9879 | 0.8727
1 | CEEMDAN-FNN | 1.2574 | 1.0118 | 0.7597
1 | EEMD-FNN | 2.6835 | 1.9932 | 0.7361
3 | CEEMDAN-XGBOOST | 0.8373 | 0.6187 | 0.6914
3 | EEMD-XGBOOST | 1.4007 | 1.0876 | 0.6320
3 | CEEMDAN-SVR | 1.2399 | 1.0156 | 0.7092
3 | EEMD-SVR | 1.2366 | 1.0275 | 0.7092
3 | CEEMDAN-FNN | 1.2520 | 0.9662 | 0.7061
3 | EEMD-FNN | 1.2046 | 0.8637 | 0.6959
6 | CEEMDAN-XGBOOST | 1.2882 | 0.9831 | 0.6196
6 | EEMD-XGBOOST | 1.7719 | 1.3765 | 0.6165
6 | CEEMDAN-SVR | 1.3453 | 1.0296 | 0.6683
6 | EEMD-SVR | 1.3730 | 1.1170 | 0.6485
6 | CEEMDAN-FNN | 1.8024 | 1.3647 | 0.6422
6 | EEMD-FNN | 2.7786 | 2.0495 | 0.6337

For example, with horizon 1, the values of RMSE and MAE of the CEEMDAN-XGBOOST are 0.4151 and 0.3023, which are far less than the second best values of RMSE and MAE, i.e., 0.8477 and 0.7594, respectively. As the horizon increases, the corresponding values of each model increase in terms of RMSE and MAE. However, the CEEMDAN-XGBOOST still achieves the lowest values of RMSE and MAE with each horizon. Regarding the values of Dstat, all the values are far greater than those by random guesses, showing that the "decomposition and ensemble" framework is effective for directional forecasting. Specifically, the values of Dstat are in the range between 0.6165 and 0.9054. The best Dstat values in all horizons are achieved by CEEMDAN-SVR or EEMD-SVR, showing that, among the forecasters, SVR is the best one for directional forecasting, although the corresponding values of RMSE and MAE are not the best. As for the decomposition methods, when the forecasters are fixed, CEEMDAN outperforms EEMD in 8, 8, and 8 out of 9 cases in terms of RMSE, MAE, and Dstat, respectively, showing the advantages of CEEMDAN over EEMD. Regarding the forecasters, when combined with CEEMDAN, XGBOOST is always superior to the other forecasters in terms of RMSE and MAE. However, when combined with EEMD, XGBOOST outperforms SVR and FNN with horizon 1 and FNN with horizon 6 in terms of RMSE and MAE. With horizons 1 and 6, FNN achieves the worst results of RMSE and MAE. The results also show that good values of RMSE are usually associated with good values of MAE. However, good values of RMSE or MAE do not always mean good Dstat directly.

For the ensemble models, we also performed a Wilcoxon signed rank test and an MCS test based on the errors of pairs of models. We set 0.2 as the level of significance in MCS and 0.05 as the level of significance in WSRT. The results are shown in Tables 6 and 7.

From these two tables, it can be seen that, regarding the results of WSRT, the p-values between CEEMDAN-XGBOOST and any other model except EEMD-FNN are below 0.05, demonstrating that there is a significant difference in the population mean ranks between CEEMDAN-XGBOOST and any other model except EEMD-FNN. What is more, the MCS shows that the p-values of $T_R$ and $T_{SQ}$ of CEEMDAN-XGBOOST are always equal to 1.000, demonstrating that CEEMDAN-XGBOOST is the optimal model among all models in terms of global errors and local errors. Meanwhile, the p-values of $T_R$ and $T_{SQ}$ of EEMD-FNN are greater than those of the other models except CEEMDAN-XGBOOST, making it the second best model with horizons 3 and 6 in MCS. Meanwhile, with horizon 6, CEEMDAN-SVR is also a second best model. Besides, the p-values of $T_R$ and $T_{SQ}$ of EEMD-SVR and CEEMDAN-SVR are up to 0.2, and they become surviving models with horizon 6 in MCS.

From the results of single models and ensemble models, we can draw the following conclusions: (1) single models usually cannot achieve satisfactory results, due to the nonlinearity and nonstationarity of raw crude oil prices; as a single forecaster, XGBOOST can achieve slightly better results than some state-of-the-art algorithms; (2) ensemble models can significantly improve the forecasting accuracy in terms of several evaluation criteria, following the "decomposition and ensemble" framework; (3) as a decomposition method, CEEMDAN outperforms EEMD in most cases; (4) the extensive experiments demonstrate that the proposed CEEMDAN-XGBOOST is promising for forecasting crude oil prices.

4.5. Discussion. In this subsection, we study the impacts of several parameters related to the proposed CEEMDAN-XGBOOST.

4.5.1. The Impact of the Number of Realizations in CEEMDAN. In (2), it is shown that there are N realizations $x^i[t]$ in CEEMDAN. We explore how the number of realizations in CEEMDAN influences the results of forecasting crude oil prices by CEEMDAN-XGBOOST with horizon 1 and lag 6. We set the number of realizations in CEEMDAN in the range of {10, 25, 50, 75, 100, 250, 500, 750, 1000}. The results are shown in Figure 6.

It can be seen from Figure 6 that, for RMSE and MAE, the bigger the number of realizations, the more accurate the results the CEEMDAN-XGBOOST can achieve. When the number of realizations is less than or equal to 500, the values of both RMSE and MAE decrease with the increase of the number of realizations. However, when the number is greater than 500, these two values increase slightly. Regarding Dstat, when the number increases from 10 to 25, the value of Dstat increases rapidly, and then it increases slowly as the number increases from 25 to 500. After that, Dstat decreases slightly. It is shown that the value of Dstat reaches its top value with 500 realizations. Therefore, 500 is the best value for the number of realizations in terms of RMSE, MAE, and Dstat.


Table 6: Wilcoxon signed rank test between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

 | CEEMDAN-XGBOOST | EEMD-XGBOOST | CEEMDAN-SVR | EEMD-SVR | CEEMDAN-FNN | EEMD-FNN
CEEMDAN-XGBOOST | 1 | 3.5544e-05 | 1.8847e-50 | 0.0028 | 1.6039e-187 | 0.0726
EEMD-XGBOOST | 3.5544e-05 | 1 | 4.5857e-07 | 0.3604 | 8.2912e-82 | 0.0556
CEEMDAN-SVR | 1.8847e-50 | 4.5857e-07 | 1 | 4.9296e-09 | 5.7753e-155 | 8.6135e-09
EEMD-SVR | 0.0028 | 0.3604 | 4.9296e-09 | 1 | 2.5385e-129 | 0.0007
CEEMDAN-FNN | 1.6039e-187 | 8.2912e-82 | 5.7753e-155 | 2.5385e-129 | 1 | 8.1427e-196
EEMD-FNN | 0.0726 | 0.0556 | 8.6135e-09 | 0.0007 | 8.1427e-196 | 1

Table 7: MCS between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

 | HORIZON=1 T_R | HORIZON=1 T_SQ | HORIZON=3 T_R | HORIZON=3 T_SQ | HORIZON=6 T_R | HORIZON=6 T_SQ
CEEMDAN-XGBOOST | 1 | 1 | 1 | 1 | 1 | 1
EEMD-XGBOOST | 0 | 0 | 0 | 0 | 0 | 0.0030
CEEMDAN-SVR | 0 | 0.0002 | 0.0124 | 0.0162 | 0.8268 | 0.8092
EEMD-SVR | 0 | 0 | 0.0008 | 0.004 | 0.7872 | 0.7926
CEEMDAN-FNN | 0 | 0 | 0.0338 | 0.0532 | 0.2924 | 0.3866
EEMD-FNN | 0 | 0.0002 | 0.4040 | 0.4040 | 0.8268 | 0.8092

4.5.2. The Impact of the Lag Orders. In this subsection, we explore how the number of lag orders impacts the prediction accuracy of CEEMDAN-XGBOOST on the horizon of 1. In this experiment, we set the number of lag orders from 1 to 10, and the results are shown in Figure 7.

According to the empirical results shown in Figure 7, it can be seen that, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply, while that of Dstat increases drastically. After that, the values of RMSE and MAE remain almost unchanged (or increase very slightly) with the increase of the lag order. However, for Dstat, the value increases sharply from 1 to 2 and then decreases from 2 to 3. As the lag order increases from 3 to 5, Dstat stays almost stationary. Overall, when the value of the lag order is up to 5, it reaches a good tradeoff among the values of RMSE, MAE, and Dstat.

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the white noise in CEEMDAN, also affects the performance of CEEMDAN-XGBOOST. Thus, we set the noise strength in the set of {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how the noise strength in CEEMDAN affects the prediction accuracy of CEEMDAN-XGBOOST with a fixed horizon 1 and a fixed lag 6.

As shown in Figure 8, when the noise strength in CEEMDAN is equal to 0.05, the values of RMSE, MAE, and Dstat achieve the best results simultaneously. When the noise strength is less than or equal to 0.05, except for 0.03, the values of RMSE, MAE, and Dstat become better and better with the increase of the noise strength. However, when the strength is greater than 0.05, the values of RMSE, MAE, and Dstat become worse and worse. The figure indicates that the noise strength has a great impact on the forecasting results, and an ideal range for it is about 0.04-0.06.

5. Conclusions

In this paper, we propose a novel model, namely, CEEMDAN-XGBOOST, to forecast crude oil prices. At first, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, CEEMDAN-XGBOOST computes the sum of the predictions of the IMFs and the residue as the final forecasting results. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although the performance of the CEEMDAN-XGBOOST on forecasting the direction of crude oil prices is not the best, the MCS shows that the CEEMDAN-XGBOOST is still the optimal model. Meanwhile, it is proved that the number of realizations, the lag, and the noise strength for CEEMDAN are vital factors that have great impacts on the performance of the CEEMDAN-XGBOOST.

In the future, we will study the performance of the CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emissions prices.

Data Availability

The data used to support the findings of this study are included within the article.


Figure 6: The impact of the number of realizations in CEEMDAN with CEEMDAN-XGBOOST (RMSE, MAE, and Dstat versus the number of realizations in CEEMDAN).

Figure 7: The impact of the number of lag orders with CEEMDAN-XGBOOST (RMSE, MAE, and Dstat versus the lag order).

Figure 8: The impact of the white noise strength of CEEMDAN with CEEMDAN-XGBOOST (RMSE, MAE, and Dstat versus the noise strength).


Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.

[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.

[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.

[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.

[5] H. Naser, "Estimating and forecasting the real prices of crude oil: A data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.

[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.

[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.

[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.

[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.

[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.

[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.

[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.

[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.

[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.

[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.

[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.

[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.

[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.

[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.

[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.

[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.

[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.

[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.

[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.

[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.

[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.

[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.

[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.

[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.

[31] T Li Z Hu Y Jia J Wu and Y Zhou ldquoForecasting CrudeOil PricesUsing Ensemble EmpiricalModeDecomposition andSparse Bayesian Learningrdquo Energies vol 11 no 7 2018



heterogeneous autoregressive (HAR) models to forecast the good and bad uncertainties in crude oil prices [6]. Wen et al. also used HAR models with structural breaks to forecast the volatility of crude oil futures [7].

Although the statistical approaches improve the accuracy of forecasting crude oil prices to some extent, the assumption of linearity of crude oil prices cannot be met according to some recent research, and hence it limits the accuracy. Therefore, a variety of AI approaches have been proposed to capture the nonlinearity and nonstationarity of crude oil prices in the last decades [8-11]. Chiroma et al. reviewed the existing research associated with forecasting crude oil prices and found that AI methodologies are attracting unprecedented interest from scholars in the domain of crude oil price forecasting [8]. Wang et al. proposed an AI system framework that integrated artificial neural networks (ANN) and a rule-based expert system with text mining to forecast crude oil prices, and it was shown that the proposed approach was significantly effective and practically feasible [9]. Baruník and Malinská used neural networks to forecast the term structure in crude oil futures prices [10]. Most recently, Chen et al. have studied forecasting crude oil prices using a deep learning framework and have found that the random walk deep belief networks (RW-DBN) model outperforms the long short term memory (LSTM) and the random walk LSTM (RW-LSTM) models in terms of forecasting accuracy [11]. Other AI methodologies, such as genetic algorithm [12], compressive sensing [13], least square support vector regression (LSSVR) [14], and cluster support vector machine (ClusterSVM) [15], were also applied to forecasting crude oil prices. Due to the extreme nonlinearity and nonstationarity, it is hard to achieve satisfactory results by forecasting the original time series directly. An ideal approach is to divide the tough task of forecasting the original time series into several subtasks, each of which forecasts a relatively simpler subsequence, and then to accumulate the results of all subtasks as the final result. Based on this idea, a "decomposition and ensemble" framework was proposed and widely applied to the analysis of time series, such as energy forecasting [16, 17], fault diagnosis [18-20], and biosignal analysis [21-23]. This framework consists of three stages. In the first stage, the original time series is decomposed into several components. Typical decomposition methodologies include wavelet decomposition (WD), independent component analysis (ICA) [24], variational mode decomposition (VMD) [25], empirical mode decomposition (EMD) [2, 26] and its extensions, ensemble EMD (EEMD) [27, 28] and complementary EEMD (CEEMD) [29]. In the second stage, some statistical or AI-based methodologies are applied to forecast each decomposed component individually. In theory, any regression method can be used to forecast each component. In the last stage, the predicted results from all components are aggregated as the final results. Recently, various researchers have devoted efforts to forecasting crude oil prices following the framework of "decomposition and ensemble".

Fan et al. put forward a novel approach that integrates independent component analysis (ICA) and support vector regression (SVR) to forecast crude oil prices, and the experimental results verified the effectiveness of the proposed approach [24]. Yu et al. used EMD to decompose the sequences of crude oil prices into several intrinsic mode functions (IMFs) at first, and then used a three-layer feed-forward neural network (FNN) model to predict each IMF. Finally, the authors used an adaptive linear neural network (ALNN) to combine all the results of the IMFs as the final forecasting output [2]. Yu et al. also used EEMD and an extended extreme learning machine (EELM) to forecast crude oil prices following the framework of "decomposition and ensemble"; the empirical results demonstrated the effectiveness and efficiency of the proposed approach [28]. Tang et al. further proposed an improved approach integrating CEEMD and EELM for forecasting crude oil prices, and the experimental results demonstrated that the proposed approach outperformed all the listed state-of-the-art benchmarks [29]. Li et al. used EEMD to decompose raw crude oil prices into several components and then used kernel and nonkernel sparse Bayesian learning (SBL) to forecast each component, respectively [30, 31].

From the perspective of decomposition, although EMD and EEMD are capable of improving the accuracy of forecasting crude oil prices, they still suffer from "mode mixing" and from introducing new noise into the reconstructed signals, respectively. To overcome these shortcomings, an extension of EEMD, the so-called complete EEMD with adaptive noise (CEEMDAN), was proposed by Torres et al. [32]. Later, the authors put forward an improved version of CEEMDAN to obtain decomposed components with less noise and more physical meaning [33]. CEEMDAN has succeeded in wind speed forecasting [34], electricity load forecasting [35], and fault diagnosis [36-38]. Therefore, CEEMDAN may have the potential to forecast crude oil prices. As pointed out above, any regression method can be used to forecast each decomposed component. A recently proposed machine learning algorithm, extreme gradient boosting (XGBOOST), can be used for both classification and regression [39], and the existing research has demonstrated the advantages of XGBOOST in forecasting time series [40-42].

With the potential of CEEMDAN in decomposition and of XGBOOST in regression, in this paper we aim at proposing a novel approach that integrates CEEMDAN and XGBOOST, namely CEEMDAN-XGBOOST, to improve the accuracy of forecasting crude oil prices following the "decomposition and ensemble" framework. Specifically, we firstly decompose the raw crude oil price series into several components with CEEMDAN. Then, for each component, XGBOOST is applied to building a specific model to forecast the component. Finally, all the predicted results from every component are aggregated as the final forecasting results. The main contributions of this paper are threefold: (1) we propose a novel approach, the so-called CEEMDAN-XGBOOST, for forecasting crude oil prices following the "decomposition and ensemble" framework; (2) extensive experiments are conducted on the publicly accessible West Texas Intermediate (WTI) prices to demonstrate the effectiveness of the proposed approach in terms of several evaluation metrics; (3) we further study the impacts of several parameter settings on the proposed approach.


Figure 1: An illustration of EMD. (a) The original sequence, its local mean, and the upper and lower envelopes; (b) the first IMF decomposed by EMD from the original sequence.

The remainder of this paper is organized as follows. Section 2 describes CEEMDAN and XGBOOST. Section 3 formulates the proposed CEEMDAN-XGBOOST approach in detail. Experimental results are reported and analyzed in Section 4, where we also discuss the impacts of parameter settings. Finally, Section 5 concludes this paper.

2. Preliminaries

2.1. EMD, EEMD, and CEEMDAN. EMD was proposed by Huang et al. in 1998, and it has been developed and applied in many disciplines of science and engineering [26]. The key feature of EMD is to decompose a nonlinear, nonstationary sequence into intrinsic mode functions (IMFs) in the spirit of the Fourier series. In contrast to the Fourier series, the IMFs are not simply sine or cosine functions, but rather functions that represent the characteristics of the local oscillation frequency of the original data. These IMFs need to satisfy two conditions: (1) the number of local extrema and the number of zero crossings must be equal or differ at most by one, and (2) the curve of the "local mean" is defined as zero.

At first, EMD finds the upper and the lower envelopes, which are computed from the local extrema of the original sequence: the local maxima (minima) are linked by cubic splines to construct the upper (lower) envelopes, respectively. The mean of these envelopes is considered as the "local mean". Meanwhile, the curve of this "local mean" is defined as the first residue, and the difference between the original sequence and the "local mean" is defined as the first IMF. An illustration of EMD is shown in Figure 1.

After the first IMF is decomposed by EMD, there is still a residue (the local mean, i.e., the yellow dotted line in Figure 1(a)) between the IMF and the original data. Obviously, extrema and high-frequency oscillations also exist in the residue, so EMD decomposes the residue into another IMF and a new residue. If the variance of the new residue is not small enough to satisfy the Cauchy criterion, EMD repeats the decomposition of the new residue into another IMF and a new residue. Finally, EMD decomposes the original sequence into several IMFs and one residue. The relation between the IMFs and the residues is defined as

$$r_{k}[t] = r_{k-1}[t] - \mathrm{IMF}_{k}[t], \quad k = 2, \ldots, K \tag{1}$$

where $r_{k}[t]$ is the k-th residue at time t and K is the total number of IMFs and residues.

Subsequently, Huang et al. found that EMD could not completely extract the local features from the mixed features of the original sequence. One of the reasons for this is the frequent appearance of mode mixing. Mode mixing can be defined as the situation in which similar pieces of oscillation exist at the same corresponding positions in different IMFs, which causes a single IMF to lose its physical meaning. What is more, if one IMF has this problem, the following IMFs cannot avoid it either. To solve this problem, Wu and Huang extended EMD to a new version, namely EEMD, which adds white noise to the original time series and performs EMD many times [27]. Given a time series and the corresponding noise, the new time series can be expressed as

$$x^{i}[t] = x[t] + w^{i}[t] \tag{2}$$

where $x[t]$ stands for the original data and $w^{i}[t]$ is the i-th white noise ($i = 1, 2, \ldots, N$, where N is the number of times EMD is performed).

Then EEMD decomposes every $x^{i}[t]$ into $\mathrm{IMF}^{i}_{k}[t]$. In order to get the real k-th IMF, $\mathrm{IMF}_{k}$, EEMD calculates the average of the $\mathrm{IMF}^{i}_{k}[t]$. In theory, because the mean of the white noise is zero, the effect of the white noise is eliminated by computing the average of the $\mathrm{IMF}^{i}_{k}[t]$, as shown in

$$\mathrm{IMF}_{k} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IMF}^{i}_{k}[t] \tag{3}$$


However, Torres et al. found that, due to the limited number of realizations $x^{i}[t]$ in empirical research, EEMD could not completely eliminate the influence of the white noise in the end. For this situation, Torres et al. put forward a new decomposition technique, CEEMDAN, on the basis of EEMD [32].

CEEMDAN decomposes the original sequence into the first IMF and residue in the same way as EMD. Then CEEMDAN gets the second IMF and residue, as shown in

$$\mathrm{IMF}_{2} = \frac{1}{N}\sum_{i=1}^{N} E_{1}\left(r_{1}[t] + \varepsilon_{1}E_{1}\left(w^{i}[t]\right)\right) \tag{4}$$

$$r_{2}[t] = r_{1}[t] - \mathrm{IMF}_{2} \tag{5}$$

where $E_{1}(\cdot)$ stands for the operator that extracts the first IMF from a sequence and $\varepsilon_{i}$ is used to set the signal-to-noise ratio (SNR) at each stage.

In the same way, the k-th IMF and residue can be calculated as

$$\mathrm{IMF}_{k} = \frac{1}{N}\sum_{i=1}^{N} E_{1}\left(r_{k-1}[t] + \varepsilon_{k-1}E_{k-1}\left(w^{i}[t]\right)\right) \tag{6}$$

$$r_{k}[t] = r_{k-1}[t] - \mathrm{IMF}_{k} \tag{7}$$

Finally, CEEMDAN obtains several IMFs and computes the residue as shown in

$$R[t] = x[t] - \sum_{k=1}^{K} \mathrm{IMF}_{k} \tag{8}$$

The sequences decomposed by EMD, EEMD, and CEEMDAN all satisfy (8). Although CEEMDAN solves the problems that EEMD leaves, it still has two limitations: (1) the residual noise that the modes contain and (2) the existence of spurious modes. Aiming at dealing with these issues, Torres et al. proposed a new algorithm to improve CEEMDAN [33].

Compared with the original CEEMDAN, the improved version obtains the residues by calculating local means. For example, in order to get the first residue shown in (9), it computes the local means of the N realizations $x^{i}[t] = x[t] + \varepsilon_{0}E_{1}(w^{i}[t])$ ($i = 1, 2, \ldots, N$):

$$r_{1}[t] = \frac{1}{N}\sum_{i=1}^{N} M\left(x^{i}[t]\right) \tag{9}$$

where $M(\cdot)$ is the local mean of the sequence. Then it can get the first IMF, shown in

$$\mathrm{IMF}_{1} = x[t] - r_{1}[t] \tag{10}$$

The k-th residue and IMF can be computed as (11) and (12), respectively:

$$r_{k}[t] = \frac{1}{N}\sum_{i=1}^{N} M\left(r_{k-1}[t] + \varepsilon_{k-1}E_{k}\left(w^{i}[t]\right)\right) \tag{11}$$

$$\mathrm{IMF}_{k} = r_{k-1}[t] - r_{k}[t] \tag{12}$$

The authors have demonstrated that the improved CEEMDAN outperforms the original CEEMDAN in signal decomposition [33]. In what follows, we refer to the improved version of CEEMDAN as CEEMDAN unless otherwise stated. With CEEMDAN, the original sequence can be decomposed into several IMFs and one residue; that is, the tough task of forecasting the raw time series can be divided into several simpler forecasting subtasks. A minimal sketch of this decomposition step is given below.
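To make the decomposition step concrete, the following minimal Python sketch decomposes a price series with CEEMDAN. It assumes the third-party PyEMD package (installed as EMD-signal), whose trials and epsilon arguments play the roles of the number of realizations N and the white noise strength; the input file name is hypothetical:

import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal (assumed third-party package)

prices = np.loadtxt("wti_prices.txt")  # hypothetical file of daily closing prices

# trials corresponds to the number of realizations N in (2)-(12);
# epsilon controls the strength of the added white noise.
ceemdan = CEEMDAN(trials=500, epsilon=0.05)
imfs = ceemdan(prices)               # one row per IMF
residue = prices - imfs.sum(axis=0)  # the residue R[t] from (8)
print("Number of IMFs:", imfs.shape[0])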

2.2. XGBOOST. Boosting is an ensemble method that combines several weak learners into a strong learner, as

$$\hat{y}_{i} = \phi\left(x_{i}\right) = \sum_{k=1}^{K} f_{k}\left(x_{i}\right) \tag{13}$$

where $f_{k}(\cdot)$ is a weak learner and K is the number of weak learners.

When it comes to tree boosting, the learners are decision trees, which can be used for both regression and classification.

To a certain degree, XGBOOST can be considered as tree boosting whose core is Newton boosting instead of gradient boosting; it finds the optimal parameters by minimizing the loss function $L(\phi)$, as shown in

$$L\left(\phi\right) = \sum_{i=1}^{n} l\left(\hat{y}_{i}, y_{i}\right) + \sum_{k=1}^{K} \Omega\left(f_{k}\right) \tag{14}$$

$$\Omega\left(f_{k}\right) = \gamma T + \frac{1}{2}\alpha \left\|\omega\right\|^{2} \tag{15}$$

where $\Omega(f_{k})$ is the complexity of the k-th tree model, n is the sample size, T is the number of leaf nodes of the decision tree, $\omega$ is the weight of the leaf nodes, $\gamma$ controls the extent of the complexity penalty on the tree structure through T, and $\alpha$ controls the degree of the regularization of $f_{k}$.

Since it is difficult for the tree ensemble model to minimize the loss function in (14) and (15) with traditional methods in Euclidean space, the model is trained in an additive manner [43]: at each iteration it adds the $f_{t}$ that most improves the model, forming the new loss function

$$L^{t} = \sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}^{(t-1)} + f_{t}\left(x_{i}\right)\right) + \Omega\left(f_{t}\right) \tag{16}$$

where $\hat{y}_{i}^{(t)}$ is the prediction of the i-th instance at the t-th iteration and $f_{t}$ is the weak learner at the t-th iteration.

Then Newton boosting performs a second-order Taylor expansion of the loss function $l(y_{i}, \hat{y}_{i}^{(t-1)} + f_{t}(x_{i}))$ to obtain $g_{i}f_{t}(x_{i}) + (1/2)h_{i}f_{t}^{2}(x_{i})$, because the second-order approximation helps to minimize the loss function conveniently and quickly [43]. The equations of $g_{i}$, $h_{i}$, and the new loss function are defined, respectively, as

$$g_{i} = \partial_{\hat{y}_{i}^{(t-1)}}\, l\left(y_{i}, \hat{y}_{i}^{(t-1)}\right) \tag{17}$$

$$h_{i} = \partial^{2}_{\hat{y}_{i}^{(t-1)}}\, l\left(y_{i}, \hat{y}_{i}^{(t-1)}\right) \tag{18}$$

$$L^{t} = \sum_{i=1}^{n}\left[l\left(y_{i}, \hat{y}_{i}^{(t-1)}\right) + g_{i}f_{t}\left(x_{i}\right) + \frac{1}{2}h_{i}f_{t}^{2}\left(x_{i}\right)\right] + \Omega\left(f_{t}\right) \tag{19}$$

Assume that the sample set $I_{j}$ in the leaf node j is defined as $I_{j} = \{i \mid q(x_{i}) = j\}$, where $q(x_{i})$ represents the tree structure from the root to the leaf node j in the decision tree. Then (19) can be transformed into

$$L^{(t)} = \sum_{i=1}^{n}\left[g_{i}f_{t}\left(x_{i}\right) + \frac{1}{2}h_{i}f_{t}^{2}\left(x_{i}\right)\right] + \gamma T + \frac{1}{2}\alpha\sum_{j=1}^{T}\omega_{j}^{2} = \sum_{j=1}^{T}\left[\left(\sum_{i \in I_{j}} g_{i}\right)\omega_{j} + \frac{1}{2}\left(\sum_{i \in I_{j}} h_{i} + \alpha\right)\omega_{j}^{2}\right] + \gamma T \tag{20}$$

The weight of each leaf in the decision tree is estimated as

$$\omega_{j}^{*} = -\frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i} + \alpha} \tag{21}$$

According to (21), for the tree structure q, the loss function at the leaf nodes can be written as

$$L^{(t)}\left(q\right) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i \in I_{j}} g_{i}\right)^{2}}{\sum_{i \in I_{j}} h_{i} + \alpha} + \gamma T \tag{22}$$

Therefore, the information gain after branching can be defined as

$$\mathit{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_{jL}} g_{i}\right)^{2}}{\sum_{i \in I_{jL}} h_{i} + \alpha} + \frac{\left(\sum_{i \in I_{jR}} g_{i}\right)^{2}}{\sum_{i \in I_{jR}} h_{i} + \alpha} - \frac{\left(\sum_{i \in I_{j}} g_{i}\right)^{2}}{\sum_{i \in I_{j}} h_{i} + \alpha}\right] - \gamma \tag{23}$$

where $I_{jL}$ and $I_{jR}$ are the sample sets of the left and right leaf nodes, respectively, after splitting the leaf node j.

XGBOOST branches each leaf node and constructs the basic learners by the criterion of maximizing the information gain.

With the help of Newton boosting, XGBOOST can deal with missing values by adaptive learning. To a certain extent, XGBOOST is based on the multiple additive regression tree (MART), but it can obtain a better tree structure by learning with Newton boosting. In addition, XGBOOST can also subsample among columns, which reduces the correlation of the weak learners [39].
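To illustrate the regression step, the minimal sketch below fits an XGBOOST regressor with the xgboost Python package; the hyperparameter values and the random training data are illustrative assumptions, not the tuned settings of this paper:

import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.rand(200, 6)  # six lagged values per sample (illustrative data)
y = rng.rand(200)     # the next value of a component (illustrative data)

model = xgb.XGBRegressor(
    n_estimators=300,    # number of boosted trees
    max_depth=5,         # maximum tree depth of the base learners
    learning_rate=0.05,  # shrinkage applied to each new tree
    reg_alpha=0.05,      # L1 regularization term on weights
)
model.fit(X, y)
print(model.predict(X[:3]))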

3. The Proposed CEEMDAN-XGBOOST Approach

From the existing literature, we can see that CEEMDAN has advantages in time series decomposition, while XGBOOST does well in regression. Therefore, in this paper, we integrate these two methods and propose a novel approach, the so-called CEEMDAN-XGBOOST, for forecasting crude oil prices. The proposed CEEMDAN-XGBOOST includes three stages: decomposition, individual forecasting, and ensemble. In the first stage, CEEMDAN is used to decompose the raw series of crude oil prices into k+1 components, including k IMFs and one residue. Among the components, some show high-frequency characteristics while the others show low-frequency characteristics of the raw series. In the second stage, a forecasting model is built for each component using XGBOOST, and the built model is applied to forecast that component, yielding an individual result. Finally, all the results from the components are aggregated as the final result. Although many methods exist to aggregate the forecasting results from components, in the proposed approach we use the simplest one, i.e., addition, to summarize the results of all components. The flowchart of the CEEMDAN-XGBOOST is shown in Figure 2.

From Figure 2, it can be seen that the proposed CEEMDAN-XGBOOST, based on the framework of "decomposition and ensemble", is also a typical "divide and conquer" strategy; that is, the tough task of forecasting crude oil prices from the raw series is divided into several subtasks of forecasting from simpler components. Since the raw series is extremely nonlinear and nonstationary while each decomposed component has a relatively simple form, the CEEMDAN-XGBOOST has the ability to achieve higher accuracy in forecasting crude oil prices. In short, the advantages of the proposed CEEMDAN-XGBOOST are threefold: (1) the challenging task of forecasting crude oil prices is decomposed into several relatively simple subtasks; (2) for each component, XGBOOST can build a model with different parameters according to the characteristics of that component; and (3) a simple operation, addition, is used to aggregate the results from the subtasks as the final result. A sketch of the whole pipeline is given below.
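Putting the two stages together, the following outline illustrates the pipeline (decomposition, individual forecasting, and ensemble by addition); the helper make_lagged, the file name, and all hyperparameters are illustrative assumptions rather than the paper's exact code:

import numpy as np
import xgboost as xgb
from PyEMD import CEEMDAN  # assumed third-party package, as above

def make_lagged(series, lag=6):
    # Each row holds lag consecutive values; the target is the next value.
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

prices = np.loadtxt("wti_prices.txt")  # hypothetical input file
imfs = CEEMDAN(trials=500)(prices)     # stage 1: decomposition
components = np.vstack([imfs, prices - imfs.sum(axis=0)])  # IMFs plus residue

final_forecast = 0.0
for comp in components:                # stage 2: one XGBOOST model per component
    X, y = make_lagged(comp, lag=6)
    model = xgb.XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.05)
    model.fit(X[:-1], y[:-1])          # hold out the last sample
    final_forecast += model.predict(X[-1:])[0]  # stage 3: ensemble by addition

print("1-step-ahead forecast of the held-out point:", final_forecast)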

4. Experiments and Analysis

4.1. Data Description. To demonstrate the performance of the proposed CEEMDAN-XGBOOST, we use the crude oil prices of the West Texas Intermediate (WTI) as experimental data (the data can be downloaded from https://www.eia.gov/dnav/pet/hist/RWTCD.htm). We use the daily closing prices covering the period from January 2, 1986 to March 19, 2018, with 8123 observations in total, for the empirical studies. Among the observations, the first 6498, from January 2, 1986 to September 21, 2011, accounting for 80% of the total observations, are used as training samples, while the remaining 20% are for testing. The original crude oil prices are shown in Figure 3.

We perform multi-step-ahead forecasting in this paper. For a given time series $x_{t}$ ($t = 1, 2, \ldots, T$), the m-step-ahead forecast of $x_{t+m}$ can be formulated as

$$\hat{x}_{t+m} = f\left(x_{t-(l-1)}, x_{t-(l-2)}, \ldots, x_{t-1}, x_{t}\right) \tag{24}$$

where $\hat{x}_{t+m}$ is the m-step-ahead prediction at time t, f is the forecasting model, $x_{i}$ is the true value at time i, and l is the lag order.
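The following small sketch shows how (24) translates into a supervised learning dataset: each row collects l lagged values, and the target is the value m steps ahead (the function name and defaults are illustrative assumptions):

import numpy as np

def make_supervised(series, lag=6, horizon=1):
    # The row for time t holds x[t-(l-1)], ..., x[t]; the target is x[t+m], per (24).
    X, y = [], []
    for t in range(lag - 1, len(series) - horizon):
        X.append(series[t - lag + 1 : t + 1])
        y.append(series[t + horizon])
    return np.array(X), np.array(y)

# Example: 3-step-ahead targets with lag order 6.
X, y = make_supervised(np.arange(20, dtype=float), lag=6, horizon=3)
print(X.shape, y.shape)  # (12, 6) (12,)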


Figure 2: The flowchart of the proposed CEEMDAN-XGBOOST: the raw series x[n] is decomposed by CEEMDAN into IMF1, ..., IMFk and a residue R[n]; each component is forecast individually by XGBOOST; and the individual forecasts are summed as the final output.

Figure 3: The original crude oil prices of WTI (daily closing prices in USD/barrel, January 2, 1986 to March 19, 2018).

For SVR and FNN, we normalize each decomposed component before building the model to forecast the component individually. In detail, the normalization process can be defined as

$$x_{t}' = \frac{x_{t} - \mu}{\sigma} \tag{25}$$

where $x_{t}'$ is the normalized series of the crude oil price series, $x_{t}$ is the data before normalization, $\mu$ is the mean of $x_{t}$, and $\sigma$ is the standard deviation of $x_{t}$. Meanwhile, since normalization is not necessary for XGBOOST and ARIMA, for models with these two algorithms we build forecasting models from each of the decomposed components directly.


4.2. Evaluation Criteria. When we evaluate the accuracy of the models, we focus not only on the numerical accuracy but also on the accuracy of the forecasting direction. Therefore, we select the root-mean-square error (RMSE) and the mean absolute error (MAE) to evaluate the numerical accuracy of the models. Besides, we use the directional statistic (Dstat) as the criterion for evaluating the accuracy of the forecasting direction. The RMSE, MAE, and Dstat are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_{t} - \hat{y}_{t}\right)^{2}} \tag{26}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_{t} - \hat{y}_{t}\right| \tag{27}$$

$$\mathrm{Dstat} = \frac{\sum_{t=2}^{N} d_{t}}{N-1}, \quad d_{t} = \begin{cases} 1, & \left(y_{t} - y_{t-1}\right)\left(\hat{y}_{t} - \hat{y}_{t-1}\right) > 0 \\ 0, & \text{otherwise} \end{cases} \tag{28}$$

where $y_{t}$ is the actual crude oil price at time t, $\hat{y}_{t}$ is the prediction, and N is the size of the test set.
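These criteria translate directly into code; the sketch below is one straightforward NumPy reading of (26)-(28):

import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error, (26).
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Mean absolute error, (27).
    return np.mean(np.abs(y_true - y_pred))

def dstat(y_true, y_pred):
    # Directional statistic, (28): the share of steps on which the actual
    # and predicted price movements have the same sign.
    return np.mean(np.diff(y_true) * np.diff(y_pred) > 0)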

In addition, we apply the Wilcoxon signed rank test (WSRT) to verify the significance of the differences between the forecasts of the selected models [44]. The WSRT is a nonparametric statistical hypothesis test that can be used to evaluate whether the population mean ranks of two predictions from different models on the same sample differ. Meanwhile, it is a paired difference test which can be used as an alternative to the paired Student's t-test. The null hypothesis of the WSRT is that the median of the loss differential series $d(t) = g(e_{a}(t)) - g(e_{b}(t))$ is equal to zero, where $e_{a}(t)$ and $e_{b}(t)$ are the error series of model a and model b, respectively, and $g(\cdot)$ is a loss function. If the p value for a pair of models is below 0.05, the test rejects the null hypothesis (there is a significant difference between the forecasts of this pair of models) under the confidence level of 95%. In this way, we can verify whether there is a significant difference between the optimal model and the others.
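In Python, this test is available in SciPy; a minimal sketch, assuming two error series of equal length from models a and b and the absolute value as the loss function g:

import numpy as np
from scipy.stats import wilcoxon

rng = np.random.RandomState(0)
err_a = rng.normal(0.0, 1.0, 200)          # illustrative error series of model a
err_b = err_a + rng.normal(0.2, 0.5, 200)  # illustrative error series of model b

# Test whether the median of the loss differential g(e_a) - g(e_b) is zero.
stat, p_value = wilcoxon(np.abs(err_a), np.abs(err_b))
print("p-value:", p_value)  # reject the null hypothesis if p < 0.05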

However, the criteria defined above are global: if some singular points exist, the optimal model chosen by these criteria may not be the best one. Thus, we also apply the model confidence set (MCS) [31, 45] in order to choose the optimal model convincingly.

In order to calculate the p value of the statistics accurately, the MCS performs a bootstrap on the prediction series, which can soften the impact of singular points. For the j-th model, suppose that the size of a bootstrapped sample is T; then the t-th bootstrapped sample has the loss function defined as

$$L_{j,t} = \frac{1}{T}\sum_{t=h+1}^{h+T}\left|y_{t} - \hat{y}_{t}\right| \tag{29}$$

Suppose that a set $M^{0} = \{m_{i},\ i = 1, 2, 3, \ldots, n\}$ contains n models; for any two models j and k, the relative loss between these two models can be defined as

$$d_{jk,t} = L_{j,t} - L_{k,t} \tag{30}$$

According to the above definitions, the set of superior models can be defined as

$$M^{*} \equiv \left\{m_{j} \in M^{0} : E\left(d_{jk,t}\right) \le 0 \ \forall m_{k} \in M^{0}\right\} \tag{31}$$

where $E(\cdot)$ represents the average value. The MCS repeatedly performs a significance test on $M^{0}$. At each iteration, the worst prediction model in the set is eliminated. In the test, the hypothesis is the null hypothesis of equal predictive ability (EPA), defined as

$$H_{0}: E\left(d_{jk,t}\right) = 0 \ \forall m_{j}, m_{k} \in M \subset M^{0} \tag{32}$$

The MCS mainly depends on an equivalence test and an elimination criterion. The specific process is as follows.

Step 1. Assuming $M = M^{0}$, at the level of significance $\alpha$, use the equivalence test to test the null hypothesis $H_{0,M}$.

Step 2. If the null hypothesis is accepted, define $M^{*}_{1-\alpha} = M$; otherwise, eliminate from M the model that rejects the null hypothesis, according to the elimination criterion. The elimination does not stop until no model in the set M rejects the null hypothesis. In the end, the models in $M^{*}_{1-\alpha}$ are regarded as the surviving models.

Meanwhile, the MCS has two kinds of statistics, defined as

$$T_{R} = \max_{j,k \in M}\left|t_{jk}\right| \tag{33}$$

$$T_{SQ} = \max_{j,k \in M} t_{jk}^{2} \tag{34}$$

$$t_{jk} = \frac{\bar{d}_{jk}}{\sqrt{\mathrm{var}\left(\bar{d}_{jk}\right)}} \tag{35}$$

$$\bar{d}_{jk} = \frac{1}{T}\sum_{t=h+1}^{h+T} d_{jk,t} \tag{36}$$

where $T_{R}$ and $T_{SQ}$ stand for the range statistic and the semiquadratic statistic, respectively, and both statistics are based on the t-statistics shown in (35)-(36). These two statistics are used to remove any model whose p value is less than the significance level $\alpha$; when the p value is greater than $\alpha$, the model survives. The larger the p value, the more accurate the prediction of the model. When the p value is equal to 1, the model is the optimal forecasting model.
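A ready-made implementation of this procedure is provided, for instance, by the arch Python package; the sketch below is an assumption about a typical call, where losses holds one column of per-sample losses for each model and size is the significance level α:

import numpy as np
from arch.bootstrap import MCS  # pip install arch (assumed API)

rng = np.random.RandomState(0)
losses = np.abs(rng.normal(size=(500, 4)))  # illustrative losses of four models
losses[:, 0] *= 0.8                         # make model 0 slightly better

mcs = MCS(losses, size=0.2)  # size is the significance level alpha
mcs.compute()
print(mcs.pvalues)           # models with p-values above alpha survive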

4.3. Parameter Settings. To test the performance of XGBOOST and CEEMDAN-XGBOOST, we conduct two groups of experiments: single models that forecast crude oil prices with the original sequence, and ensemble models that forecast crude oil prices based on the "decomposition and ensemble" framework.

Table 1: The ranges of the parameters for XGBOOST by grid search.

Parameter        | Description                                                                | Range
Booster          | Booster to use                                                             | {'gblinear', 'gbtree'}
N estimators     | Number of boosted trees                                                    | {100, 200, 300, 400, 500}
Max depth        | Maximum tree depth for base learners                                       | {3, 4, 5, 6, 7, 8}
Min child weight | Maximum delta step we allow each tree's weight estimation to be            | {1, 2, 3, 4, 5, 6}
Gamma            | Minimum loss reduction required to make a further partition on a leaf node | {0.01, 0.05, 0.1, 0.2, 0.3}
Subsample        | Subsample ratio of the training instances                                  | {0.6, 0.7, 0.8, 0.9, 1}
Colsample        | Subsample ratio of columns when constructing each tree                     | {0.6, 0.7, 0.8, 0.9, 1}
Reg alpha        | L1 regularization term on weights                                          | {0.01, 0.05, 0.1}
Reg lambda       | L2 regularization term on weights                                          | {0.01, 0.05, 0.1}
Learning rate    | Boosting learning rate                                                     | {0.01, 0.05, 0.07, 0.1, 0.2}

For single models, we compare XGBOOST with one statistical model, ARIMA, and two widely used AI models, SVR and FNN. Since the existing research has shown that EEMD significantly outperforms EMD in forecasting crude oil prices [24, 31], in the experiments we only compare CEEMDAN with EEMD. Therefore, we compare the proposed CEEMDAN-XGBOOST with EEMD-SVR, EEMD-FNN, EEMD-XGBOOST, CEEMDAN-SVR, and CEEMDAN-FNN.

For ARIMA, we use the Akaike information criterion (AIC) [46] to select the parameters (p, d, q). For SVR, we use the RBF kernel and use grid search to optimize C and gamma in the ranges of $2^{\{0,1,2,3,4,5,6,7,8\}}$ and $2^{\{-9,-8,-7,-6,-5,-4,-3,-2,-1,0\}}$, respectively. We use one hidden layer with 20 nodes for FNN. We use a grid search to optimize the parameters for XGBOOST; the search ranges of the optimized parameters are shown in Table 1.
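Such a tuning loop can be wired up with scikit-learn's GridSearchCV over an XGBOOST regressor; the grid below reproduces only a slice of Table 1 and is an illustrative assumption about the setup:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.05, 0.07, 0.1, 0.2],
}

rng = np.random.RandomState(0)
X, y = rng.rand(200, 6), rng.rand(200)  # illustrative lagged features and targets

search = GridSearchCV(xgb.XGBRegressor(), param_grid,
                      scoring="neg_mean_squared_error", cv=3)
search.fit(X, y)
print(search.best_params_)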

We set 0.02 and 0.05 as the standard deviations of the added white noise, and set 250 and 500 as the numbers of realizations, for EEMD and CEEMDAN, respectively. The decomposition results of the original crude oil prices by EEMD and CEEMDAN are shown in Figures 4 and 5, respectively.

It can be seen from Figure 4 that, among the components decomposed by EEMD, the first six IMFs show characteristics of high frequency, while the remaining six components show characteristics of low frequency. However, regarding the components by CEEMDAN, the first seven show clear high frequency and the last four show low frequency, as shown in Figure 5.

The experiments were conducted with Python 2.7 and MATLAB 8.6 on 64-bit Windows 7 with a 3.4 GHz i7 CPU and 32 GB memory. Specifically, we ran FNN and the MCS with MATLAB, and for the remaining work we used Python. Regarding XGBOOST, we used a widely used Python package (https://xgboost.readthedocs.io/en/latest/python/) in the experiments.

Table 2: The RMSE, MAE, and Dstat by single models with horizon = 1, 3, and 6.

Horizon | Model   | RMSE   | MAE    | Dstat
1       | XGBOOST | 1.2640 | 0.9481 | 0.4827
1       | SVR     | 1.2899 | 0.9651 | 0.4826
1       | FNN     | 1.3439 | 0.9994 | 0.4837
1       | ARIMA   | 1.2692 | 0.9520 | 0.4883
3       | XGBOOST | 2.0963 | 1.6159 | 0.4839
3       | SVR     | 2.2444 | 1.7258 | 0.5080
3       | FNN     | 2.1503 | 1.6512 | 0.4837
3       | ARIMA   | 2.1056 | 1.6177 | 0.4901
6       | XGBOOST | 2.9269 | 2.2945 | 0.5158
6       | SVR     | 3.1048 | 2.4308 | 0.5183
6       | FNN     | 3.0803 | 2.4008 | 0.5028
6       | ARIMA   | 2.9320 | 2.2912 | 0.5151

4.4. Experimental Results. In this subsection, we use a fixed value 6 as the lag order, and we forecast crude oil prices with 1-step-ahead, 3-step-ahead, and 6-step-ahead forecasting; that is to say, the horizons for these three forecasting tasks are 1, 3, and 6, respectively.

4.4.1. Experimental Results of Single Models. For single models, we compare XGBOOST with the state-of-the-art SVR, FNN, and ARIMA; the results are shown in Table 2.

It can be seen from Table 2 that XGBOOST outperforms the other models in terms of RMSE and MAE with horizons 1 and 3. For horizon 6, XGBOOST achieves the best RMSE and the second best MAE, which is slightly worse than that of ARIMA. For horizon 1, FNN achieves the worst results among the four models; however, for horizons 3 and 6, SVR achieves the worst results. Regarding Dstat, no model always outperforms the others, and the best Dstat result is achieved by SVR with horizon 6. It can be found that the values of RMSE and MAE gradually increase with the horizon; the Dstat values, however, show no such regularity.


Figure 4: The IMFs and residue of WTI crude oil prices by EEMD.

All the values of Dstat are around 0.5, i.e., from 0.4826 to 0.5183, which is very close to random guessing, showing that it is very difficult to accurately forecast the direction of the movement of crude oil prices from the raw prices directly.

To further verify the advantages of XGBOOST over the other models, we report the results of the WSRT and the MCS, as shown in Tables 3 and 4, respectively. As for the WSRT, the p value between XGBOOST and each of the other models except ARIMA is below 0.05, which means that there is a significant difference among the forecasting results of XGBOOST, SVR, and FNN in population mean ranks.

Table 3: Wilcoxon signed rank test between XGBOOST, SVR, FNN, and ARIMA.

        | XGBOOST    | SVR        | FNN        | ARIMA
XGBOOST | 1          | 4.0378e-06 | 2.2539e-35 | 5.7146e-01
SVR     | 4.0378e-06 | 1          | 4.6786e-33 | 0.7006
FNN     | 2.2539e-35 | 4.6786e-33 | 1          | 6.9095e-02
ARIMA   | 5.7146e-01 | 0.7006     | 6.9095e-02 | 1

Besides, the results of the MCS show that the p values of $T_{R}$ and $T_{SQ}$ of XGBOOST are always equal to 1.000, which proves that XGBOOST is the optimal model among all the models in terms of global errors and most local errors of different samples obtained by the bootstrap method in the MCS.


Figure 5: The IMFs and residue of WTI crude oil prices by CEEMDAN.

Table 4: MCS between XGBOOST, SVR, FNN, and ARIMA.

        | Horizon = 1     | Horizon = 3     | Horizon = 6
        | T_R    | T_SQ   | T_R    | T_SQ   | T_R    | T_SQ
XGBOOST | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000
SVR     | 0.0004 | 0.0004 | 0.4132 | 0.4132 | 0.0200 | 0.0200
FNN     | 0.0002 | 0.0002 | 0.0248 | 0.0538 | 0.0016 | 0.0022
ARIMA   | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000

According to the MCS, the p values of $T_{R}$ and $T_{SQ}$ of SVR with horizon 3 are greater than 0.2, so SVR survives and is the second best model on this horizon. When it comes to ARIMA, it performs almost as well as XGBOOST in terms of the evaluation criteria of global errors, but it does not pass the MCS. This indicates that ARIMA does not perform better than the other models on most local errors of different samples.

4.4.2. Experimental Results of Ensemble Models. With EEMD or CEEMDAN, the results of forecasting crude oil prices by XGBOOST, SVR, and FNN with horizons 1, 3, and 6 are shown in Table 5.

It can be seen from Table 5 that the RMSE and MAE values of the CEEMDAN-XGBOOST are the lowest among those of all methods for each horizon.


Table 5: The RMSE, MAE, and Dstat by the models with EEMD or CEEMDAN.

Horizon | Model           | RMSE   | MAE    | Dstat
1       | CEEMDAN-XGBOOST | 0.4151 | 0.3023 | 0.8783
1       | EEMD-XGBOOST    | 0.9941 | 0.7685 | 0.7109
1       | CEEMDAN-SVR     | 0.8477 | 0.7594 | 0.9054
1       | EEMD-SVR        | 1.1796 | 0.9879 | 0.8727
1       | CEEMDAN-FNN     | 1.2574 | 1.0118 | 0.7597
1       | EEMD-FNN        | 2.6835 | 1.9932 | 0.7361
3       | CEEMDAN-XGBOOST | 0.8373 | 0.6187 | 0.6914
3       | EEMD-XGBOOST    | 1.4007 | 1.0876 | 0.6320
3       | CEEMDAN-SVR     | 1.2399 | 1.0156 | 0.7092
3       | EEMD-SVR        | 1.2366 | 1.0275 | 0.7092
3       | CEEMDAN-FNN     | 1.2520 | 0.9662 | 0.7061
3       | EEMD-FNN        | 1.2046 | 0.8637 | 0.6959
6       | CEEMDAN-XGBOOST | 1.2882 | 0.9831 | 0.6196
6       | EEMD-XGBOOST    | 1.7719 | 1.3765 | 0.6165
6       | CEEMDAN-SVR     | 1.3453 | 1.0296 | 0.6683
6       | EEMD-SVR        | 1.3730 | 1.1170 | 0.6485
6       | CEEMDAN-FNN     | 1.8024 | 1.3647 | 0.6422
6       | EEMD-FNN        | 2.7786 | 2.0495 | 0.6337

For example, with horizon 1, the values of RMSE and MAE of CEEMDAN-XGBOOST are 0.4151 and 0.3023, which are far less than the second best values of RMSE and MAE, i.e., 0.8477 and 0.7594, respectively. As the horizon increases, the corresponding values of each model increase in terms of RMSE and MAE; however, CEEMDAN-XGBOOST still achieves the lowest values of RMSE and MAE on each horizon. Regarding the values of Dstat, all of them are far greater than those of random guessing, showing that the "decomposition and ensemble" framework is effective for directional forecasting. Specifically, the values of Dstat are in the range between 0.6165 and 0.9054. The best Dstat values on all horizons are achieved by CEEMDAN-SVR or EEMD-SVR, showing that, among the forecasters, SVR is the best one for directional forecasting, although its corresponding values of RMSE and MAE are not the best. As for the decomposition methods, when the forecaster is fixed, CEEMDAN outperforms EEMD in 8, 8, and 8 out of 9 cases in terms of RMSE, MAE, and Dstat, respectively, showing the advantages of CEEMDAN over EEMD. Regarding the forecasters, when combined with CEEMDAN, XGBOOST is always superior to the other forecasters in terms of RMSE and MAE; when combined with EEMD, XGBOOST outperforms SVR and FNN with horizon 1, and FNN with horizon 6, in terms of RMSE and MAE. With horizons 1 and 6, FNN achieves the worst results of RMSE and MAE. The results also show that good values of RMSE are usually associated with good values of MAE; however, good values of RMSE or MAE do not always imply good Dstat directly.

For the ensemble models, we also performed a Wilcoxon signed rank test and an MCS test based on the errors of pairs of models. We set 0.2 as the level of significance in the MCS and 0.05 as the level of significance in the WSRT. The results are shown in Tables 6 and 7.

From these two tables, it can be seen that, regarding the results of the WSRT, the p value between CEEMDAN-XGBOOST and any other model except EEMD-FNN is below 0.05, demonstrating that there is a significant difference in population mean ranks between CEEMDAN-XGBOOST and any other model except EEMD-FNN. What is more, the MCS shows that the p values of $T_{R}$ and $T_{SQ}$ of CEEMDAN-XGBOOST are always equal to 1.000, demonstrating that CEEMDAN-XGBOOST is the optimal model among all models in terms of global errors and local errors. Meanwhile, the p values of $T_{R}$ and $T_{SQ}$ of EEMD-FNN are greater than those of the other models except CEEMDAN-XGBOOST, making it the second best model with horizons 3 and 6 in the MCS; with horizon 6, CEEMDAN-SVR is also a second best model. Besides, the p values of $T_{R}$ and $T_{SQ}$ of EEMD-SVR and CEEMDAN-SVR are above 0.2, so they become surviving models with horizon 6 in the MCS.

From the results of the single models and the ensemble models, we can draw the following conclusions: (1) single models usually cannot achieve satisfactory results, due to the nonlinearity and nonstationarity of raw crude oil prices, although, as a single forecaster, XGBOOST can achieve slightly better results than some state-of-the-art algorithms; (2) ensemble models following the "decomposition and ensemble" framework can significantly improve the forecasting accuracy in terms of several evaluation criteria; (3) as a decomposition method, CEEMDAN outperforms EEMD in most cases; (4) the extensive experiments demonstrate that the proposed CEEMDAN-XGBOOST is promising for forecasting crude oil prices.

4.5. Discussion. In this subsection, we study the impacts of several parameters related to the proposed CEEMDAN-XGBOOST.

4.5.1. The Impact of the Number of Realizations in CEEMDAN. In (2), it is shown that there are N realizations $x^{i}[t]$ in CEEMDAN. We explore how the number of realizations in CEEMDAN influences the results of forecasting crude oil prices by CEEMDAN-XGBOOST with horizon 1 and lag 6, setting the number of realizations in the range of {10, 25, 50, 75, 100, 250, 500, 750, 1000}. The results are shown in Figure 6.

It can be seen from Figure 6 that, for RMSE and MAE, the larger the number of realizations, the more accurate the results that the CEEMDAN-XGBOOST can achieve. When the number of realizations is less than or equal to 500, the values of both RMSE and MAE decrease with an increasing number of realizations; however, when the number is greater than 500, these two values increase slightly. Regarding Dstat, when the number increases from 10 to 25, the value of Dstat increases rapidly, and then it increases slowly as the number grows from 25 to 500; after that, Dstat decreases slightly. The value of Dstat reaches its peak at 500 realizations. Therefore, 500 is the best value for the number of realizations in terms of RMSE, MAE, and Dstat.


Table 6: Wilcoxon signed rank test between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

                | CEEMDAN-XGBOOST | EEMD-XGBOOST | CEEMDAN-SVR | EEMD-SVR    | CEEMDAN-FNN | EEMD-FNN
CEEMDAN-XGBOOST | 1               | 3.5544e-05   | 1.8847e-50  | 0.0028      | 1.6039e-187 | 0.0726
EEMD-XGBOOST    | 3.5544e-05      | 1            | 4.5857e-07  | 0.3604      | 8.2912e-82  | 0.0556
CEEMDAN-SVR     | 1.8847e-50      | 4.5857e-07   | 1           | 4.9296e-09  | 5.7753e-155 | 8.6135e-09
EEMD-SVR        | 0.0028          | 0.3604       | 4.9296e-09  | 1           | 2.5385e-129 | 0.0007
CEEMDAN-FNN     | 1.6039e-187     | 8.2912e-82   | 5.7753e-155 | 2.5385e-129 | 1           | 8.1427e-196
EEMD-FNN        | 0.0726          | 0.0556       | 8.6135e-09  | 0.0007      | 8.1427e-196 | 1

Table 7: MCS between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

                | Horizon = 1     | Horizon = 3     | Horizon = 6
                | T_R    | T_SQ   | T_R    | T_SQ   | T_R    | T_SQ
CEEMDAN-XGBOOST | 1      | 1      | 1      | 1      | 1      | 1
EEMD-XGBOOST    | 0      | 0      | 0      | 0      | 0      | 0.0030
CEEMDAN-SVR     | 0      | 0.0002 | 0.0124 | 0.0162 | 0.8268 | 0.8092
EEMD-SVR        | 0      | 0      | 0.0008 | 0.0040 | 0.7872 | 0.7926
CEEMDAN-FNN     | 0      | 0      | 0.0338 | 0.0532 | 0.2924 | 0.3866
EEMD-FNN        | 0      | 0.0002 | 0.4040 | 0.4040 | 0.8268 | 0.8092

4.5.2. The Impact of the Lag Orders. In this subsection, we explore how the number of lag orders impacts the prediction accuracy of CEEMDAN-XGBOOST on horizon 1. In this experiment, we set the number of lag orders from 1 to 10; the results are shown in Figure 7.

According to the empirical results shown in Figure 7, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply while that of Dstat increases drastically. After that, the values of RMSE and MAE remain almost unchanged (or increase very slightly) with the increase of the lag order. For Dstat, however, the value increases sharply from 1 to 2 and then decreases from 2 to 3; as the lag order increases from 3 to 5, Dstat stays almost stationary. Overall, when the lag order reaches 5, it achieves a good tradeoff among the values of RMSE, MAE, and Dstat.

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the white noise in CEEMDAN, also affects the performance of CEEMDAN-XGBOOST. Thus, we set the noise strength in the set of {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how the noise strength in CEEMDAN affects the prediction accuracy of CEEMDAN-XGBOOST on a fixed horizon 1 and a fixed lag 6.

As shown in Figure 8, when the noise strength in CEEMDAN is equal to 0.05, the values of RMSE, MAE, and Dstat achieve the best results simultaneously. When the noise strength is less than or equal to 0.05, except 0.03, the values of RMSE, MAE, and Dstat become better and better with the increase of the noise strength; however, when the strength is greater than 0.05, they become worse and worse. The figure indicates that the noise strength has a great impact on the forecasting results and that an ideal range for it is about 0.04-0.06.

5. Conclusions

In this paper, we propose a novel model, the so-called CEEMDAN-XGBOOST, to forecast crude oil prices. At first, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, CEEMDAN-XGBOOST computes the sum of the predictions of the IMFs and the residue as the final forecasting result. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although the performance of CEEMDAN-XGBOOST in forecasting the direction of crude oil prices is not the best, the MCS shows that CEEMDAN-XGBOOST is still the optimal model. Meanwhile, it is shown that the number of realizations, the lag order, and the noise strength of CEEMDAN are vital factors that have great impacts on the performance of CEEMDAN-XGBOOST.

In the future, we will study the performance of CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emissions prices.

Data Availability

The data used to support the findings of this study are included within the article.


Figure 6: The impact of the number of realizations in CEEMDAN on the RMSE, MAE, and Dstat of CEEMDAN-XGBOOST.

Figure 7: The impact of the number of lag orders with CEEMDAN-XGBOOST.

Figure 7 The impact of the number of lag orders with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

The white noise strength of CEEMDAN

088

086

084

082

080

078

076

074

10

08

09

06

07

04

05

03

001 002 003 004 005 006 007

RMSE

MA

E

Figure 8 The impact of the value of the white noise of CEEMDAN with CEEMDAN-XGBOOST


Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.

[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.

[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.

[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.

[5] H. Naser, "Estimating and forecasting the real prices of crude oil: A data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.

[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.

[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.

[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.

[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.

[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.

[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.

[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.

[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.

[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.

[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.

[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.

[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.

[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.

[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.

[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.

[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.

[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.

[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.

[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.

[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.

[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.

[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.

[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.

[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.

[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.

[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.

[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.

[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.

[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.

[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.

[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.

[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.

[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.

[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.

[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.

[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.

[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: A statistical view of boosting: Rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.

[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.

[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.

[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.

[32] M E TorresMA Colominas G Schlotthauer andP FlandrinldquoA complete ensemble empirical mode decomposition withadaptive noiserdquo in Proceedings of the 36th IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo11) pp 4144ndash4147 IEEE Prague Czech Republic May 2011

[33] M A Colominas G Schlotthauer andM E Torres ldquoImprovedcomplete ensemble EMD a suitable tool for biomedical signalprocessingrdquo Biomedical Signal Processing and Control vol 14no 1 pp 19ndash29 2014

[34] T Peng J Zhou C Zhang and Y Zheng ldquoMulti-step aheadwind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learn-ing machinerdquo Energy Conversion and Management vol 153 pp589ndash602 2017

[35] S Dai D Niu and Y Li ldquoDaily peak load forecasting basedon complete ensemble empirical mode decomposition withadaptive noise and support vector machine optimized bymodified grey Wolf optimization algorithmrdquo Energies vol 11no 1 2018

[36] Y Lv R Yuan TWangH Li andG Song ldquoHealthDegradationMonitoring and Early Fault Diagnosis of a Rolling BearingBased on CEEMDAN and Improved MMSErdquo Materials vol11 no 6 p 1009 2018

[37] R Abdelkader A Kaddour A Bendiabdellah and Z Der-ouiche ldquoRolling Bearing FaultDiagnosis Based on an ImprovedDenoising Method Using the Complete Ensemble EmpiricalMode Decomposition and the OptimizedThresholding Opera-tionrdquo IEEE Sensors Journal vol 18 no 17 pp 7166ndash7172 2018

[38] Y Lei Z Liu J Ouazri and J Lin ldquoA fault diagnosis methodof rolling element bearings based on CEEMDANrdquo Proceedingsof the Institution of Mechanical Engineers Part C Journal ofMechanical Engineering Science vol 231 no 10 pp 1804ndash18152017

[39] T Chen and C Guestrin ldquoXGBoost a scalable tree boostingsystemrdquo in Proceedings of the 22nd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining KDD2016 pp 785ndash794 ACM New York NY USA August 2016

[40] B Zhai and J Chen ldquoDevelopment of a stacked ensemblemodelfor forecasting and analyzing daily average PM25 concentra-tions in Beijing Chinardquo Science of the Total Environment vol635 pp 644ndash658 2018

[41] S P Chatzis V Siakoulis A Petropoulos E Stavroulakis andN Vlachogiannakis ldquoForecasting stock market crisis eventsusing deep and statistical machine learning techniquesrdquo ExpertSystems with Applications vol 112 pp 353ndash371 2018

[42] J Ke H Zheng H Yang and X Chen ldquoShort-term forecastingof passenger demand under on-demand ride services A spatio-temporal deep learning approachrdquoTransportation Research PartC Emerging Technologies vol 85 pp 591ndash608 2017

[43] J Friedman T Hastie and R Tibshirani ldquoSpecial Invited PaperAdditive Logistic Regression A Statistical View of BoostingRejoinderdquo The Annals of Statistics vol 28 no 2 pp 400ndash4072000

[44] J D Gibbons and S Chakraborti ldquoNonparametric statisticalinferencerdquo in International Encyclopedia of Statistical ScienceM Lovric Ed pp 977ndash979 Springer Berlin Germany 2011

[45] P RHansen A Lunde and JMNason ldquoThemodel confidencesetrdquo Econometrica vol 79 no 2 pp 453ndash497 2011

[46] H Liu H Q Tian and Y F Li ldquoComparison of two newARIMA-ANN and ARIMA-Kalman hybrid methods for windspeed predictionrdquo Applied Energy vol 98 no 1 pp 415ndash4242012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 3: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

Complexity 3

[Figure 1: An illustration of EMD. (a) The original sequence, its local mean, and the upper and lower envelopes; (b) the first IMF decomposed by EMD from the original sequence.]

The remainder of this paper is organized as follows. Section 2 describes CEEMDAN and XGBOOST. Section 3 formulates the proposed CEEMDAN-XGBOOST approach in detail. Experimental results are reported and analyzed in Section 4, where we also discuss the impacts of parameter settings. Finally, Section 5 concludes this paper.

2. Preliminaries

2.1. EMD, EEMD, and CEEMDAN. EMD was proposed by Huang et al. in 1998, and it has been developed and applied in many disciplines of science and engineering [26]. The key feature of EMD is to decompose a nonlinear, nonstationary sequence into intrinsic mode functions (IMFs) in the spirit of the Fourier series. In contrast to the Fourier series, the IMFs are not simply sine or cosine functions, but rather functions that represent the characteristics of the local oscillation frequency of the original data. These IMFs need to satisfy two conditions: (1) the number of local extrema and the number of zero crossings must be equal or differ at most by one, and (2) at any point, the "local mean", i.e., the mean of the upper and lower envelopes, is zero.

At first, EMD finds the upper and the lower envelopes, which are computed from the local extrema of the original sequence. The local maxima (minima) are linked by cubic splines to construct the upper (lower) envelopes, respectively. The mean of these envelopes is considered the "local mean". Meanwhile, the curve of this "local mean" is defined as the first residue, and the difference between the original sequence and the "local mean" is defined as the first IMF. An illustration of EMD is shown in Figure 1.

After the first IMF is extracted, there is still a residue (the local mean, i.e., the yellow dotted line in Figure 1(a)) between the IMF and the original data. Obviously, extrema and high-frequency oscillations also exist in the residue, so EMD decomposes the residue into another IMF and a new residue. If the variance of the new residue is not small enough to satisfy the Cauchy criterion, EMD repeats the process of decomposing the new residue into another IMF and a new residue. Finally, EMD decomposes the original sequence into several IMFs and one residue. The relation between successive residues and IMFs is defined as

$$ r_k[t] = r_{k-1}[t] - \mathrm{IMF}_k[t], \quad k = 2, \ldots, K \qquad (1) $$

where $r_k[t]$ is the k-th residue at time t, and K is the total number of IMFs and residues.
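To make the sifting procedure above concrete, the following is a minimal sketch of one sifting step in Python (the language used for the experiments in this paper). It assumes SciPy is available and glosses over endpoint handling and the Cauchy-criterion stopping rule, which production EMD implementations treat much more carefully.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One EMD sifting step: returns (candidate IMF, local mean).

    A simplified sketch: real implementations iterate this step until
    the candidate satisfies the two IMF conditions, and pad endpoints.
    """
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]   # indices of local maxima
    minima = argrelextrema(x, np.less)[0]      # indices of local minima
    upper = CubicSpline(maxima, x[maxima])(t)  # upper envelope
    lower = CubicSpline(minima, x[minima])(t)  # lower envelope
    local_mean = (upper + lower) / 2.0         # the "local mean" curve
    return x - local_mean, local_mean          # candidate IMF, residue part

# Toy usage: a two-tone signal
n = np.arange(200)
x = np.sin(2 * np.pi * 0.2 * n) + 0.5 * np.sin(2 * np.pi * 0.02 * n)
imf1, mean1 = sift_once(x)
```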

Subsequently, Huang et al. found that EMD could not completely extract the local features from the mixed features of the original sequence. One of the reasons for this is the frequent appearance of mode mixing. Mode mixing can be defined as the situation in which similar pieces of oscillations exist at the same corresponding positions in different IMFs, which causes a single IMF to lose its physical meaning. What is more, if one IMF has this problem, the following IMFs cannot avoid it either. To solve this problem, Wu and Huang extended EMD to a new version, namely EEMD, which adds white noise to the original time series and performs EMD many times [27]. Given a time series and corresponding noise, the new time series can be expressed as

$$ x_i[t] = x[t] + w_i[t] \qquad (2) $$

where $x[t]$ stands for the original data and $w_i[t]$ is the i-th white noise (i = 1, 2, ..., N, and N is the number of times EMD is performed).

Then EEMD decomposes every $x_i[t]$ into $\mathrm{IMF}_k^i[t]$. In order to get the real k-th IMF, $\mathrm{IMF}_k$, EEMD calculates the average of the $\mathrm{IMF}_k^i[t]$. In theory, because the mean of the white noise is zero, the effect of the white noise is eliminated by computing the average of the $\mathrm{IMF}_k^i[t]$, as shown in

$$ \mathrm{IMF}_k = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IMF}_k^i[t] \qquad (3) $$
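A compact sketch of this averaging scheme follows, assuming an `emd(x)` helper that returns a fixed number of IMFs as rows of an array; real EEMD implementations must also align differing IMF counts across realizations.

```python
import numpy as np

def eemd(x, emd, n_realizations=100, noise_strength=0.05, n_imfs=8):
    """EEMD sketch: average the IMFs of many noise-perturbed copies of x.

    `emd` is a placeholder for any EMD implementation; it is assumed to
    return an (n_imfs, len(x)) array.
    """
    rng = np.random.default_rng(0)
    sigma = noise_strength * np.std(x)
    acc = np.zeros((n_imfs, len(x)))
    for _ in range(n_realizations):
        noisy = x + rng.normal(0.0, sigma, size=len(x))  # Eq. (2)
        acc += emd(noisy)[:n_imfs]
    return acc / n_realizations                          # Eq. (3)
```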


However, Torres et al. found that, due to the limited number of realizations $x_i[t]$ in empirical research, EEMD cannot completely eliminate the influence of the white noise in the end. For this situation, Torres et al. put forward a new decomposition technique, CEEMDAN, on the basis of EEMD [32].

CEEMDAN decomposes the original sequence into the first IMF and residue in the same way as EMD. Then CEEMDAN gets the second IMF and residue, as shown in

$$ \mathrm{IMF}_2 = \frac{1}{N} \sum_{i=1}^{N} E_1\!\left( r_1[t] + \varepsilon_1 E_1(w_i[t]) \right) \qquad (4) $$

$$ r_2[t] = r_1[t] - \mathrm{IMF}_2 \qquad (5) $$

where $E_1(\cdot)$ stands for the first IMF decomposed from a sequence, and $\varepsilon_i$ is used to set the signal-to-noise ratio (SNR) at each stage.

In the same way, the k-th IMF and residue can be calculated as

$$ \mathrm{IMF}_k = \frac{1}{N} \sum_{i=1}^{N} E_1\!\left( r_{k-1}[t] + \varepsilon_{k-1} E_{k-1}(w_i[t]) \right) \qquad (6) $$

$$ r_k[t] = r_{k-1}[t] - \mathrm{IMF}_k \qquad (7) $$

Finally, CEEMDAN gets several IMFs and computes the residue as shown in

$$ R[t] = x[t] - \sum_{k=1}^{K} \mathrm{IMF}_k \qquad (8) $$

The sequences decomposed by EMD, EEMD, and CEEMDAN all satisfy (8). Although CEEMDAN solves the problems that EEMD leaves, it still has two limitations: (1) the residual noise that the modes contain and (2) the existence of spurious modes. Aiming at dealing with these issues, Torres et al. proposed a new algorithm to improve CEEMDAN [33].

Compared with the original CEEMDAN, the improved version obtains the residues by calculating the local means. For example, in order to get the first residue shown in (9), it computes the local means of the N realizations $x_i[t] = x[t] + \varepsilon_0 E_1(w_i[t])$ (i = 1, 2, ..., N):

$$ r_1[t] = \frac{1}{N} \sum_{i=1}^{N} M(x_i[t]) \qquad (9) $$

where $M(\cdot)$ is the local mean of the sequence. Then it can get the first IMF, shown in

$$ \mathrm{IMF}_1 = x[t] - r_1[t] \qquad (10) $$

The k-th residue and IMF can be computed as (11) and (12), respectively:

$$ r_k[t] = \frac{1}{N} \sum_{i=1}^{N} M\!\left( r_{k-1}[t] + \varepsilon_{k-1} E_k(w_i[t]) \right) \qquad (11) $$

$$ \mathrm{IMF}_k = r_{k-1}[t] - r_k[t] \qquad (12) $$

The authors have demonstrated that the improved CEEMDAN outperforms the original CEEMDAN in signal decomposition [33]. In what follows, we will refer to the improved version of CEEMDAN simply as CEEMDAN unless otherwise stated. With CEEMDAN, the original sequence can be decomposed into several IMFs and one residue; that is, the tough task of forecasting the raw time series can be divided into forecasting several simpler subtasks.
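In practice one rarely implements CEEMDAN from scratch. For instance, the open-source PyEMD package (the `EMD-signal` distribution on PyPI) ships a CEEMDAN class. Below is a minimal sketch of decomposing a price series with it; the parameter names follow our reading of its current API, and the input file name is hypothetical.

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

prices = np.loadtxt("wti_prices.txt")   # hypothetical file of daily closes

# trials ~ number of realizations N; epsilon ~ white noise strength
ceemdan = CEEMDAN(trials=500, epsilon=0.05)
imfs = ceemdan(prices)                  # shape: (n_components, len(prices))
residue = prices - imfs.sum(axis=0)     # what the IMFs leave behind, cf. Eq. (8)
print(f"{imfs.shape[0]} IMFs extracted")
```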

2.2. XGBOOST. Boosting is an ensemble method that combines several weak learners into a strong learner:

$$ \hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i) \qquad (13) $$

where $f_k(\cdot)$ is a weak learner and K is the number of weak learners.

When it comes to tree boosting, the learners are decision trees, which can be used for both regression and classification.

To a certain degree, XGBOOST can be considered tree boosting, and its core is Newton boosting instead of gradient boosting, which finds the optimal parameters by minimizing the loss function $L(\phi)$, as shown in

$$ L(\phi) = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k) \qquad (14) $$

$$ \Omega(f_k) = \gamma T + \frac{1}{2} \alpha \|\omega\|^2 \qquad (15) $$

where $\Omega(f_k)$ is the complexity of the k-th tree model, n is the sample size, T is the number of leaf nodes of the decision tree, $\omega$ is the vector of leaf-node weights, $\gamma$ controls the extent of the complexity penalty on T, and $\alpha$ controls the degree of the regularization of $f_k$.

Since it is difficult for the tree ensemble model to minimize the loss function in (14) and (15) with traditional methods in Euclidean space, the model is trained in an additive manner [43]. At each iteration it adds an $f_t$ that improves the model, forming the new loss function

$$ L^{(t)} = \sum_{i=1}^{n} l\!\left( y_i, \hat{y}_i^{(t-1)} + f_t(x_i) \right) + \Omega(f_t) \qquad (16) $$

where $\hat{y}_i^{(t)}$ is the prediction for the i-th instance at the t-th iteration and $f_t$ is the weak learner added at the t-th iteration.

Then, Newton boosting performs a second-order Taylor expansion of the loss function $l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i))$ to obtain $g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)$, because the second-order approximation helps to minimize the loss function conveniently and quickly [43]. The equations of $g_i$, $h_i$, and the new loss function are defined, respectively, as

$$ g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}} \qquad (17) $$

$$ h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial \big(\hat{y}_i^{(t-1)}\big)^2} \qquad (18) $$

$$ L^{(t)} \approx \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) \qquad (19) $$

Assume that the sample set $I_j$ of leaf node j is defined as $I_j = \{ i \mid q(x_i) = j \}$, where $q(x_i)$ represents the tree structure mapping an instance to its leaf node j. Then (19) can be transformed (dropping the constant terms) into

$$ \tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2} \alpha \sum_{j=1}^{T} \omega_j^2 = \sum_{j=1}^{T} \left[ \Big( \sum_{i \in I_j} g_i \Big) \omega_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \alpha \Big) \omega_j^2 \right] + \gamma T \qquad (20) $$

The optimal weight of each leaf in the decision tree is then

$$ \omega_j^{*} = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \alpha} \qquad (21) $$

According to (21), for a given tree structure q, the loss function can be written as

$$ \tilde{L}^{(t)}(q) = - \frac{1}{2} \sum_{j=1}^{T} \frac{\big( \sum_{i \in I_j} g_i \big)^2}{\sum_{i \in I_j} h_i + \alpha} + \gamma T \qquad (22) $$

Therefore, the information gain after branching can be defined as

$$ \mathrm{Gain} = \frac{1}{2} \left[ \frac{\big( \sum_{i \in I_{jL}} g_i \big)^2}{\sum_{i \in I_{jL}} h_i + \alpha} + \frac{\big( \sum_{i \in I_{jR}} g_i \big)^2}{\sum_{i \in I_{jR}} h_i + \alpha} - \frac{\big( \sum_{i \in I_j} g_i \big)^2}{\sum_{i \in I_j} h_i + \alpha} \right] - \gamma \qquad (23) $$

where $I_{jL}$ and $I_{jR}$ are the sample sets of the left and right leaf nodes, respectively, after splitting leaf node j.

XGBOOST branches each leaf node and constructs the base learners by the criterion of maximizing the information gain.
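As a sanity check on (21) and (23), the following sketch evaluates the leaf weight and split gain for squared-error loss, where $g_i = \hat{y}_i - y_i$ and $h_i = 1$; the toy numbers are arbitrary.

```python
import numpy as np

def leaf_weight(g, h, alpha):
    # Eq. (21): optimal leaf weight
    return -g.sum() / (h.sum() + alpha)

def split_gain(g, h, left_mask, alpha, gamma):
    # Eq. (23): gain of splitting one leaf into left/right children
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + alpha)
    gl, hl = g[left_mask], h[left_mask]
    gr, hr = g[~left_mask], h[~left_mask]
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(g, h)) - gamma

# Squared-error loss: g_i = prediction - target, h_i = 1
y = np.array([1.0, 1.2, 3.0, 3.1])
pred = np.zeros_like(y)                       # predictions before this round
g, h = pred - y, np.ones_like(y)
left = np.array([True, True, False, False])   # candidate split
print(leaf_weight(g, h, alpha=0.1))           # weight if no split
print(split_gain(g, h, left, alpha=0.1, gamma=0.0))  # positive => split helps
```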

With the help of Newton boosting, XGBOOST can deal with missing values by adaptively learning default split directions. To a certain extent, XGBOOST is based on the multiple additive regression tree (MART), but it can obtain a better tree structure by learning with Newton boosting. In addition, XGBOOST can also subsample among columns, which reduces the correlation between the weak learners [39].

3. The Proposed CEEMDAN-XGBOOST Approach

From the existing literature, we can see that CEEMDAN has advantages in time series decomposition, while XGBOOST does well in regression. Therefore, in this paper, we integrate these two methods and propose a novel approach, so-called CEEMDAN-XGBOOST, for forecasting crude oil prices. The proposed CEEMDAN-XGBOOST includes three stages: decomposition, individual forecasting, and ensemble. In the first stage, CEEMDAN is used to decompose the raw series of crude oil prices into k+1 components, including k IMFs and one residue. Among the components, some show high-frequency characteristics while the others show low-frequency characteristics of the raw series. In the second stage, a forecasting model is built for each component using XGBOOST, and the built model is applied to forecast that component, yielding an individual result. Finally, the results from all components are aggregated as the final result. Although many methods exist to aggregate the forecasting results from components, in the proposed approach we use the simplest one, i.e., addition, to summarize the results of all components. The flowchart of CEEMDAN-XGBOOST is shown in Figure 2.

From Figure 2, it can be seen that the proposed CEEMDAN-XGBOOST, based on the framework of "decomposition and ensemble", is also a typical strategy of "divide and conquer"; that is, the tough task of forecasting crude oil prices from the raw series is divided into several subtasks of forecasting from simpler components. Since the raw series is extremely nonlinear and nonstationary, while each decomposed component has a relatively simple form, CEEMDAN-XGBOOST has the ability to achieve higher accuracy in forecasting crude oil prices. In short, the advantages of the proposed CEEMDAN-XGBOOST are threefold: (1) the challenging task of forecasting crude oil prices is decomposed into several relatively simple subtasks; (2) for each component, XGBOOST can build a model with different parameters according to the characteristics of the component; and (3) a simple operation, addition, is used to aggregate the results of the subtasks into the final result.
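The three stages map directly onto a few lines of Python. Below is a minimal sketch of the whole pipeline, assuming the `xgboost` package and a `ceemdan_decompose(series)` helper (e.g., the PyEMD-based snippet in Section 2.1) that returns the IMFs plus the residue as rows of one array; per-component hyperparameter tuning is omitted here.

```python
import numpy as np
import xgboost as xgb

def make_lagged(component, lag):
    """Build (X, y) pairs from one component using `lag` past values."""
    X = np.array([component[i:i + lag] for i in range(len(component) - lag)])
    y = component[lag:]
    return X, y

def ceemdan_xgboost_forecast(components, lag=6):
    """Fit one XGBOOST model per component and sum the 1-step forecasts."""
    total = 0.0
    for comp in components:          # k IMFs + 1 residue
        X, y = make_lagged(comp, lag)
        model = xgb.XGBRegressor(n_estimators=300, max_depth=5)
        model.fit(X, y)
        # forecast the next value of this component from its last `lag` points
        total += model.predict(comp[-lag:].reshape(1, -1))[0]
    return total                     # ensemble by simple addition

# components = ceemdan_decompose(prices)   # rows: IMF1..IMFk, residue
# next_price = ceemdan_xgboost_forecast(components, lag=6)
```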

4. Experiments and Analysis

4.1. Data Description. To demonstrate the performance of the proposed CEEMDAN-XGBOOST, we use the crude oil prices from the West Texas Intermediate (WTI) as experimental data (the data can be downloaded from https://www.eia.gov/dnav/pet/hist/RWTCD.htm). We use the daily closing prices covering the period from January 2, 1986 to March 19, 2018, with 8123 observations in total, for the empirical studies. Among the observations, the first 6498, from January 2, 1986 to September 21, 2011, accounting for 80% of the total, are used as training samples, while the remaining 20% are for testing. The original crude oil prices are shown in Figure 3.

We perform multi-step-ahead forecasting in this paper. For a given time series $x_t$ (t = 1, 2, ..., T), the m-step-ahead forecast of $x_{t+m}$ can be formulated as

$$ \hat{x}_{t+m} = f\!\left( x_{t-(l-1)}, x_{t-(l-2)}, \ldots, x_{t-1}, x_t \right) \qquad (24) $$

where $\hat{x}_{t+m}$ is the m-step-ahead predicted result at time t, f is the forecasting model, $x_i$ is the true value at time i, and l is the lag order.
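For clarity, here is a small sketch of how (24) turns a univariate series into a supervised learning data set for a given horizon m and lag order l (names are illustrative):

```python
import numpy as np

def supervised_pairs(x, lag=6, horizon=1):
    """Turn series x into (X, y): each row of X holds `lag` consecutive
    values, and y is the value `horizon` steps after that window."""
    X, y = [], []
    for t in range(lag - 1, len(x) - horizon):
        X.append(x[t - lag + 1 : t + 1])   # x_{t-(l-1)}, ..., x_t
        y.append(x[t + horizon])           # x_{t+m}
    return np.array(X), np.array(y)

x = np.arange(10.0)
X, y = supervised_pairs(x, lag=3, horizon=2)
# X[0] = [0., 1., 2.], y[0] = 4.  (two steps after the window's end)
```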


[Figure 2: The flowchart of the proposed CEEMDAN-XGBOOST. The raw series x[n] is decomposed by CEEMDAN into IMF1, IMF2, ..., IMFk and a residue R[n] (decomposition); each component is forecast by its own XGBOOST model (individual forecasting); and the component forecasts are summed as the final result (ensemble).]

[Figure 3: The original crude oil prices of WTI (USD/barrel), January 2, 1986 to March 19, 2018.]

For SVR and FNN, we normalize each decomposed component before building the model to forecast the component individually. In detail, the normalization process can be defined as

$$ x'_t = \frac{x_t - \mu}{\sigma} \qquad (25) $$

where $x'_t$ is the normalized series of the crude oil price series, $x_t$ is the data before normalization, $\mu$ is the mean of $x_t$, and $\sigma$ is the standard deviation of $x_t$. Meanwhile, since normalization is not necessary for XGBOOST and ARIMA, for these two algorithms we build forecasting models from each of the decomposed components directly.


4.2. Evaluation Criteria. When we evaluate the accuracy of the models, we focus not only on the numerical accuracy but also on the accuracy of the forecast direction. Therefore, we select the root-mean-square error (RMSE) and the mean absolute error (MAE) to evaluate the numerical accuracy of the models. Besides, we use the directional statistic (Dstat) as the criterion for evaluating the accuracy of the forecast direction. The RMSE, MAE, and Dstat are defined as

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} \big( y_t - \hat{y}_t \big)^2 } \qquad (26) $$

$$ \mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| y_t - \hat{y}_t \right| \qquad (27) $$

$$ \mathrm{Dstat} = \frac{\sum_{t=2}^{N} d_t}{N-1}, \quad d_t = \begin{cases} 1, & (\hat{y}_t - y_{t-1})(y_t - y_{t-1}) > 0 \\ 0, & (\hat{y}_t - y_{t-1})(y_t - y_{t-1}) < 0 \end{cases} \qquad (28) $$

where $y_t$ is the actual crude oil price at time t, $\hat{y}_t$ is the prediction, and N is the size of the test set.
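These three criteria are straightforward to compute; the following is a small NumPy sketch following the definitions in (26)-(28), with toy inputs:

```python
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))   # Eq. (26)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))           # Eq. (27)

def dstat(y, y_hat):
    # Eq. (28): fraction of steps whose direction is predicted correctly
    actual_move = y[1:] - y[:-1]
    predicted_move = y_hat[1:] - y[:-1]
    return np.mean(actual_move * predicted_move > 0)

y = np.array([50.0, 51.0, 50.5, 52.0])
y_hat = np.array([50.2, 50.8, 50.9, 51.5])
print(rmse(y, y_hat), mae(y, y_hat), dstat(y, y_hat))
```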

In addition, we apply the Wilcoxon signed rank test (WSRT) to check for significant differences between the forecasts of the selected models [44]. The WSRT is a nonparametric statistical hypothesis test that can be used to evaluate whether the population mean ranks of two predictions from different models on the same sample differ. It is a paired difference test which can be used as an alternative to the paired Student's t-test. The null hypothesis of the WSRT is that the median of the loss differential series d(t) = g(e_a(t)) - g(e_b(t)) is equal to zero, where e_a(t) and e_b(t) are the error series of model a and model b, respectively, and g(·) is a loss function. If the p value for a pair of models is below 0.05, the test rejects the null hypothesis (there is a significant difference between the forecasts of this pair of models) at the 95% confidence level. In this way, we can check whether there is a significant difference between the optimal model and the others.
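SciPy provides this test directly; a sketch of comparing the absolute error series of two models (the error arrays here are placeholders):

```python
import numpy as np
from scipy.stats import wilcoxon

# err_a, err_b: absolute forecast errors of two models on the same test set
rng = np.random.default_rng(0)
err_a = np.abs(rng.normal(size=100))        # placeholder error series
err_b = np.abs(rng.normal(size=100)) * 1.2  # placeholder error series

stat, p_value = wilcoxon(err_a, err_b)      # paired, nonparametric
if p_value < 0.05:
    print("significant difference between the two models")
```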

However, the criteria defined above are global. If some singular points exist, the optimal model chosen by these criteria may not truly be the best one. Thus, we also use the model confidence set (MCS) [31, 45] in order to choose the optimal model convincingly.

In order to calculate the p values of the statistics accurately, the MCS performs a bootstrap on the prediction series, which can soften the impact of singular points. For the j-th model, suppose that the size of a bootstrapped sample is T; then the t-th bootstrapped sample has the loss function defined as

$$ L_j^t = \frac{1}{T} \sum_{t=h+1}^{h+T} \left| y_t - \hat{y}_t \right| \qquad (29) $$

Suppose that there is a set $M_0 = \{ m_i : i = 1, 2, \ldots, n \}$ that contains n models. For any two models j and k, the relative loss between them can be defined as

$$ d_{jk,t} = L_{j,t} - L_{k,t} \qquad (30) $$

According to the above definitions, the set of superior models can be defined as

$$ M^{*} \equiv \left\{ m_j \in M_0 : E(d_{jk,t}) \le 0 \; \forall m_k \in M_0 \right\} \qquad (31) $$

where $E(\cdot)$ represents the expected value.

The MCS repeatedly performs the significance test on $M_0$. Each time, the worst prediction model in the set is eliminated. In the test, the null hypothesis is that of equal predictive ability (EPA), defined as

$$ H_0 : E(d_{jk,t}) = 0 \quad \forall m_j, m_k \in M \subset M_0 \qquad (32) $$

The MCS mainly depends on an equivalence test and an elimination criterion. The specific process is as follows.

Step 1. Set $M = M_0$. At the significance level $\alpha$, use the equivalence test to test the null hypothesis $H_{0,M}$.

Step 2. If the null hypothesis is accepted, define $M^{*}_{1-\alpha} = M$; otherwise, eliminate the model that rejects the null hypothesis from M according to the elimination criterion. The elimination does not stop until no model in the set M rejects the null hypothesis. In the end, the models in $M^{*}_{1-\alpha}$ are regarded as the surviving models.

Meanwhile, the MCS has two kinds of statistics, defined as

$$ T_R = \max_{j,k \in M} \left| t_{jk} \right| \qquad (33) $$

$$ T_{SQ} = \max_{j,k \in M} t_{jk}^2 \qquad (34) $$

$$ t_{jk} = \frac{\bar{d}_{jk}}{\sqrt{\mathrm{var}(\bar{d}_{jk})}} \qquad (35) $$

$$ \bar{d}_{jk} = \frac{1}{T} \sum_{t=h+1}^{h+T} d_{jk,t} \qquad (36) $$

where $T_R$ and $T_{SQ}$ stand for the range statistic and the semiquadratic statistic, respectively, and both are based on the t-statistics shown in (35)-(36). These two statistics ($T_R$ and $T_{SQ}$) are mainly used to remove a model whose p value is less than the significance level $\alpha$. When the p value is greater than $\alpha$, the model survives. The larger the p value, the more accurate the prediction of the model; when the p value is equal to 1, it indicates that the model is the optimal forecasting model.
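Rather than implementing the bootstrap procedure by hand, one can reach for the `arch` package, which ships an MCS implementation. The sketch below shows how it might be driven, assuming its current API (per-observation losses in columns, one column per model; the loss series here are placeholders):

```python
import numpy as np
import pandas as pd
from arch.bootstrap import MCS  # pip install arch

# losses: one column of per-observation losses (e.g., squared errors) per model
rng = np.random.default_rng(0)
losses = pd.DataFrame({
    "XGBOOST": rng.random(500),        # placeholder loss series
    "SVR": rng.random(500) + 0.1,
    "FNN": rng.random(500) + 0.2,
})

mcs = MCS(losses, size=0.2)  # size: significance level alpha
mcs.compute()
print(mcs.pvalues)           # MCS p-values per model
print(mcs.included)          # the surviving set M*
```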

4.3. Parameter Settings. To test the performance of XGBOOST and CEEMDAN-XGBOOST, we conduct two groups of experiments: single models that forecast crude oil prices from the original sequence, and ensemble models that forecast crude oil prices based on the "decomposition and ensemble" framework.

Table 1: The ranges of the parameters for XGBOOST by grid search

Parameter          Description                                                  Range
Booster            Booster to use                                               {gblinear, gbtree}
N estimators       Number of boosted trees                                      {100, 200, 300, 400, 500}
Max depth          Maximum tree depth for base learners                         {3, 4, 5, 6, 7, 8}
Min child weight   Minimum sum of instance weight needed in a child             {1, 2, 3, 4, 5, 6}
Gamma              Minimum loss reduction required for a further partition      {0.01, 0.05, 0.1, 0.2, 0.3}
                   on a leaf node of the tree
Subsample          Subsample ratio of the training instances                    {0.6, 0.7, 0.8, 0.9, 1}
Colsample          Subsample ratio of columns when constructing each tree       {0.6, 0.7, 0.8, 0.9, 1}
Reg alpha          L1 regularization term on weights                            {0.01, 0.05, 0.1}
Reg lambda         L2 regularization term on weights                            {0.01, 0.05, 0.1}
Learning rate      Boosting learning rate                                       {0.01, 0.05, 0.07, 0.1, 0.2}

For single models, we compare XGBOOST with one statistical model, ARIMA, and two widely used AI models, SVR and FNN. Since the existing research has shown that EEMD significantly outperforms EMD in forecasting crude oil prices [24, 31], in the experiments we only compare CEEMDAN with EEMD. Therefore, we compare the proposed CEEMDAN-XGBOOST with EEMD-SVR, EEMD-FNN, EEMD-XGBOOST, CEEMDAN-SVR, and CEEMDAN-FNN.

For ARIMA, we use the Akaike information criterion (AIC) [46] to select the parameters (p, d, q). For SVR, we use the RBF kernel and use grid search to optimize C and gamma in the ranges of 2^{0,1,2,3,4,5,6,7,8} and 2^{-9,-8,-7,-6,-5,-4,-3,-2,-1,0}, respectively. We use one hidden layer with 20 nodes for FNN. We use a grid search to optimize the parameters for XGBOOST; the search ranges of the optimized parameters are shown in Table 1.
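A sketch of this tuning step with scikit-learn's GridSearchCV over an XGBRegressor, using a subset of the Table 1 ranges (the cross-validation settings are illustrative; time-series-aware splits are one natural choice):

```python
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.05, 0.07, 0.1, 0.2],
    "subsample": [0.6, 0.7, 0.8, 0.9, 1.0],
}

search = GridSearchCV(
    estimator=xgb.XGBRegressor(),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=5),  # ordered splits for time series
)
# search.fit(X_train, y_train)       # X_train/y_train from the lagged features
# best_model = search.best_estimator_
```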

We set 0.02 and 0.05 as the standard deviations of the added white noise, and 250 and 500 as the numbers of realizations, for EEMD and CEEMDAN, respectively. The decomposition results of the original crude oil prices by EEMD and CEEMDAN are shown in Figures 4 and 5, respectively.

It can be seen from Figure 4 that, among the components decomposed by EEMD, the first six IMFs show characteristics of high frequency, while the remaining six components show characteristics of low frequency. However, regarding the components by CEEMDAN, the first seven show clear high-frequency characteristics and the last four show low-frequency characteristics, as shown in Figure 5.

The experiments were conducted with Python 2.7 and MATLAB 8.6 on 64-bit Windows 7 with a 3.4 GHz i7 CPU and 32 GB of memory. Specifically, we ran FNN and the MCS with MATLAB, and for the remaining work we used Python. Regarding XGBOOST, we used a widely used Python package (https://xgboost.readthedocs.io/en/latest/python) in the experiments.

Table 2: The RMSE, MAE, and Dstat by single models with horizon = 1, 3, and 6

Horizon  Model    RMSE    MAE     Dstat
1        XGBOOST  1.2640  0.9481  0.4827
         SVR      1.2899  0.9651  0.4826
         FNN      1.3439  0.9994  0.4837
         ARIMA    1.2692  0.9520  0.4883
3        XGBOOST  2.0963  1.6159  0.4839
         SVR      2.2444  1.7258  0.5080
         FNN      2.1503  1.6512  0.4837
         ARIMA    2.1056  1.6177  0.4901
6        XGBOOST  2.9269  2.2945  0.5158
         SVR      3.1048  2.4308  0.5183
         FNN      3.0803  2.4008  0.5028
         ARIMA    2.9320  2.2912  0.5151

4.4. Experimental Results. In this subsection, we use a fixed value, 6, as the lag order, and we forecast crude oil prices with 1-step-ahead, 3-step-ahead, and 6-step-ahead forecasting; that is to say, the horizons for these three forecasting tasks are 1, 3, and 6, respectively.

4.4.1. Experimental Results of Single Models. For single models, we compare XGBOOST with the state-of-the-art SVR, FNN, and ARIMA; the results are shown in Table 2.

It can be seen from Table 2 that XGBOOST outperforms the other models in terms of RMSE and MAE with horizons 1 and 3. For horizon 6, XGBOOST achieves the best RMSE and the second best MAE, which is slightly worse than that of ARIMA. For horizon 1, FNN achieves the worst results among the four models; however, for horizons 3 and 6, SVR achieves the worst results. Regarding Dstat, none of the models can always outperform the others, and the best Dstat result is achieved by SVR with horizon 6. It can be found that the RMSE and MAE values gradually increase as the horizon increases; however, the Dstat values do not show such a pattern. All the values of Dstat are around 0.5, i.e., from 0.4826 to 0.5183, which are very similar to the results of random guesses, showing that it is very difficult to accurately forecast the direction of the movement of crude oil prices from the raw prices directly.

[Figure 4: The IMFs and residue of WTI crude oil prices by EEMD.]

To further verify the advantages of XGBOOST over the other models, we report the results of the WSRT and the MCS, as shown in Tables 3 and 4, respectively. As for the WSRT, the p value between XGBOOST and each of the other models except ARIMA is below 0.05, which means that there is a significant difference among the forecasting results of XGBOOST, SVR, and FNN in population mean ranks. Besides, the results of the MCS show that the p values of T_R and T_SQ of XGBOOST are always equal to 1.0000, proving that XGBOOST is the optimal model among all the models in terms of global errors and most local errors of the different samples obtained by the bootstrap in the MCS. According to the MCS, the p values of T_R and T_SQ of SVR on horizon 3 are greater than 0.2, so SVR survives and is the second best model on this horizon.

Table 3: Wilcoxon signed rank test between XGBOOST, SVR, FNN, and ARIMA

         XGBOOST     SVR         FNN         ARIMA
XGBOOST  1           4.0378e-06  2.2539e-35  5.7146e-01
SVR      4.0378e-06  1           4.6786e-33  0.7006
FNN      2.2539e-35  4.6786e-33  1           6.9095e-02
ARIMA    5.7146e-01  0.7006      6.9095e-02  1

[Figure 5: The IMFs and residue of WTI crude oil prices by CEEMDAN.]

Table 4: MCS between XGBOOST, SVR, FNN, and ARIMA

         HORIZON=1        HORIZON=3        HORIZON=6
         T_R     T_SQ     T_R     T_SQ     T_R     T_SQ
XGBOOST  1.0000  1.0000   1.0000  1.0000   1.0000  1.0000
SVR      0.0004  0.0004   0.4132  0.4132   0.0200  0.0200
FNN      0.0002  0.0002   0.0248  0.0538   0.0016  0.0022
ARIMA    0.0000  0.0000   0.0000  0.0000   0.0000  0.0000

When it comes to ARIMA, it performs almost as well as XGBOOST in terms of the global evaluation criteria, but it does not pass the MCS. This indicates that ARIMA does not perform better than the other models on most local errors of the different samples.

4.4.2. Experimental Results of Ensemble Models. With EEMD or CEEMDAN, the results of forecasting crude oil prices by XGBOOST, SVR, and FNN with horizons 1, 3, and 6 are shown in Table 5.

Table 5: The RMSE, MAE, and Dstat by the models with EEMD or CEEMDAN

Horizon  Model            RMSE    MAE     Dstat
1        CEEMDAN-XGBOOST  0.4151  0.3023  0.8783
         EEMD-XGBOOST     0.9941  0.7685  0.7109
         CEEMDAN-SVR      0.8477  0.7594  0.9054
         EEMD-SVR         1.1796  0.9879  0.8727
         CEEMDAN-FNN      1.2574  1.0118  0.7597
         EEMD-FNN         2.6835  1.9932  0.7361
3        CEEMDAN-XGBOOST  0.8373  0.6187  0.6914
         EEMD-XGBOOST     1.4007  1.0876  0.6320
         CEEMDAN-SVR      1.2399  1.0156  0.7092
         EEMD-SVR         1.2366  1.0275  0.7092
         CEEMDAN-FNN      1.2520  0.9662  0.7061
         EEMD-FNN         1.2046  0.8637  0.6959
6        CEEMDAN-XGBOOST  1.2882  0.9831  0.6196
         EEMD-XGBOOST     1.7719  1.3765  0.6165
         CEEMDAN-SVR      1.3453  1.0296  0.6683
         EEMD-SVR         1.3730  1.1170  0.6485
         CEEMDAN-FNN      1.8024  1.3647  0.6422
         EEMD-FNN         2.7786  2.0495  0.6337

It can be seen from Table 5 that the RMSE and MAE values of CEEMDAN-XGBOOST are the lowest among all methods for every horizon. For example, with horizon 1, its RMSE and MAE are 0.4151 and 0.3023, which are far less than the second best RMSE and MAE, i.e., 0.8477 and 0.7594, respectively. As the horizon increases, the corresponding RMSE and MAE values of each model increase; however, CEEMDAN-XGBOOST still achieves the lowest RMSE and MAE for each horizon. Regarding the values of Dstat, all of them are far greater than those of random guesses, showing that the "decomposition and ensemble" framework is effective for directional forecasting. Specifically, the values of Dstat are in the range between 0.6165 and 0.9054. The best Dstat values on all horizons are achieved by CEEMDAN-SVR or EEMD-SVR, showing that, among the forecasters, SVR is the best one for directional forecasting, although its corresponding RMSE and MAE values are not the best. As for the decomposition methods, when the forecaster is fixed, CEEMDAN outperforms EEMD in 8, 8, and 8 out of 9 cases in terms of RMSE, MAE, and Dstat, respectively, showing the advantages of CEEMDAN over EEMD. Regarding the forecasters, when combined with CEEMDAN, XGBOOST is always superior to the other forecasters in terms of RMSE and MAE; however, when combined with EEMD, XGBOOST outperforms SVR and FNN with horizon 1 and FNN with horizon 6 in terms of RMSE and MAE. With horizons 1 and 6, FNN achieves the worst RMSE and MAE results. The results also show that good values of RMSE are usually associated with good values of MAE; however, good values of RMSE or MAE do not always imply good values of Dstat.

For the ensemble models, we also performed the Wilcoxon signed rank test and the MCS test based on the errors of pairs of models. We set 0.2 as the significance level in the MCS and 0.05 as the significance level in the WSRT. The results are shown in Tables 6 and 7.

From these two tables, it can be seen that, regarding the results of the WSRT, the p value between CEEMDAN-XGBOOST and any other model except EEMD-FNN is below 0.05, demonstrating that there is a significant difference in the population mean ranks between CEEMDAN-XGBOOST and every other model except EEMD-FNN. What is more, the MCS shows that the p values of T_R and T_SQ of CEEMDAN-XGBOOST are always equal to 1.0000, demonstrating that CEEMDAN-XGBOOST is the optimal model among all models in terms of both global and local errors. Meanwhile, the p values of T_R and T_SQ of EEMD-FNN are greater than those of the other models except CEEMDAN-XGBOOST, making it the second best model with horizons 3 and 6 in the MCS; with horizon 6, CEEMDAN-SVR is also a second best model. Besides, the p values of T_R and T_SQ of EEMD-SVR and CEEMDAN-SVR are greater than 0.2, so they become surviving models with horizon 6 in the MCS.

From the results of the single models and the ensemble models, we can draw the following conclusions: (1) single models usually cannot achieve satisfactory results, due to the nonlinearity and nonstationarity of raw crude oil prices; as a single forecaster, XGBOOST achieves slightly better results than some state-of-the-art algorithms; (2) ensemble models following the "decomposition and ensemble" framework can significantly improve the forecasting accuracy in terms of several evaluation criteria; (3) as a decomposition method, CEEMDAN outperforms EEMD in most cases; and (4) the extensive experiments demonstrate that the proposed CEEMDAN-XGBOOST is promising for forecasting crude oil prices.

4.5. Discussion. In this subsection, we study the impacts of several parameters related to the proposed CEEMDAN-XGBOOST.

4.5.1. The Impact of the Number of Realizations in CEEMDAN. In (2), it is shown that there are N realizations $x_i[t]$ in CEEMDAN. We explore how the number of realizations in CEEMDAN influences the results of forecasting crude oil prices by CEEMDAN-XGBOOST with horizon 1 and lag 6, setting the number of realizations in the range of {10, 25, 50, 75, 100, 250, 500, 750, 1000}. The results are shown in Figure 6.

It can be seen from Figure 6 that, for RMSE and MAE, the larger the number of realizations, the more accurate the results CEEMDAN-XGBOOST achieves. When the number of realizations is less than or equal to 500, the values of both RMSE and MAE decrease as the number of realizations increases; however, when the number is greater than 500, these two values increase slightly. Regarding Dstat, when the number increases from 10 to 25, the value of Dstat increases rapidly, and then it increases slowly as the number increases from 25 to 500. After that, Dstat decreases slightly. The value of Dstat reaches its peak at 500 realizations. Therefore, 500 is the best value for the number of realizations in terms of RMSE, MAE, and Dstat.


Table 6: Wilcoxon signed rank test between XGBOOST, SVR, and FNN with EEMD or CEEMDAN

                 CEEMDAN-XGBOOST  EEMD-XGBOOST  CEEMDAN-SVR  EEMD-SVR     CEEMDAN-FNN  EEMD-FNN
CEEMDAN-XGBOOST  1                3.5544e-05    1.8847e-50   0.0028       1.6039e-187  0.0726
EEMD-XGBOOST     3.5544e-05       1             4.5857e-07   0.3604       8.2912e-82   0.0556
CEEMDAN-SVR      1.8847e-50       4.5857e-07    1            4.9296e-09   5.7753e-155  8.6135e-09
EEMD-SVR         0.0028           0.3604        4.9296e-09   1            2.5385e-129  0.0007
CEEMDAN-FNN      1.6039e-187      8.2912e-82    5.7753e-155  2.5385e-129  1            8.1427e-196
EEMD-FNN         0.0726           0.0556        8.6135e-09   0.0007       8.1427e-196  1

Table 7: MCS between XGBOOST, SVR, and FNN with EEMD or CEEMDAN

                 HORIZON=1       HORIZON=3       HORIZON=6
                 T_R     T_SQ    T_R     T_SQ    T_R     T_SQ
CEEMDAN-XGBOOST  1       1       1       1       1       1
EEMD-XGBOOST     0       0       0       0       0       0.0030
CEEMDAN-SVR      0       0.0002  0.0124  0.0162  0.8268  0.8092
EEMD-SVR         0       0       0.0008  0.0040  0.7872  0.7926
CEEMDAN-FNN      0       0       0.0338  0.0532  0.2924  0.3866
EEMD-FNN         0       0.0002  0.4040  0.4040  0.8268  0.8092

4.5.2. The Impact of the Lag Order. In this subsection, we explore how the lag order impacts the prediction accuracy of CEEMDAN-XGBOOST on the horizon of 1. In this experiment, we set the lag order from 1 to 10, and the results are shown in Figure 7.

According to the empirical results shown in Figure 7, it can be seen that, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply while that of Dstat increases drastically. After that, the values of RMSE and MAE remain almost unchanged (or increase very slightly) as the lag order grows. For Dstat, however, the value increases sharply from lag 1 to 2 and then decreases from 2 to 3; as the lag order increases from 3 to 5, Dstat stays almost stationary. Overall, when the lag order reaches 5, it achieves a good tradeoff among the values of RMSE, MAE, and Dstat.

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the white noise, also affects the performance of CEEMDAN-XGBOOST. Thus, we set the noise strength in the set {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how it affects the prediction accuracy of CEEMDAN-XGBOOST with a fixed horizon of 1 and a fixed lag of 6.

As shown in Figure 8, when the noise strength in CEEMDAN is equal to 0.05, the values of RMSE, MAE, and Dstat achieve the best results simultaneously. When the noise strength is less than or equal to 0.05, except for 0.03, the values of RMSE, MAE, and Dstat become better and better as the noise strength increases; however, when the strength is greater than 0.05, they become worse and worse. The figure indicates that the noise strength has a great impact on the forecasting results and that an ideal range for it is about 0.04-0.06.

5. Conclusions

In this paper, we propose a novel model, namely CEEMDAN-XGBOOST, to forecast crude oil prices. At first, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, CEEMDAN-XGBOOST computes the sum of the predictions of the IMFs and the residue as the final forecasting result. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although the performance of CEEMDAN-XGBOOST on forecasting the direction of crude oil prices is not the best, the MCS shows that CEEMDAN-XGBOOST is still the optimal model. Meanwhile, it is shown that the number of realizations, the lag order, and the noise strength of CEEMDAN are vital factors that have great impacts on the performance of CEEMDAN-XGBOOST.

In the future, we will study the performance of CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emission prices.

Data Availability

The data used to support the findings of this study are included within the article.

[Figure 6: The impact of the number of realizations in CEEMDAN on CEEMDAN-XGBOOST, in terms of RMSE, MAE, and Dstat.]

[Figure 7: The impact of the lag order on CEEMDAN-XGBOOST, in terms of RMSE, MAE, and Dstat.]

[Figure 8: The impact of the white noise strength of CEEMDAN on CEEMDAN-XGBOOST, in terms of RMSE, MAE, and Dstat.]


Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.

[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.

[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.

[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.

[5] H. Naser, "Estimating and forecasting the real prices of crude oil: A data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.

[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.

[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.

[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.

[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.

[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.

[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.

[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.

[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.

[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.

[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.

[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.

[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.

[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.

[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.

[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.

[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.

[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.

[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.

[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.

[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.

[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.

[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.

[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.

[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.

[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.

[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.

[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.

[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.

[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.

[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.

[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.

[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.

[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.

[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.

[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.

[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.

[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: A statistical view of boosting: Rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.

[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.

[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.

[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 4: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

4 Complexity

However, Torres et al. found that, due to the limited number of realizations $x_i[t]$ in empirical research, EEMD could not completely eliminate the influence of the added white noise in the end. To address this issue, Torres et al. put forward a new decomposition technique, CEEMDAN, on the basis of EEMD [32].

CEEMDAN decomposes the original sequence into the first IMF and residue in the same way as EMD. Then, CEEMDAN obtains the second IMF and residue as shown in (4) and (5):

$$IMF_2 = \frac{1}{N}\sum_{i=1}^{N} E_1\left(r_1[t] + \varepsilon_1 E_1\left(w_i[t]\right)\right), \quad (4)$$

$$r_2[t] = r_1[t] - IMF_2, \quad (5)$$

where $E_1(\cdot)$ stands for the first IMF decomposed from the sequence and $\varepsilon_i$ is used to set the signal-to-noise ratio (SNR) at each stage.

In the same way, the k-th IMF and residue can be calculated as

$$IMF_k = \frac{1}{N}\sum_{i=1}^{N} E_1\left(r_{k-1}[t] + \varepsilon_{k-1} E_{k-1}\left(w_i[t]\right)\right), \quad (6)$$

$$r_k[t] = r_{k-1}[t] - IMF_k. \quad (7)$$

Finally, CEEMDAN obtains several IMFs and computes the residue as shown in

$$R[t] = x[t] - \sum_{k=1}^{K} IMF_k. \quad (8)$$

The sequences decomposed by EMD, EEMD, and CEEMDAN all satisfy (8). Although CEEMDAN can solve the problems that EEMD leaves, it still has two limitations: (1) the residual noise that the modes contain and (2) the existence of spurious modes. To deal with these issues, Torres et al. proposed a new algorithm to improve CEEMDAN [33].

Compared with the original CEEMDAN, the improved version obtains the residues by calculating local means. For example, in order to get the first residue shown in (9), it computes the local means of N realizations $x_i[t] = x[t] + \varepsilon_0 E_1(w_i[t])$ (i = 1, 2, ..., N):

$$r_1[t] = \frac{1}{N}\sum_{i=1}^{N} M\left(x_i[t]\right), \quad (9)$$

where $M(\cdot)$ is the local mean of the sequence. Then it obtains the first IMF as shown in

$$IMF_1 = x[t] - r_1[t]. \quad (10)$$

For the k-th residue and IMF, they can be computed as (11) and (12), respectively:

$$r_k[t] = \frac{1}{N}\sum_{i=1}^{N} M\left(r_{k-1}[t] + \varepsilon_{k-1} E_k\left(w_i[t]\right)\right), \quad (11)$$

$$IMF_k = r_{k-1}[t] - r_k[t]. \quad (12)$$

The authors have demonstrated that the improved CEEMDAN outperforms the original CEEMDAN in signal decomposition [33]. In what follows, we will refer to the improved version of CEEMDAN simply as CEEMDAN unless otherwise stated. With CEEMDAN, the original sequence can be decomposed into several IMFs and one residue; that is, the tough task of forecasting the raw time series can be divided into several simpler forecasting subtasks.
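As a concrete illustration, the decomposition step can be reproduced in a few lines of Python. This is a minimal sketch assuming the PyEMD package (installed as EMD-signal), whose CEEMDAN class exposes the number of realizations as trials and the noise strength as epsilon; the input file name is hypothetical.

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

# Load a 1-D price series; "wti_prices.txt" is a hypothetical file name.
prices = np.loadtxt("wti_prices.txt")

# N = 500 realizations and noise strength 0.05 match the settings used later.
decomposer = CEEMDAN(trials=500, epsilon=0.05)
imfs = decomposer(prices)              # one row per IMF
residue = prices - imfs.sum(axis=0)    # R[t] = x[t] - sum_k IMF_k, cf. (8)
print(imfs.shape, residue.shape)
```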

2.2. XGBOOST. Boosting is an ensemble method that combines several weak learners into a strong learner:

$$\hat{y}_i = \phi\left(x_i\right) = \sum_{k=1}^{K} f_k\left(x_i\right), \quad (13)$$

where $f_k(\cdot)$ is a weak learner and K is the number of weak learners.

When it comes to tree boosting, its learners are decision trees, which can be used for both regression and classification.

To a certain degree, XGBOOST can be considered as tree boosting, and its core is Newton boosting instead of gradient boosting, which finds the optimal parameters by minimizing the loss function $L(\phi)$ as shown in (14) and (15):

$$L\left(\phi\right) = \sum_{i=1}^{n} l\left(\hat{y}_i, y_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \quad (14)$$

$$\Omega\left(f_k\right) = \gamma T + \frac{1}{2}\alpha \left\|\omega\right\|^2, \quad (15)$$

where $\Omega(f_k)$ is the complexity of the k-th tree model, n is the sample size, T is the number of leaf nodes of the decision tree, $\omega$ is the vector of leaf-node weights, $\gamma$ controls the extent of the complexity penalty on T, and $\alpha$ controls the degree of regularization of $f_k$.

Since it is difficult for the tree ensemble model to minimize the loss function in (14) and (15) with traditional methods in Euclidean space, the model is trained in an additive manner [43]. It adds the $f_t$ that most improves the model and forms the new loss function:

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) + \Omega\left(f_t\right), \quad (16)$$

where $\hat{y}_i^{(t)}$ is the prediction of the i-th instance at the t-th iteration and $f_t$ is the weak learner at the t-th iteration.

Then, Newton boosting performs a second-order Taylor expansion of the loss function $l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i))$ to obtain $g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)$, because the second-order approximation helps to minimize the loss function conveniently and quickly [43]. The equations of $g_i$, $h_i$, and the new loss function are defined, respectively, as

$$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right), \quad (17)$$

$$h_i = \partial_{\hat{y}_i^{(t-1)}}^2\, l\left(y_i, \hat{y}_i^{(t-1)}\right), \quad (18)$$

$$L^{(t)} \simeq \sum_{i=1}^{n} \left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^2\left(x_i\right)\right] + \Omega\left(f_t\right). \quad (19)$$

Assume that the sample set $I_j$ of leaf node j is defined as $I_j = \{i \mid q(x_i) = j\}$, where $q(x_i)$ maps a sample to its leaf node j in the decision tree. Then (19) can be transformed into

$$\tilde{L}^{(t)} = \sum_{i=1}^{n}\left[g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^2\left(x_i\right)\right] + \gamma T + \frac{1}{2}\alpha\sum_{j=1}^{T}\omega_j^2 = \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right)\omega_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i + \alpha\right)\omega_j^2\right] + \gamma T. \quad (20)$$

The optimal weight of each leaf in the decision tree is then

$$\omega_j^* = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \alpha}. \quad (21)$$

According to (21), for a given tree structure q, the loss function becomes

$$\tilde{L}^{(t)}\left(q\right) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i\in I_j} g_i\right)^2}{\sum_{i\in I_j} h_i + \alpha} + \gamma T. \quad (22)$$

Therefore, the information gain after branching can be defined as

$$Gain = \frac{1}{2}\left[\frac{\left(\sum_{i\in I_{jL}} g_i\right)^2}{\sum_{i\in I_{jL}} h_i + \alpha} + \frac{\left(\sum_{i\in I_{jR}} g_i\right)^2}{\sum_{i\in I_{jR}} h_i + \alpha} - \frac{\left(\sum_{i\in I_j} g_i\right)^2}{\sum_{i\in I_j} h_i + \alpha}\right] - \gamma, \quad (23)$$

where $I_{jL}$ and $I_{jR}$ are the sample sets of the left and right leaf nodes, respectively, after splitting leaf node j.

XGBOOST branches each leaf node and constructs base learners by the criterion of maximizing the information gain.

With the help of Newton boosting, XGBOOST can handle missing values by adaptive learning. To a certain extent, XGBOOST is based on the multiple additive regression tree (MART), but it can obtain a better tree structure by learning with Newton boosting. In addition, XGBOOST can also subsample among columns, which reduces the correlation of the weak learners [39].
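To make formulas (21) and (23) concrete, the following toy sketch evaluates the optimal leaf weight and the split gain for a handful of gradient statistics; the gradient values are made up for illustration, and the squared-error assumption ($h_i = 1$) is ours, not stated above.

```python
import numpy as np

# Toy gradient statistics; with squared-error loss, g_i = y_hat_i - y_i, h_i = 1.
g = np.array([0.3, -1.2, 0.8, -0.5, 0.1, 0.9])
h = np.ones_like(g)
alpha, gamma = 0.1, 0.0  # regularization constants from eq. (15)

def leaf_weight(g, h, alpha):
    # Optimal leaf weight, eq. (21).
    return -g.sum() / (h.sum() + alpha)

def split_gain(g, h, left, alpha, gamma):
    # Information gain of one candidate split, eq. (23).
    score = lambda gs, hs: gs.sum() ** 2 / (hs.sum() + alpha)
    right = np.setdiff1d(np.arange(len(g)), left)
    return 0.5 * (score(g[left], h[left]) + score(g[right], h[right])
                  - score(g, h)) - gamma

print(leaf_weight(g, h, alpha))                   # weight if the node stays a leaf
print(split_gain(g, h, [0, 2, 5], alpha, gamma))  # gain of splitting off samples 0, 2, 5
```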

3. The Proposed CEEMDAN-XGBOOST Approach

From the existing literature, we can see that CEEMDAN has advantages in time series decomposition, while XGBOOST does well in regression. Therefore, in this paper, we integrate these two methods and propose a novel approach, so-called CEEMDAN-XGBOOST, for forecasting crude oil prices. The proposed CEEMDAN-XGBOOST includes three stages: decomposition, individual forecasting, and ensemble. In the first stage, CEEMDAN is used to decompose the raw series of crude oil prices into k+1 components, including k IMFs and one residue. Among the components, some show high-frequency characteristics of the raw series, while the others show low-frequency ones. In the second stage, for each component, a forecasting model is built using XGBOOST, and then the built model is applied to forecast that component to obtain an individual result. Finally, the results from all the components are aggregated as the final result. Although there exist many methods to aggregate the forecasting results from components, the proposed approach uses the simplest one, i.e., addition, to summarize the results of all components. The flowchart of the CEEMDAN-XGBOOST is shown in Figure 2.

From Figure 2, it can be seen that the proposed CEEMDAN-XGBOOST, based on the framework of "decomposition and ensemble", is also a typical strategy of "divide and conquer"; that is, the tough task of forecasting crude oil prices from the raw series is divided into several subtasks of forecasting from simpler components. Since the raw series is extremely nonlinear and nonstationary while each decomposed component has a relatively simple form for forecasting, the CEEMDAN-XGBOOST has the ability to achieve higher accuracy of forecasting crude oil prices. In short, the advantages of the proposed CEEMDAN-XGBOOST are threefold: (1) the challenging task of forecasting crude oil prices is decomposed into several relatively simple subtasks; (2) for forecasting each component, XGBOOST can build models with different parameters according to the characteristics of the component; and (3) a simple operation, addition, is used to aggregate the results from the subtasks as the final result. A minimal sketch of this pipeline is given below.
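The sketch strings the three stages together, assuming the PyEMD and xgboost Python packages; the lag helper, the hyperparameter values, and the 80/20 split here are illustrative choices, not the exact configuration tuned in the experiments.

```python
import numpy as np
from PyEMD import CEEMDAN          # pip install EMD-signal
from xgboost import XGBRegressor   # pip install xgboost

def lag_features(series, lag=6, horizon=1):
    # Build (X, y) pairs: 'lag' past values -> value 'horizon' steps ahead.
    X = np.array([series[i:i + lag]
                  for i in range(len(series) - lag - horizon + 1)])
    y = series[lag + horizon - 1:]
    return X, y

def ceemdan_xgboost(prices, lag=6, horizon=1, train_ratio=0.8):
    # Stage 1: decompose the raw series into IMFs plus the residue.
    imfs = CEEMDAN(trials=500, epsilon=0.05)(prices)
    components = np.vstack([imfs, prices - imfs.sum(axis=0)])
    n = int(train_ratio * (len(prices) - lag - horizon + 1))
    total = 0.0
    for comp in components:
        # Stage 2: one XGBOOST model per component (illustrative parameters).
        X, y = lag_features(comp, lag, horizon)
        model = XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.1)
        model.fit(X[:n], y[:n])
        # Stage 3: ensemble by simple addition of the component forecasts.
        total = total + model.predict(X[n:])
    return total
```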

4. Experiments and Analysis

4.1. Data Description. To demonstrate the performance of the proposed CEEMDAN-XGBOOST, we use the crude oil prices from the West Texas Intermediate (WTI) as experimental data (the data can be downloaded from https://www.eia.gov/dnav/pet/hist/RWTCD.htm). We use the daily closing prices covering the period from January 2, 1986 to March 19, 2018, with 8123 observations in total, for empirical studies. Among the observations, the first 6498 ones, from January 2, 1986 to September 21, 2011, accounting for 80% of the total observations, are used as training samples, while the remaining 20% are for testing. The original crude oil prices are shown in Figure 3.

We perform multi-step-ahead forecasting in this paper. For a given time series $x_t$ (t = 1, 2, ..., T), the m-step-ahead forecasting for $x_{t+m}$ can be formulated as

$$\hat{x}_{t+m} = f\left(x_{t-(l-1)}, x_{t-(l-2)}, \ldots, x_{t-1}, x_t\right), \quad (24)$$

where $\hat{x}_{t+m}$ is the m-step-ahead predicted result at time t, f is the forecasting model, $x_i$ is the true value at time i, and l is the lag order.
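As a quick sanity check of the indexing in (24), the snippet below builds the lagged feature matrix for a toy series; the numbers are illustrative only.

```python
import numpy as np

x = np.arange(20.0)   # toy series standing in for the price sequence
l, m = 6, 3           # lag order l and horizon m

# Row i holds x[i..i+l-1]; its target is the value m steps after x[i+l-1].
X = np.array([x[i:i + l] for i in range(len(x) - l - m + 1)])
y = x[l + m - 1:]
assert np.allclose(X[0], x[0:6]) and y[0] == x[8]  # x[0..5] forecasts x[8]
```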

[Figure 2: The flowchart of the proposed CEEMDAN-XGBOOST: the raw series x[n] is decomposed by CEEMDAN into IMF1, ..., IMFk and a residue R[n] (decomposition); each component is forecast by its own XGBoost model (individual forecasting); the individual predictions are summed as the final output (ensemble).]

[Figure 3: The original crude oil prices of WTI, January 2, 1986 to March 19, 2018; y-axis: prices (USD/barrel), ranging from 0 to 150.]

For SVR and FNN, we normalize each decomposed component before building the model to forecast the component individually. In detail, the normalization process can be defined as

$$x_t' = \frac{x_t - \mu}{\sigma}, \quad (25)$$

where $x_t'$ is the normalized series of the crude oil price series, $x_t$ is the data before normalization, $\mu$ is the mean of $x_t$, and $\sigma$ is the standard deviation of $x_t$. Meanwhile, since normalization is not necessary for XGBOOST and ARIMA, for models with these two algorithms we build forecasting models from each of the decomposed components directly.
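A minimal sketch of the z-score transform in (25) applied to one component; the array here is synthetic and purely illustrative.

```python
import numpy as np

def zscore(x):
    # Eq. (25): subtract the mean and divide by the standard deviation.
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

component = np.sin(np.linspace(0, 10, 100)) + 5.0  # stand-in for one IMF
normed, mu, sigma = zscore(component)
# A prediction on the normalized scale maps back as: pred * sigma + mu.
```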


4.2. Evaluation Criteria. When we evaluate the accuracy of the models, we focus not only on the numerical accuracy but also on the accuracy of the forecasting direction. Therefore, we select the root-mean-square error (RMSE) and the mean absolute error (MAE) to evaluate the numerical accuracy of the models. Besides, we use the directional statistic (Dstat) as the criterion for evaluating the accuracy of the forecasting direction. The RMSE, MAE, and Dstat are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2}, \quad (26)$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_t - \hat{y}_t\right|, \quad (27)$$

$$\mathrm{Dstat} = \frac{\sum_{t=2}^{N} d_t}{N-1}, \qquad d_t = \begin{cases} 1, & \left(\hat{y}_t - y_{t-1}\right)\left(y_t - y_{t-1}\right) > 0, \\ 0, & \text{otherwise}, \end{cases} \quad (28)$$

where $y_t$ is the actual crude oil price at time t, $\hat{y}_t$ is the prediction, and N is the size of the test set.
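The three criteria are straightforward to implement; a minimal sketch follows (in dstat, each prediction and each actual value are compared against the previous actual value, as in (28)).

```python
import numpy as np

def rmse(y_true, y_pred):
    # Eq. (26): root-mean-square error.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Eq. (27): mean absolute error.
    return np.mean(np.abs(y_true - y_pred))

def dstat(y_true, y_pred):
    # Eq. (28): share of steps whose predicted direction matches the actual one.
    actual = y_true[1:] - y_true[:-1]
    predicted = y_pred[1:] - y_true[:-1]
    return np.mean(actual * predicted > 0)
```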

In addition, we use the Wilcoxon signed rank test (WSRT) to examine the significance of the differences between the forecasts of the selected models [44]. The WSRT is a nonparametric statistical hypothesis test that can be used to evaluate whether the population mean ranks of two predictions from different models on the same sample differ. Meanwhile, it is a paired difference test, which can be used as an alternative to the paired Student's t-test. The null hypothesis of the WSRT is that the median of the loss differential series $d(t) = g(e_a(t)) - g(e_b(t))$ is equal to zero, where $e_a(t)$ and $e_b(t)$ are the error series of model a and model b, respectively, and $g(\cdot)$ is a loss function. If the p value of a pair of models is below 0.05, the test rejects the null hypothesis (there is a significant difference between the forecasts of this pair of models) at the 95% confidence level. In this way, we can verify that there is a significant difference between the optimal model and the others.
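In Python, the test is available as scipy.stats.wilcoxon; a minimal sketch on two synthetic absolute-error series (the error values are made up for illustration):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
err_a = np.abs(rng.normal(0.4, 0.1, 200))  # |errors| of model a (synthetic)
err_b = np.abs(rng.normal(0.5, 0.1, 200))  # |errors| of model b (synthetic)

stat, p = wilcoxon(err_a, err_b)  # paired, nonparametric
print(p < 0.05)  # True -> reject H0: significant difference at the 95% level
```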

However, the criteria defined above are global. If some singular points exist, the optimal model chosen by these criteria may not be the best one. Thus, we employ the model confidence set (MCS) [31, 45] in order to choose the optimal model convincingly.

In order to calculate the p-values of the statistics accurately, the MCS performs bootstrap on the prediction series, which can soften the impact of singular points. For the j-th model, suppose that the size of a bootstrapped sample is T; then the t-th bootstrapped sample has the loss function defined as

$$L_{j,t} = \frac{1}{T}\sum_{t=h+1}^{h+T}\left|y_t - \hat{y}_t\right|. \quad (29)$$

Suppose that a set $M^0 = \{m_i, i = 1, 2, \ldots, n\}$ contains n models. For any two models j and k, the relative loss between these two models can be defined as

$$d_{jk,t} = L_{j,t} - L_{k,t}. \quad (30)$$

According to the above definitions, the set of superior models can be defined as

$$M^* \equiv \left\{m_j \in M^0 : E\left(d_{jk,t}\right) \le 0 \ \forall m_k \in M^0\right\}, \quad (31)$$

where $E(\cdot)$ represents the expected value.

The MCS repeatedly performs a significance test on $M^0$. Each time, the worst prediction model in the set is eliminated. In the test, the null hypothesis is that of equal predictive ability (EPA), defined as

$$H_0: E\left(d_{jk,t}\right) = 0 \ \forall m_j, m_k \in M \subset M^0. \quad (32)$$

The MCS mainly depends on an equivalence test and an elimination criterion. The specific process is as follows.

Step 1. Let $M = M^0$. At the level of significance $\alpha$, use the equivalence test to test the null hypothesis $H_{0,M}$.

Step 2. If the null hypothesis is accepted, define $M^*_{1-\alpha} = M$; otherwise, eliminate the model that rejects the null hypothesis from M according to the elimination criterion. The elimination does not stop until there are no models that reject the null hypothesis in the set M. In the end, the models in $M^*_{1-\alpha}$ are regarded as the surviving models.

Meanwhile, the MCS has two kinds of statistics, defined as

$$T_R = \max_{j,k\in M}\left|t_{jk}\right|, \quad (33)$$

$$T_{SQ} = \max_{j,k\in M} t_{jk}^2, \quad (34)$$

$$t_{jk} = \frac{\bar{d}_{jk}}{\sqrt{\mathrm{var}\left(\bar{d}_{jk}\right)}}, \quad (35)$$

$$\bar{d}_{jk} = \frac{1}{T}\sum_{t=h+1}^{h+T} d_{jk,t}, \quad (36)$$

where $T_R$ and $T_{SQ}$ stand for the range statistic and the semiquadratic statistic, respectively, and both statistics are based on the t-statistics in (35)-(36). These two statistics are used to remove any model whose p-value is less than the significance level $\alpha$; when the p-value is greater than $\alpha$, the model survives. The larger the p-value, the more accurate the prediction of the model. When the p-value is equal to 1, the model is the optimal forecasting model.
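A hedged sketch of running the MCS in Python, assuming the arch package's MCS class (losses holds one column of per-observation losses per model; the loss numbers are synthetic and only illustrative):

```python
import numpy as np
import pandas as pd
from arch.bootstrap import MCS  # pip install arch

rng = np.random.default_rng(1)
losses = pd.DataFrame({
    "CEEMDAN-XGBOOST": np.abs(rng.normal(0.30, 0.05, 500)),
    "EEMD-XGBOOST": np.abs(rng.normal(0.77, 0.05, 500)),
    "CEEMDAN-SVR": np.abs(rng.normal(0.76, 0.05, 500)),
})

mcs = MCS(losses, size=0.2)  # 0.2 = significance level used later in Section 4.4
mcs.compute()
print(mcs.pvalues)  # models with p-value above 0.2 survive
```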

4.3. Parameter Settings. To test the performance of XGBOOST and CEEMDAN-XGBOOST, we conduct two groups of experiments: single models that forecast crude oil prices with the original sequence, and ensemble models that forecast crude oil prices based on the "decomposition and ensemble" framework.

Table 1: The ranges of the parameters for XGBOOST by grid search.

Parameter        | Description                                                                 | Range
Booster          | Booster to use                                                              | {gblinear, gbtree}
N estimators     | Number of boosted trees                                                     | {100, 200, 300, 400, 500}
Max depth        | Maximum tree depth for base learners                                        | {3, 4, 5, 6, 7, 8}
Min child weight | Minimum sum of instance weight needed in a child                            | {1, 2, 3, 4, 5, 6}
Gamma            | Minimum loss reduction required to make a further partition on a leaf node  | {0.01, 0.05, 0.1, 0.2, 0.3}
Subsample        | Subsample ratio of the training instances                                   | {0.6, 0.7, 0.8, 0.9, 1}
Colsample        | Subsample ratio of columns when constructing each tree                      | {0.6, 0.7, 0.8, 0.9, 1}
Reg alpha        | L1 regularization term on weights                                           | {0, 0.01, 0.05, 0.1}
Reg lambda       | L2 regularization term on weights                                           | {0, 0.01, 0.05, 0.1}
Learning rate    | Boosting learning rate                                                      | {0.01, 0.05, 0.07, 0.1, 0.2}

For single models, we compare XGBOOST with one statistical model, ARIMA, and two widely used AI models, SVR and FNN. Since the existing research has shown that EEMD significantly outperforms EMD in forecasting crude oil prices [24, 31], in the experiments we only compare CEEMDAN with EEMD. Therefore, we compare the proposed CEEMDAN-XGBOOST with EEMD-SVR, EEMD-FNN, EEMD-XGBOOST, CEEMDAN-SVR, and CEEMDAN-FNN.

For ARIMA, we use the Akaike information criterion (AIC) [46] to select the parameters (p, d, q). For SVR, we use the RBF kernel and use grid search to optimize C and gamma in the ranges of $\{2^0, 2^1, \ldots, 2^8\}$ and $\{2^{-9}, 2^{-8}, \ldots, 2^0\}$, respectively. We use one hidden layer with 20 nodes for FNN. We use grid search to optimize the parameters for XGBOOST; the search ranges of the optimized parameters are shown in Table 1.
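A minimal sketch of such a grid search with scikit-learn, over a subset of the Table 1 ranges; the training arrays here are random placeholders for the lagged samples of one component.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 6))  # placeholder lagged features (lag = 6)
y_train = rng.normal(size=200)       # placeholder targets

param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.05, 0.07, 0.1, 0.2],
}
search = GridSearchCV(XGBRegressor(), param_grid,
                      scoring="neg_mean_absolute_error",
                      cv=TimeSeriesSplit(n_splits=5))
search.fit(X_train, y_train)
print(search.best_params_)
```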

We set 0.02 and 0.05 as the standard deviations of the added white noise and set 250 and 500 as the numbers of realizations of EEMD and CEEMDAN, respectively. The decomposition results of the original crude oil prices by EEMD and CEEMDAN are shown in Figures 4 and 5, respectively.

It can be seen from Figure 4 that, among the components decomposed by EEMD, the first six IMFs show characteristics of high frequency, while the remaining six components show characteristics of low frequency. However, regarding the components by CEEMDAN, the first seven show clear high-frequency characteristics and the last four show low-frequency ones, as shown in Figure 5.

The experiments were conducted with Python 2.7 and MATLAB 8.6 on a 64-bit Windows 7 machine with a 3.4 GHz Intel i7 CPU and 32 GB memory. Specifically, we ran FNN and the MCS with MATLAB, and for the remaining work we used Python. Regarding XGBoost, we used a widely used Python package (https://xgboost.readthedocs.io/en/latest/python) in the experiments.

Table 2: The RMSE, MAE, and Dstat by single models with horizon = 1, 3, and 6.

Horizon | Model   | RMSE   | MAE    | Dstat
1       | XGBOOST | 1.2640 | 0.9481 | 0.4827
1       | SVR     | 1.2899 | 0.9651 | 0.4826
1       | FNN     | 1.3439 | 0.9994 | 0.4837
1       | ARIMA   | 1.2692 | 0.9520 | 0.4883
3       | XGBOOST | 2.0963 | 1.6159 | 0.4839
3       | SVR     | 2.2444 | 1.7258 | 0.5080
3       | FNN     | 2.1503 | 1.6512 | 0.4837
3       | ARIMA   | 2.1056 | 1.6177 | 0.4901
6       | XGBOOST | 2.9269 | 2.2945 | 0.5158
6       | SVR     | 3.1048 | 2.4308 | 0.5183
6       | FNN     | 3.0803 | 2.4008 | 0.5028
6       | ARIMA   | 2.9320 | 2.2912 | 0.5151

4.4. Experimental Results. In this subsection, we use a fixed value of 6 as the lag order, and we perform 1-step-ahead, 3-step-ahead, and 6-step-ahead forecasting; that is to say, the horizons for these three forecasting tasks are 1, 3, and 6, respectively.

4.4.1. Experimental Results of Single Models. For single models, we compare XGBOOST with the state-of-the-art SVR, FNN, and ARIMA, and the results are shown in Table 2.

It can be seen from Table 2 that XGBOOST outperforms the other models in terms of RMSE and MAE with horizons 1 and 3. For horizon 6, XGBOOST achieves the best RMSE and the second best MAE, which is slightly worse than that of ARIMA. For horizon 1, FNN achieves the worst results among the four models; however, for horizons 3 and 6, SVR achieves the worst results. Regarding Dstat, none of the models can always outperform the others, and the best Dstat result is achieved by SVR with horizon 6. It can be found that the RMSE and MAE values gradually increase as the horizon increases; however, the Dstat values do not show such a pattern. All the values of Dstat are around 0.5, i.e., from 0.4826 to 0.5183, which are very similar to the results of random guesses, showing that it is very difficult to accurately forecast the direction of the movement of crude oil prices from the raw prices directly.

[Figure 4: The IMFs (IMF1-IMF11) and residue of WTI crude oil prices by EEMD.]


To further verify the advantages of XGBOOST over the other models, we report the results of the WSRT and MCS, as shown in Tables 3 and 4, respectively. As for the WSRT, the p-value between XGBOOST and each other model except ARIMA is below 0.05, which means that there is a significant difference among the forecasting results of XGBOOST, SVR, and FNN in population mean ranks. Besides, the results of the MCS show that the p-values of $T_R$ and $T_{SQ}$ of XGBOOST are always equal to 1.0000, proving that XGBOOST is the optimal model among

Table 3: Wilcoxon signed rank test between XGBOOST, SVR, FNN, and ARIMA.

        | XGBOOST    | SVR        | FNN        | ARIMA
XGBOOST | 1          | 4.0378e-06 | 2.2539e-35 | 5.7146e-01
SVR     | 4.0378e-06 | 1          | 4.6786e-33 | 0.7006
FNN     | 2.2539e-35 | 4.6786e-33 | 1          | 6.9095e-02
ARIMA   | 5.7146e-01 | 0.7006     | 6.9095e-02 | 1

all the models in terms of global errors and most local errors of different samples obtained by the bootstrap method in the MCS. According to the MCS, the p-values of $T_R$ and $T_{SQ}$ of SVR at horizon 3 are greater than 0.2, so SVR survives and becomes the second best model at this horizon.

[Figure 5: The IMFs (IMF1-IMF10) and residue of WTI crude oil prices by CEEMDAN.]

Table 4: MCS between XGBOOST, SVR, FNN, and ARIMA.

Model   | H=1 T_R | H=1 T_SQ | H=3 T_R | H=3 T_SQ | H=6 T_R | H=6 T_SQ
XGBOOST | 1.0000  | 1.0000   | 1.0000  | 1.0000   | 1.0000  | 1.0000
SVR     | 0.0004  | 0.0004   | 0.4132  | 0.4132   | 0.0200  | 0.0200
FNN     | 0.0002  | 0.0002   | 0.0248  | 0.0538   | 0.0016  | 0.0022
ARIMA   | 0.0000  | 0.0000   | 0.0000  | 0.0000   | 0.0000  | 0.0000

When it comes to ARIMA, it performs almost as well as XGBOOST in terms of the global-error criteria, but it does not pass the MCS, indicating that ARIMA does not perform better than the other models on most local errors of different samples.

4.4.2. Experimental Results of Ensemble Models. With EEMD or CEEMDAN, the results of forecasting the crude oil prices by XGBOOST, SVR, and FNN with horizons 1, 3, and 6 are shown in Table 5.

Table 5: The RMSE, MAE, and Dstat by the models with EEMD or CEEMDAN.

Horizon | Model           | RMSE   | MAE    | Dstat
1       | CEEMDAN-XGBOOST | 0.4151 | 0.3023 | 0.8783
1       | EEMD-XGBOOST    | 0.9941 | 0.7685 | 0.7109
1       | CEEMDAN-SVR     | 0.8477 | 0.7594 | 0.9054
1       | EEMD-SVR        | 1.1796 | 0.9879 | 0.8727
1       | CEEMDAN-FNN     | 1.2574 | 1.0118 | 0.7597
1       | EEMD-FNN        | 2.6835 | 1.9932 | 0.7361
3       | CEEMDAN-XGBOOST | 0.8373 | 0.6187 | 0.6914
3       | EEMD-XGBOOST    | 1.4007 | 1.0876 | 0.6320
3       | CEEMDAN-SVR     | 1.2399 | 1.0156 | 0.7092
3       | EEMD-SVR        | 1.2366 | 1.0275 | 0.7092
3       | CEEMDAN-FNN     | 1.2520 | 0.9662 | 0.7061
3       | EEMD-FNN        | 1.2046 | 0.8637 | 0.6959
6       | CEEMDAN-XGBOOST | 1.2882 | 0.9831 | 0.6196
6       | EEMD-XGBOOST    | 1.7719 | 1.3765 | 0.6165
6       | CEEMDAN-SVR     | 1.3453 | 1.0296 | 0.6683
6       | EEMD-SVR        | 1.3730 | 1.1170 | 0.6485
6       | CEEMDAN-FNN     | 1.8024 | 1.3647 | 0.6422
6       | EEMD-FNN        | 2.7786 | 2.0495 | 0.6337

It can be seen from Table 5 that the RMSE and MAE values of the CEEMDAN-XGBOOST are the lowest ones

among those by all methods at each horizon. For example, with horizon 1, the values of RMSE and MAE are 0.4151 and 0.3023, which are far less than the second best values of RMSE and MAE, i.e., 0.8477 and 0.7594, respectively. As the horizon increases, the corresponding values of each model increase in terms of RMSE and MAE. However, the CEEMDAN-XGBOOST still achieves the lowest values of RMSE and MAE at each horizon. Regarding the values of Dstat, all the values are far greater than those of random guesses, showing that the "decomposition and ensemble" framework is effective for directional forecasting. Specifically, the values of Dstat lie in the range between 0.6165 and 0.9054. The best Dstat values at all horizons are achieved by CEEMDAN-SVR or EEMD-SVR, showing that, among the forecasters, SVR is the best one for directional forecasting, although the corresponding values of RMSE and MAE are not the best. As for the decomposition methods, when the forecasters are fixed, CEEMDAN outperforms EEMD in 8, 8, and 8 out of 9 cases in terms of RMSE, MAE, and Dstat, respectively, showing the advantages of CEEMDAN over EEMD. Regarding the forecasters, when combined with CEEMDAN, XGBOOST is always superior to the other forecasters in terms of RMSE and MAE. However, when combined with EEMD, XGBOOST outperforms SVR and FNN with horizon 1, and FNN with horizon 6, in terms of RMSE and MAE. With horizons 1 and 6, FNN achieves the worst results of RMSE and MAE. The results also show that good values of RMSE are usually associated with good values of MAE; however, good values of RMSE or MAE do not always imply good Dstat directly.

For the ensemble models, we also performed a Wilcoxon signed rank test and an MCS test based on the errors of pairs of models. We set 0.2 as the level of significance in the MCS and 0.05 as the level of significance in the WSRT. The results are shown in Tables 6 and 7.

From these two tables, it can be seen that, regarding the results of the WSRT, the p-value between CEEMDAN-XGBOOST and any other model except EEMD-FNN is below 0.05, demonstrating that there is a significant difference in the population mean ranks between CEEMDAN-XGBOOST and any other model except EEMD-FNN. What is more, the MCS shows that the p-values of $T_R$ and $T_{SQ}$ of CEEMDAN-XGBOOST are always equal to 1.0000, demonstrating that CEEMDAN-XGBOOST is the optimal model among all models in terms of global errors and local errors. Meanwhile, the p-values of $T_R$ and $T_{SQ}$ of EEMD-FNN are greater than those of the other models except CEEMDAN-XGBOOST, making it the second best model with horizons 3 and 6 in the MCS; with horizon 6, the CEEMDAN-SVR is also a second best model. Besides, the p-values of $T_R$ and $T_{SQ}$ of EEMD-SVR and CEEMDAN-SVR are above 0.2, and they become surviving models with horizon 6 in the MCS.

From the results of the single models and ensemble models, we can draw the following conclusions: (1) single models usually cannot achieve satisfactory results, due to the nonlinearity and nonstationarity of raw crude oil prices, although, as a single forecaster, XGBOOST can achieve slightly better results than some state-of-the-art algorithms; (2) ensemble models can significantly improve the forecasting accuracy in terms of several evaluation criteria, following the "decomposition and ensemble" framework; (3) as a decomposition method, CEEMDAN outperforms EEMD in most cases; and (4) the extensive experiments demonstrate that the proposed CEEMDAN-XGBOOST is promising for forecasting crude oil prices.

4.5. Discussion. In this subsection, we study the impacts of several parameters related to the proposed CEEMDAN-XGBOOST.

4.5.1. The Impact of the Number of Realizations in CEEMDAN. In (2), it is shown that there are N realizations $x_i[t]$ in CEEMDAN. We explore how the number of realizations in CEEMDAN influences the results of forecasting crude oil prices by CEEMDAN-XGBOOST with horizon 1 and lag 6. We set the number of realizations in CEEMDAN in the range of {10, 25, 50, 75, 100, 250, 500, 750, 1000}. The results are shown in Figure 6.

It can be seen from Figure 6 that, for RMSE and MAE, the bigger the number of realizations, the more accurate the results the CEEMDAN-XGBOOST can achieve. When the number of realizations is less than or equal to 500, the values of both RMSE and MAE decrease as the number of realizations increases; however, when the number is greater than 500, these two values increase slightly. Regarding Dstat, when the number increases from 10 to 25, the value of Dstat increases rapidly, and then it increases slowly as the number increases from 25 to 500. After that, Dstat decreases slightly. The value of Dstat reaches its top value with 500 realizations. Therefore, 500 is the best value for the number of realizations in terms of RMSE, MAE, and Dstat.

Table 6: Wilcoxon signed rank test between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

                | CEEMDAN-XGBOOST | EEMD-XGBOOST | CEEMDAN-SVR | EEMD-SVR    | CEEMDAN-FNN | EEMD-FNN
CEEMDAN-XGBOOST | 1               | 3.5544e-05   | 1.8847e-50  | 0.0028      | 1.6039e-187 | 0.0726
EEMD-XGBOOST    | 3.5544e-05      | 1            | 4.5857e-07  | 0.3604      | 8.2912e-82  | 0.0556
CEEMDAN-SVR     | 1.8847e-50      | 4.5857e-07   | 1           | 4.9296e-09  | 5.7753e-155 | 8.6135e-09
EEMD-SVR        | 0.0028          | 0.3604       | 4.9296e-09  | 1           | 2.5385e-129 | 0.0007
CEEMDAN-FNN     | 1.6039e-187     | 8.2912e-82   | 5.7753e-155 | 2.5385e-129 | 1           | 8.1427e-196
EEMD-FNN        | 0.0726          | 0.0556       | 8.6135e-09  | 0.0007      | 8.1427e-196 | 1

Table 7: MCS between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

Model           | H=1 T_R | H=1 T_SQ | H=3 T_R | H=3 T_SQ | H=6 T_R | H=6 T_SQ
CEEMDAN-XGBOOST | 1       | 1        | 1       | 1        | 1       | 1
EEMD-XGBOOST    | 0       | 0        | 0       | 0        | 0       | 0.0030
CEEMDAN-SVR     | 0       | 0.0002   | 0.0124  | 0.0162   | 0.8268  | 0.8092
EEMD-SVR        | 0       | 0        | 0.0008  | 0.0040   | 0.7872  | 0.7926
CEEMDAN-FNN     | 0       | 0        | 0.0338  | 0.0532   | 0.2924  | 0.3866
EEMD-FNN        | 0       | 0.0002   | 0.4040  | 0.4040   | 0.8268  | 0.8092

4.5.2. The Impact of the Lag Orders. In this section, we explore how the number of lag orders impacts the prediction accuracy of CEEMDAN-XGBOOST with horizon 1. In this experiment, we set the number of lag orders from 1 to 10, and the results are shown in Figure 7.

According to the empirical results shown in Figure 7, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply while that of Dstat increases drastically. After that, the values of RMSE and MAE remain almost unchanged (or increase very slightly) as the lag order increases further. For Dstat, however, the value increases sharply from 1 to 2 and then decreases from 2 to 3; as the lag order increases from 3 to 5, Dstat stays almost stationary. Overall, a lag order of 5 reaches a good tradeoff among the values of RMSE, MAE, and Dstat.

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the white noise in CEEMDAN, also affects the performance of CEEMDAN-XGBOOST. Thus, we set the noise strength in the set of {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how the noise strength in CEEMDAN affects the prediction accuracy of CEEMDAN-XGBOOST with a fixed horizon 1 and a fixed lag 6.

As shown in Figure 8, when the noise strength in CEEMDAN is equal to 0.05, the values of RMSE, MAE, and Dstat achieve the best results simultaneously. When the noise strength is less than or equal to 0.05, except for 0.03, the values of RMSE, MAE, and Dstat improve with increasing noise strength; however, when the strength is greater than 0.05, the values of RMSE, MAE, and Dstat become worse and worse. The figure indicates that the noise strength has a great impact on the forecasting results and that an ideal range for it is about 0.04-0.06.

5. Conclusions

In this paper, we propose a novel model, namely, CEEMDAN-XGBOOST, to forecast crude oil prices. At first, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, CEEMDAN-XGBOOST computes the sum of the predictions of the IMFs and the residue as the final forecasting result. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although the performance of the CEEMDAN-XGBOOST on forecasting the direction of crude oil prices is not the best, the MCS shows that the CEEMDAN-XGBOOST is still the optimal model. Meanwhile, it is shown that the number of realizations, the lag order, and the noise strength of CEEMDAN are vital factors that have great impacts on the performance of the CEEMDAN-XGBOOST.

In the future, we will study the performance of the CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emission prices.

Data Availability

The data used to support the findings of this study are included within the article.

[Figure 6: The impact of the number of realizations in CEEMDAN on RMSE, MAE, and Dstat with CEEMDAN-XGBOOST; x-axis: the number of realizations in CEEMDAN (10 to 1000).]

[Figure 7: The impact of the number of lag orders on RMSE, MAE, and Dstat with CEEMDAN-XGBOOST; x-axis: lag order (1 to 10).]

[Figure 8: The impact of the white noise strength of CEEMDAN on RMSE, MAE, and Dstat with CEEMDAN-XGBOOST; x-axis: noise strength (0.01 to 0.07).]

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.
[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.
[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.
[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.
[5] H. Naser, "Estimating and forecasting the real prices of crude oil: A data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.
[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.
[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.
[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.
[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.
[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.
[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.
[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.
[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.
[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.
[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.
[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.
[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.
[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, article 14, 2017.
[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.
[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.
[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.
[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.
[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.
[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.
[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.
[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.
[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.
[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.
[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.
[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.
[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.
[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.
[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.
[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.
[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.
[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.
[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.
[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.
[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.
[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.
[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.
[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: A spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: A statistical view of boosting: Rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.
[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.
[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.
[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 5: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

Complexity 5

ℎ119894 = 1205972119897 (119910119894 119910119894(119905minus1))119910119894(119905minus1) (18)

119871 119905 = 119899sum119894=1

[119897 (119910119894 119910119894(119905minus1)) + 119892119894119891119905 (119909119894) + 12ℎ1198941198912119905 (119909119894)]+ Ω (119891119905)

(19)

Assume that the sample set 119868119895 in the leaf node j is definedas 119868119895=119894 | 119902(119909119894) = 119895 where q(119909119894) represents the tree structurefrom the root to the leaf node j in the decision tree (19) canbe transformed into the following formula as show in

(119905) = 119899sum119894=1

[119892119894119891119905 (119909119894) + 12ℎ1198941198912119905 (119909119894)] + 120574119879 + 12120572119879sum119895=1

1205962119895= 119879sum119895=1

[[(sum119894isin119868119895

119892119894)120596119895 + 12 (sum119894isin119868119895

ℎ119894 + 119886)1205962119895]] + 120574119879(20)

The formula for estimating the weight of each leaf in thedecision tree is formulated as

120596lowast119895 = minus sum119894isin119868119895 119892119894sum119894isin119868119895 ℎ119894 + 119886 (21)

According to (21) as for the tree structure q the lossfunction at the leaf node j can be changed as

(119905) (119902) = minus12119879sum119895=1

(sum119894isin119868119895 119892119894)2sum119894isin119868119895 ℎ119894 + 119886 + 120574119879 (22)

Therefore the equation of the information gain afterbranching can be defined as

119866119886119894119899= 12 [[[

(sum119894isin119868119895119871 119892119894)2sum119894isin119868119895119871 ℎ119894 + 119886 + (sum119894isin119868119895119877 119892119894)2sum119894isin119868119895119877 ℎ119894 + 119886 minus (sum119894isin119868119895 119892119894)2sum119894isin119868119895 ℎ119894 + 119886]]]minus 120574(23)

where 119868119895119871 and 119868119895119877 are the sample sets of the left and right leafnode respectively after splitting the leaf node j

XGBOOST branches each leaf node and constructs basiclearners by the criterion of maximizing the information gain

With the help of Newton boosting the XGBOOSTcan deal with missing values by adaptively learning To acertain extent XGBOOST is based on the multiple additiveregression tree (MART) but it can get better tree structure bylearning with Newton boosting In addition XGBOOST canalso subsample among columns which reduces the relevanceof each weak learner [39]

3 The Proposed CEEMDAN-XGBOOSTApproach

From the existing literature we can see that CEEMDAN hasadvantages in time series decomposition while XGBOOST

does well in regression Therefore in this paper we inte-grated these two methods and proposed a novel approachso-called CEEMDAN-XGBOOST for forecasting crude oilprices The proposed CEEMDAN-XGBOOST includes threestages decomposition individual forecasting and ensembleIn the first stage CEEMDAN is used to decompose the rawseries of crude oil prices into k+1 components includingk IMFs and one residue Among the components someshow high-frequency characteristics while the others showlow-frequency ones of the raw series In the second stagefor each component a forecasting model is built usingXGBOOST and then the built model is applied to forecasteach component and then get an individual result Finally allthe results from the components are aggregated as the finalresult Although there exist a lot of methods to aggregatethe forecasting results from components in the proposedapproach we use the simplest way ie addition to sum-marize the results of all components The flowchart of theCEEMDAN-XGBOOST is shown in Figure 2

From Figure 2 it can be seen that the proposedCEEMDAN-XGBOOST based on the framework of ldquodecom-position and ensemblerdquo is also a typical strategy of ldquodivideand conquerrdquo that is the tough task of forecasting crude oilprices from the raw series is divided into several subtasks offorecasting from simpler components Since the raw seriesis extremely nonlinear and nonstationary while each decom-posed component has a relatively simple form for forecastingthe CEEMDAN-XGBOOST has the ability to achieve higheraccuracy of forecasting crude oil prices In short the advan-tages of the proposed CEEMDAN-XGBOOST are threefold(1) the challenging task of forecasting crude oil prices isdecomposed into several relatively simple subtasks (2) forforecasting each component XGBOOST can build modelswith different parameters according to the characteristics ofthe component and (3) a simple operation addition is usedto aggregate the results from subtasks as the final result

4 Experiments and Analysis

41 Data Description To demonstrate the performance ofthe proposed CEEMDAN-XGBOOST we use the crudeoil prices from the West Texas Intermediate (WTI) asexperimental data (the data can be downloaded fromhttpswwweiagovdnavpethistRWTCDhtm) We usethe daily closing prices covering the period from January 21986 to March 19 2018 with 8123 observations in total forempirical studies Among the observations the first 6498ones from January 2 1986 to September 21 2011 accountingfor 80 of the total observations are used as trainingsamples while the remaining 20 ones are for testing Theoriginal crude oil prices are shown in Figure 3

We perform multi-step-ahead forecasting in this paperFor a given time series 119909119905 (119905 = 1 2 119879) the m-step-aheadforecasting for 119909119905+119898 can be formulated as119909119905+119898 = 119891 (119909119905minus(119897minus1) 119909119905minus(119897minus2) 119909119905minus1 119909119905) (24)where 119909119905+119898 is the m-step-ahead predicted result at time t f isthe forecasting model 119909119894 is the true value at time i and l isthe lag order

6 Complexity

X[n]

CEEMDAN

IMF1 IMF2 IMF3 IMFk R[n]

XGBoost XGBoost XGBoost XGBoost

Decomposition

Individual forecasting

Ensemble

XGBoost

x[n]

R[n]IMF1IMF2

IMF3IMFk

k

i=1IMFi +

R[n]sum

Figure 2 The flowchart of the proposed CEEMDAN-XGBOOST

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 2018

Date

0

10

20

30

40

50

60

70

80

90

100

110

120

130

140

150

Pric

es (U

SDb

arre

l)

WTI crude oil prices

Figure 3 The original crude oil prices of WTI

For SVR and FNN we normalize each decomposed com-ponent before building the model to forecast the componentindividually In detail the normalization process can bedefined as

1199091015840119905 = 119909119905 minus 120583120590 (25)

where 1199091015840119905 is the normalized series of crude oil prices series 119909119905is the data before normalization 120583 is the mean of 119909119905 and 120590 isthe standard deviation of 119909119905 Meanwhile since normalizationis not necessary for XGBOOST and ARIMA for models withthese two algorithms we build forecasting models from eachof the decomposed components directly

Complexity 7

42 Evaluation Criteria When we evaluate the accuracy ofthe models we focus on not only the numerical accuracybut also the accuracy of forecasting direction Therefore weselect the root-mean-square error (RMSE) and the meanabsolute error (MAE) to evaluate the numerical accuracy ofthe models Besides we use directional statistic (Dstat) as thecriterion for evaluating the accuracy of forecasting directionThe RMSE MAE and Dstat are defined as

RMSE = radic 1119873119873sum119894=1

(119910119905 minus 119910119905)2 (26)

MAE = 1119873 ( 119873sum119894=1

1003816100381610038161003816119910119905 minus 1199101199051003816100381610038161003816) (27)

Dstat = sum119873119894=2 119889119894119873 minus 1 119889119894 =

1 (119910119905 minus 119910119905minus1) (119910119905 minus 119910119905minus1) gt 00 (119910119905 minus 119910119905minus1) (119910119905 minus 119910119905minus1) lt 0

(28)

where 119910119905 is the actual crude oil prices at the time t119910119905 is theprediction and N is the size of the test set

In addition we take the Wilcoxon signed rank test(WSRT) for proving the significant differences between theforecasts of the selectedmodels [44]TheWSRT is a nonpara-metric statistical hypothesis test that can be used to evaluatewhether the population mean ranks of two predictions fromdifferent models on the same sample differ Meanwhile it isa paired difference test which can be used as an alternative tothe paired Student t-test The null hypothesis of the WSRTis whether the median of the loss differential series d(t) =g(119890119886(t)) minus g(119890119887(t)) is equal to zero or not where 119890119886(t) and 119890119887(t)are the error series of model a and model b respectively andg() is a loss function If the p value of pairs of models is below005 the test rejects the null hypothesis (there is a significantdifference between the forecasts of this pair of models) underthe confidence level of 95 In this way we can prove thatthere is a significant difference between the optimal modeland the others

However, the criteria defined above are global. If some singular points exist, the optimal model chosen by these criteria may not be the best one. Thus, we use the model confidence set (MCS) [31, 45] in order to choose the optimal model convincingly.

In order to calculate the p-value of the statistics accurately, the MCS performs bootstrap on the prediction series, which can soften the impact of the singular points. For the j-th model, suppose that the size of a bootstrapped sample is $T$; then the t-th bootstrapped sample has the loss function defined as

$$L_{j,t} = \frac{1}{T}\sum_{t=h+1}^{h+T}\left|y_t - \hat{y}_t\right|. \tag{29}$$

Suppose that there is a set $M^0 = \{m_i, i = 1, 2, 3, \ldots, n\}$ that contains $n$ models. For any two models $j$ and $k$, the relative values of the loss between these two models can be defined as

$$d_{jk,t} = L_{j,t} - L_{k,t}. \tag{30}$$

According to the above definitions, the set of superior models can be defined as

$$M^* \equiv \left\{m_j \in M^0 : E\left(d_{jk,t}\right) \le 0 \ \forall m_k \in M^0\right\}, \tag{31}$$

where $E(\cdot)$ represents the average value.

The MCS repeatedly performs the significance test on $M^0$. At each round, the worst prediction model in the set is eliminated. In the test, the hypothesis is the null hypothesis of equal predictive ability (EPA), defined as

$$H_0: E\left(d_{jk,t}\right) = 0 \ \forall m_j, m_k \in M \subset M^0. \tag{32}$$

The MCS mainly depends on the equivalence test and the elimination criteria. The specific process is as follows.

Step 1. Assuming $M = M^0$, at the level of significance $\alpha$, use the equivalence test to test the null hypothesis $H_{0,M}$.

Step 2. If the null hypothesis is accepted, define $M^*_{1-\alpha} = M$; otherwise, eliminate the model that rejects the null hypothesis from $M$ according to the elimination criteria. The elimination does not stop until there are no models that reject the null hypothesis in the set $M$. In the end, the models in $M^*_{1-\alpha}$ are regarded as the surviving models.

Meanwhile, the MCS has two kinds of statistics that can be defined as

$$T_R = \max_{j,k \in M}\left|t_{jk}\right|, \tag{33}$$

$$T_{SQ} = \max_{j,k \in M} t_{jk}^2, \tag{34}$$

$$t_{jk} = \frac{\bar{d}_{jk}}{\sqrt{\mathrm{var}\left(\bar{d}_{jk}\right)}}, \tag{35}$$

$$\bar{d}_{jk} = \frac{1}{T}\sum_{t=h+1}^{h+T} d_{jk,t}, \tag{36}$$

where $T_R$ and $T_{SQ}$ stand for the range statistic and the semiquadratic statistic, respectively, and both statistics are based on the t-statistics as shown in (35)-(36). These two statistics ($T_R$ and $T_{SQ}$) are mainly used to remove the models whose p-values are less than the significance level $\alpha$. When the p-value is greater than the significance level $\alpha$, the model can survive. The larger the p-value, the more accurate the prediction of the model. When the p-value is equal to 1, it indicates that the model is the optimal forecasting model.
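The statistics in (33)-(36) can be computed as below. This is a simplified sketch of ours: it uses the plain variance-of-the-mean estimate in place of the bootstrap variance estimate that the full MCS procedure relies on, and it omits the iterative elimination and bootstrap p-values:

```python
import numpy as np

def mcs_statistics(losses):
    """losses: (n_models, T) array of per-period losses.
    Returns the range statistic T_R and semiquadratic statistic T_SQ of
    Eq. (33)-(34), built from the t-statistics of Eq. (35)-(36)."""
    n, T = losses.shape
    t_stats = np.zeros((n, n))
    for j in range(n):
        for k in range(n):
            if j == k:
                continue
            d = losses[j] - losses[k]                # d_{jk,t}, Eq. (30)
            d_bar = d.mean()                         # Eq. (36)
            # simple variance-of-the-mean estimate instead of the
            # bootstrap estimate used by the full MCS procedure
            t_stats[j, k] = d_bar / np.sqrt(d.var(ddof=1) / T)
    T_R = np.abs(t_stats).max()                      # Eq. (33)
    T_SQ = (t_stats ** 2).max()                      # Eq. (34)
    return T_R, T_SQ
```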

4.3. Parameter Settings. To test the performance of XGBOOST and CEEMDAN-XGBOOST, we conduct two groups of experiments: single models that forecast crude oil prices with the original sequence, and ensemble models that forecast crude oil prices based on the "decomposition and ensemble" framework.

Table 1: The ranges of the parameters for XGBOOST by grid search.

Parameter        | Description                                                                 | Range
Booster          | Booster to use                                                              | 'gblinear', 'gbtree'
N estimators     | Number of boosted trees                                                     | 100, 200, 300, 400, 500
Max depth        | Maximum tree depth for base learners                                        | 3, 4, 5, 6, 7, 8
Min child weight | Maximum delta step we allow each tree's weight estimation to be             | 1, 2, 3, 4, 5, 6
Gamma            | Minimum loss reduction required to make a further partition on a leaf node  | 0.01, 0.05, 0.1, 0.2, 0.3
Subsample        | Subsample ratio of the training instances                                   | 0.6, 0.7, 0.8, 0.9, 1
Colsample        | Subsample ratio of columns when constructing each tree                      | 0.6, 0.7, 0.8, 0.9, 1
Reg alpha        | L1 regularization term on weights                                           | 0, 0.01, 0.05, 0.1
Reg lambda       | L2 regularization term on weights                                           | 0, 0.01, 0.05, 0.1
Learning rate    | Boosting learning rate                                                      | 0.01, 0.05, 0.07, 0.1, 0.2


For single models, we compare XGBOOST with one statistical model, ARIMA, and two widely used AI models, SVR and FNN. Since the existing research has shown that EEMD significantly outperforms EMD in forecasting crude oil prices [24, 31], in the experiments we only compare CEEMDAN with EEMD. Therefore, we compare the proposed CEEMDAN-XGBOOST with EEMD-SVR, EEMD-FNN, EEMD-XGBOOST, CEEMDAN-SVR, and CEEMDAN-FNN.

For ARIMA, we use the Akaike information criterion (AIC) [46] to select the parameters (p, d, q). For SVR, we use RBF as the kernel function and use grid search to optimize C and gamma in the ranges of $\{2^0, 2^1, \ldots, 2^8\}$ and $\{2^{-9}, 2^{-8}, \ldots, 2^0\}$, respectively. We use one hidden layer with 20 nodes for FNN. We use a grid search to optimize the parameters for XGBOOST; the search ranges of the optimized parameters are shown in Table 1.
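The paper does not state which grid search implementation was used; one possible realization with scikit-learn (a sketch over a subset of the Table 1 grid, with a time-series-aware split as an assumption of ours) is:

```python
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Illustrative subset of the Table 1 grid; the full grid covers all
# ten parameters listed there.
param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.05, 0.07, 0.1, 0.2],
}
search = GridSearchCV(
    estimator=xgb.XGBRegressor(objective="reg:squarederror"),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=5),  # keeps training folds before test folds
)
# search.fit(X_train, y_train)  # X_train, y_train from the lag construction
# best = search.best_params_
```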

We set 0.02 and 0.05 as the standard deviations of the added white noise and set 250 and 500 as the numbers of realizations of EEMD and CEEMDAN, respectively. The decomposition results of the original crude oil prices by EEMD and CEEMDAN are shown in Figures 4 and 5, respectively.

It can be seen from Figure 4 that, among the components decomposed by EEMD, the first six IMFs show characteristics of high frequency, while the remaining six components show characteristics of low frequency. However, regarding the components by CEEMDAN, the first seven show clear high frequency and the last four show low frequency, as shown in Figure 5.

The experiments were conducted with Python 2.7 and MATLAB 8.6 on a 64-bit Windows 7 machine with a 3.4 GHz Intel i7 CPU and 32 GB memory. Specifically, we ran FNN and MCS with MATLAB, and for the remaining work we used Python. Regarding XGBoost, we used a widely used Python package (https://xgboost.readthedocs.io/en/latest/python) in the experiments.

Table 2: The RMSE, MAE, and Dstat by single models with horizon = 1, 3, and 6.

Horizon | Model   | RMSE   | MAE    | Dstat
1       | XGBOOST | 1.2640 | 0.9481 | 0.4827
1       | SVR     | 1.2899 | 0.9651 | 0.4826
1       | FNN     | 1.3439 | 0.9994 | 0.4837
1       | ARIMA   | 1.2692 | 0.9520 | 0.4883
3       | XGBOOST | 2.0963 | 1.6159 | 0.4839
3       | SVR     | 2.2444 | 1.7258 | 0.5080
3       | FNN     | 2.1503 | 1.6512 | 0.4837
3       | ARIMA   | 2.1056 | 1.6177 | 0.4901
6       | XGBOOST | 2.9269 | 2.2945 | 0.5158
6       | SVR     | 3.1048 | 2.4308 | 0.5183
6       | FNN     | 3.0803 | 2.4008 | 0.5028
6       | ARIMA   | 2.9320 | 2.2912 | 0.5151

4.4. Experimental Results. In this subsection, we use a fixed value 6 as the lag order, and we forecast crude oil prices with 1-step-ahead, 3-step-ahead, and 6-step-ahead forecasting; that is to say, the horizons for these three forecasting tasks are 1, 3, and 6, respectively.

4.4.1. Experimental Results of Single Models. For single models, we compare XGBOOST with the state-of-the-art SVR, FNN, and ARIMA, and the results are shown in Table 2.

It can be seen from Table 2 that XGBOOST outperforms the other models in terms of RMSE and MAE with horizons 1 and 3. For horizon 6, XGBOOST achieves the best RMSE and the second best MAE, which is slightly worse than that by ARIMA. For horizon 1, FNN achieves the worst results among the four models; however, for horizons 3 and 6, SVR achieves the worst results. Regarding Dstat, none of the models can always outperform the others, and the best result of Dstat is achieved by SVR with horizon 6. It can be found that the RMSE and MAE values gradually increase with the increase of the horizon. However, the Dstat values do not show such regularity. All the values of Dstat are around 0.5, i.e., from 0.4826 to 0.5183, which are very similar to the results of random guesses, showing that it is very difficult to accurately forecast the direction of the movement of crude oil prices with raw crude oil prices directly.

[Figure 4: The IMFs (IMF1–IMF11) and residue of WTI crude oil prices by EEMD, plotted from Jan 02, 1986 to Mar 19, 2018.]


To further verify the advantages of XGBOOST over the other models, we report the results by WSRT and MCS, as shown in Tables 3 and 4, respectively. As for WSRT, the p-value between XGBOOST and each of the other models except ARIMA is below 0.05, which means that there is a significant difference among the forecasting results of XGBOOST, SVR, and FNN in population mean ranks. Besides, the results of MCS show that the p-values of $T_R$ and $T_{SQ}$ of XGBOOST are always equal to 1.000, which proves that XGBOOST is the optimal model among all the models in terms of global errors and most local errors of different samples obtained by bootstrap methods in MCS.

Table 3: Wilcoxon signed rank test between XGBOOST, SVR, FNN, and ARIMA.

        | XGBOOST    | SVR        | FNN        | ARIMA
XGBOOST | 1          | 4.0378e-06 | 2.2539e-35 | 5.7146e-01
SVR     | 4.0378e-06 | 1          | 4.6786e-33 | 0.7006
FNN     | 2.2539e-35 | 4.6786e-33 | 1          | 6.9095e-02
ARIMA   | 5.7146e-01 | 0.7006     | 6.9095e-02 | 1

[Figure 5: The IMFs (IMF1–IMF10) and residue of WTI crude oil prices by CEEMDAN, plotted from Jan 02, 1986 to Mar 19, 2018.]

Table 4: MCS between XGBOOST, SVR, FNN, and ARIMA.

Model   | T_R (h=1) | T_SQ (h=1) | T_R (h=3) | T_SQ (h=3) | T_R (h=6) | T_SQ (h=6)
XGBOOST | 1.0000    | 1.0000     | 1.0000    | 1.0000     | 1.0000    | 1.0000
SVR     | 0.0004    | 0.0004     | 0.4132    | 0.4132     | 0.0200    | 0.0200
FNN     | 0.0002    | 0.0002     | 0.0248    | 0.0538     | 0.0016    | 0.0022
ARIMA   | 0.0000    | 0.0000     | 0.0000    | 0.0000     | 0.0000    | 0.0000

According to MCS, the p-values of $T_R$ and $T_{SQ}$ of SVR on the horizon of 3 are greater than 0.2, so SVR becomes a surviving model and the second best model on this horizon. When it comes to ARIMA, it almost performs as well as XGBOOST in terms of the evaluation criteria of global errors, but it does not pass the MCS. This indicates that ARIMA does not perform better than the other models in most local errors of different samples.

4.4.2. Experimental Results of Ensemble Models. With EEMD or CEEMDAN, the results of forecasting the crude oil prices by XGBOOST, SVR, and FNN with horizons 1, 3, and 6 are shown in Table 5.

Table 5: The RMSE, MAE, and Dstat by the models with EEMD or CEEMDAN.

Horizon | Model            | RMSE   | MAE    | Dstat
1       | CEEMDAN-XGBOOST  | 0.4151 | 0.3023 | 0.8783
1       | EEMD-XGBOOST     | 0.9941 | 0.7685 | 0.7109
1       | CEEMDAN-SVR      | 0.8477 | 0.7594 | 0.9054
1       | EEMD-SVR         | 1.1796 | 0.9879 | 0.8727
1       | CEEMDAN-FNN      | 1.2574 | 1.0118 | 0.7597
1       | EEMD-FNN         | 2.6835 | 1.9932 | 0.7361
3       | CEEMDAN-XGBOOST  | 0.8373 | 0.6187 | 0.6914
3       | EEMD-XGBOOST     | 1.4007 | 1.0876 | 0.6320
3       | CEEMDAN-SVR      | 1.2399 | 1.0156 | 0.7092
3       | EEMD-SVR         | 1.2366 | 1.0275 | 0.7092
3       | CEEMDAN-FNN      | 1.2520 | 0.9662 | 0.7061
3       | EEMD-FNN         | 1.2046 | 0.8637 | 0.6959
6       | CEEMDAN-XGBOOST  | 1.2882 | 0.9831 | 0.6196
6       | EEMD-XGBOOST     | 1.7719 | 1.3765 | 0.6165
6       | CEEMDAN-SVR      | 1.3453 | 1.0296 | 0.6683
6       | EEMD-SVR         | 1.3730 | 1.1170 | 0.6485
6       | CEEMDAN-FNN      | 1.8024 | 1.3647 | 0.6422
6       | EEMD-FNN         | 2.7786 | 2.0495 | 0.6337

It can be seen from Table 5 that the RMSE and MAE values of CEEMDAN-XGBOOST are the lowest ones among those by all methods with each horizon. For example, with horizon 1, the values of RMSE and MAE are 0.4151 and 0.3023, which are far less than the second best values of RMSE and MAE, i.e., 0.8477 and 0.7594, respectively. As the horizon increases, the corresponding values of each model increase in terms of RMSE and MAE. However, CEEMDAN-XGBOOST still achieves the lowest values of RMSE and MAE with each horizon. Regarding the values of Dstat, all the values are far greater than those by random guesses, showing that the "decomposition and ensemble" framework is effective for directional forecasting. Specifically, the values of Dstat are in the range between 0.6165 and 0.9054. The best Dstat values in all horizons are achieved by CEEMDAN-SVR or EEMD-SVR, showing that, among the forecasters, SVR is the best one for directional forecasting, although the corresponding values of RMSE and MAE are not the best. As for the decomposition methods, when the forecasters are fixed, CEEMDAN outperforms EEMD in 8, 8, and 8 out of 9 cases in terms of RMSE, MAE, and Dstat, respectively, showing the advantages of CEEMDAN over EEMD. Regarding the forecasters, when combined with CEEMDAN, XGBOOST is always superior to the other forecasters in terms of RMSE and MAE. However, when combined with EEMD, XGBOOST outperforms SVR and FNN with horizon 1, and FNN with horizon 6, in terms of RMSE and MAE. With horizons 1 and 6, FNN achieves the worst results of RMSE and MAE. The results also show that good values of RMSE are usually associated with good values of MAE. However, good values of RMSE or MAE do not always mean good Dstat directly.

For the ensemble models, we also conducted a Wilcoxon signed rank test and an MCS test based on the errors of pairs of models. We set 0.2 as the level of significance in MCS and 0.05 as the level of significance in WSRT. The results are shown in Tables 6 and 7.

From these two tables, it can be seen that, regarding the results of WSRT, the p-values between CEEMDAN-XGBOOST and all other models except EEMD-FNN are below 0.05, demonstrating that there is a significant difference in the population mean ranks between CEEMDAN-XGBOOST and any other model except EEMD-FNN. What is more, the MCS shows that the p-values of $T_R$ and $T_{SQ}$ of CEEMDAN-XGBOOST are always equal to 1.000, demonstrating that CEEMDAN-XGBOOST is the optimal model among all models in terms of global errors and local errors. Meanwhile, the p-values of $T_R$ and $T_{SQ}$ of EEMD-FNN are greater than those of the other models except CEEMDAN-XGBOOST, and it becomes the second best model with horizons 3 and 6 in MCS. Meanwhile, with horizon 6, CEEMDAN-SVR is also the second best model. Besides, the p-values of $T_R$ and $T_{SQ}$ of EEMD-SVR and CEEMDAN-SVR are above 0.2, and they become surviving models with horizon 6 in MCS.

From the results of the single models and the ensemble models, we can draw the following conclusions: (1) single models usually cannot achieve satisfactory results, due to the nonlinearity and nonstationarity of raw crude oil prices; as a single forecaster, XGBOOST can achieve slightly better results than some state-of-the-art algorithms; (2) ensemble models can significantly improve the forecasting accuracy in terms of several evaluation criteria, following the "decomposition and ensemble" framework; (3) as a decomposition method, CEEMDAN outperforms EEMD in most cases; (4) the extensive experiments demonstrate that the proposed CEEMDAN-XGBOOST is promising for forecasting crude oil prices.

4.5. Discussion. In this subsection, we study the impacts of several parameters related to the proposed CEEMDAN-XGBOOST.

4.5.1. The Impact of the Number of Realizations in CEEMDAN. In (2), it is shown that there are N realizations $x^{i}[t]$ in CEEMDAN. We explore how the number of realizations in CEEMDAN influences the results of forecasting crude oil prices by CEEMDAN-XGBOOST with horizon 1 and lag 6. We set the number of realizations in CEEMDAN in the range of {10, 25, 50, 75, 100, 250, 500, 750, 1000}. The results are shown in Figure 6.
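A hypothetical sweep over this range might look as follows, assuming PyEMD's CEEMDAN, where the trials argument (our assumption for the parameter name) sets the number of noise realizations:

```python
from PyEMD import CEEMDAN

# Hypothetical sweep; the forecasting/evaluation loop from the earlier
# pipeline sketch would be repeated for each setting, recording the
# resulting RMSE, MAE, and Dstat.
for n_realizations in [10, 25, 50, 75, 100, 250, 500, 750, 1000]:
    decomposer = CEEMDAN(trials=n_realizations)
    # components = decomposer(train_series)
    # ... train CEEMDAN-XGBOOST and record RMSE, MAE, Dstat
```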

It can be seen from Figure 6 that, for RMSE and MAE, the bigger the number of realizations, the more accurate the results that CEEMDAN-XGBOOST can achieve. When the number of realizations is less than or equal to 500, the values of both RMSE and MAE decrease as the number of realizations increases. However, when the number is greater than 500, these two values increase slightly. Regarding Dstat, when the number increases from 10 to 25, the value of Dstat increases rapidly, and then it increases slowly as the number increases from 25 to 500. After that, Dstat decreases slightly. It is shown that the value of Dstat reaches its top value with 500 realizations. Therefore, 500 is the best value for the number of realizations in terms of RMSE, MAE, and Dstat.

Table 6: Wilcoxon signed rank test between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

                | CEEMDAN-XGBOOST | EEMD-XGBOOST | CEEMDAN-SVR | EEMD-SVR    | CEEMDAN-FNN | EEMD-FNN
CEEMDAN-XGBOOST | 1               | 3.5544e-05   | 1.8847e-50  | 0.0028      | 1.6039e-187 | 0.0726
EEMD-XGBOOST    | 3.5544e-05      | 1            | 4.5857e-07  | 0.3604      | 8.2912e-82  | 0.0556
CEEMDAN-SVR     | 1.8847e-50      | 4.5857e-07   | 1           | 4.9296e-09  | 5.7753e-155 | 8.6135e-09
EEMD-SVR        | 0.0028          | 0.3604       | 4.9296e-09  | 1           | 2.5385e-129 | 0.0007
CEEMDAN-FNN     | 1.6039e-187     | 8.2912e-82   | 5.7753e-155 | 2.5385e-129 | 1           | 8.1427e-196
EEMD-FNN        | 0.0726          | 0.0556       | 8.6135e-09  | 0.0007      | 8.1427e-196 | 1

Table 7: MCS between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

Model           | T_R (h=1) | T_SQ (h=1) | T_R (h=3) | T_SQ (h=3) | T_R (h=6) | T_SQ (h=6)
CEEMDAN-XGBOOST | 1         | 1          | 1         | 1          | 1         | 1
EEMD-XGBOOST    | 0         | 0          | 0         | 0          | 0         | 0.0030
CEEMDAN-SVR     | 0         | 0.0002     | 0.0124    | 0.0162     | 0.8268    | 0.8092
EEMD-SVR        | 0         | 0          | 0.0008    | 0.0040     | 0.7872    | 0.7926
CEEMDAN-FNN     | 0         | 0          | 0.0338    | 0.0532     | 0.2924    | 0.3866
EEMD-FNN        | 0         | 0.0002     | 0.4040    | 0.4040     | 0.8268    | 0.8092

4.5.2. The Impact of the Lag Order. In this subsection, we explore how the lag order impacts the prediction accuracy of CEEMDAN-XGBOOST on the horizon of 1. In this experiment, we set the lag order from 1 to 10, and the results are shown in Figure 7.

According to the empirical results shown in Figure 7, it can be seen that, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply, while that of Dstat increases drastically. After that, the values of RMSE and MAE remain almost unchanged (or increase very slightly) as the lag order increases. However, for Dstat, the value increases sharply from 1 to 2 and then decreases from 2 to 3. After the lag order increases from 3 to 5, Dstat stays almost stationary. Overall, when the value of the lag order is up to 5, it reaches a good tradeoff among the values of RMSE, MAE, and Dstat.

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the white noise in CEEMDAN, also affects the performance of CEEMDAN-XGBOOST. Thus, we set the noise strength in the set of {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how the noise strength in CEEMDAN affects the prediction accuracy of CEEMDAN-XGBOOST on a fixed horizon 1 and a fixed lag 6.

As shown in Figure 8, when the noise strength in CEEMDAN is equal to 0.05, the values of RMSE, MAE, and Dstat achieve the best results simultaneously. When the noise strength is less than or equal to 0.05, except 0.03, the values of RMSE, MAE, and Dstat become better and better with the increase of the noise strength. However, when the strength is greater than 0.05, the values of RMSE, MAE, and Dstat become worse and worse. The figure indicates that the noise strength has a great impact on the forecasting results and that an ideal range for it is about 0.04-0.06.

5. Conclusions

In this paper, we propose a novel model, namely, CEEMDAN-XGBOOST, to forecast crude oil prices. At first, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, CEEMDAN-XGBOOST computes the sum of the predictions of the IMFs and the residue as the final forecasting result. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although the performance of CEEMDAN-XGBOOST on forecasting the direction of crude oil prices is not the best, the MCS shows that CEEMDAN-XGBOOST is still the optimal model. Meanwhile, it is proved that the number of realizations, the lag order, and the noise strength for CEEMDAN are vital factors that have great impacts on the performance of CEEMDAN-XGBOOST.

In the future, we will study the performance of CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emissions prices.

Data Availability

The data used to support the findings of this study are included within the article.

[Figure 6: The impact of the number of realizations in CEEMDAN on RMSE, MAE, and Dstat with CEEMDAN-XGBOOST.]

[Figure 7: The impact of the lag order (1–10) on RMSE, MAE, and Dstat with CEEMDAN-XGBOOST.]

[Figure 8: The impact of the white noise strength of CEEMDAN (0.01–0.07) on RMSE, MAE, and Dstat with CEEMDAN-XGBOOST.]


Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.

[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.

[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.

[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.

[5] H. Naser, "Estimating and forecasting the real prices of crude oil: a data-rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.

[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.

[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.

[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.

[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.

[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.

[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.

[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.

[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.

[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.

[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.

[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.

[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.

[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.

[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.

[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.

[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.

[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.

[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.

[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.

[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.

[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.

[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.

[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.

[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.

[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.

[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.

[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.

[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.

[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.

[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.

[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.

[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.

[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.

[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.

[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.

[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.

[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: a statistical view of boosting: rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.

[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.

[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.

[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.


Page 6: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

6 Complexity

X[n]

CEEMDAN

IMF1 IMF2 IMF3 IMFk R[n]

XGBoost XGBoost XGBoost XGBoost

Decomposition

Individual forecasting

Ensemble

XGBoost

x[n]

R[n]IMF1IMF2

IMF3IMFk

k

i=1IMFi +

R[n]sum

Figure 2 The flowchart of the proposed CEEMDAN-XGBOOST

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 2018

Date

0

10

20

30

40

50

60

70

80

90

100

110

120

130

140

150

Pric

es (U

SDb

arre

l)

WTI crude oil prices

Figure 3 The original crude oil prices of WTI

For SVR and FNN we normalize each decomposed com-ponent before building the model to forecast the componentindividually In detail the normalization process can bedefined as

1199091015840119905 = 119909119905 minus 120583120590 (25)

where 1199091015840119905 is the normalized series of crude oil prices series 119909119905is the data before normalization 120583 is the mean of 119909119905 and 120590 isthe standard deviation of 119909119905 Meanwhile since normalizationis not necessary for XGBOOST and ARIMA for models withthese two algorithms we build forecasting models from eachof the decomposed components directly

Complexity 7

42 Evaluation Criteria When we evaluate the accuracy ofthe models we focus on not only the numerical accuracybut also the accuracy of forecasting direction Therefore weselect the root-mean-square error (RMSE) and the meanabsolute error (MAE) to evaluate the numerical accuracy ofthe models Besides we use directional statistic (Dstat) as thecriterion for evaluating the accuracy of forecasting directionThe RMSE MAE and Dstat are defined as

RMSE = radic 1119873119873sum119894=1

(119910119905 minus 119910119905)2 (26)

MAE = 1119873 ( 119873sum119894=1

1003816100381610038161003816119910119905 minus 1199101199051003816100381610038161003816) (27)

Dstat = sum119873119894=2 119889119894119873 minus 1 119889119894 =

1 (119910119905 minus 119910119905minus1) (119910119905 minus 119910119905minus1) gt 00 (119910119905 minus 119910119905minus1) (119910119905 minus 119910119905minus1) lt 0

(28)

where 119910119905 is the actual crude oil prices at the time t119910119905 is theprediction and N is the size of the test set

In addition we take the Wilcoxon signed rank test(WSRT) for proving the significant differences between theforecasts of the selectedmodels [44]TheWSRT is a nonpara-metric statistical hypothesis test that can be used to evaluatewhether the population mean ranks of two predictions fromdifferent models on the same sample differ Meanwhile it isa paired difference test which can be used as an alternative tothe paired Student t-test The null hypothesis of the WSRTis whether the median of the loss differential series d(t) =g(119890119886(t)) minus g(119890119887(t)) is equal to zero or not where 119890119886(t) and 119890119887(t)are the error series of model a and model b respectively andg() is a loss function If the p value of pairs of models is below005 the test rejects the null hypothesis (there is a significantdifference between the forecasts of this pair of models) underthe confidence level of 95 In this way we can prove thatthere is a significant difference between the optimal modeland the others

However the criteria defined above are global If somesingular points exist the optimal model chosen by thesecriteria may not be the best one Thus we make the modelconfidence set (MCS) [31 45] in order to choose the optimalmodel convincingly

In order to calculate the p-value of the statistics accuratelythe MCS performs bootstrap on the prediction series whichcan soften the impact of the singular points For the j-thmodel suppose that the size of a bootstrapped sample isT and the t-th bootstrapped sample has the loss functionsdefined as

119871119895119905 = 1119879ℎ+119879sum119905=ℎ+1

lfloor119910119905 minus 119910119905rfloor (29)

Suppose that a set M0=mi i = 1 2 3 n thatcontains n models for any two models j and k the relativevalues of the loss between these two models can be defined as

119889119895119896119905 = 119871119895119905 minus 119871119896119905 (30)

According to the above definitions the set of superiormodels can be defined as

119872lowast equiv 119898119895 isin M0 119864 (119889119895119896119905) le 0 forall119898119896 isin M0 (31)

where E() represents the average valueThe MCS repeatedly performs the significant test in

M0 At each time the worst prediction model in the set iseliminated In the test the hypothesis is the null hypothesisof equal predictive ability (EPA) defined as

1198670 119864 (119889119895119896119905) = 0 forall119898119895 119898119896 isin 119872 sub M0 (32)

The MCS mainly depends on the equivalence test andelimination criteria The specific process is as follows

Step 1 AssumingM=1198720 at the level of significance 120572 use theequivalence test to test the null hypothesis1198670119872Step 2 If it accepts the null hypothesis and then it defines119872lowast1minus120572 = 119872 otherwise it eliminates the model which rejectsthe null hypothesis from M according to the eliminationcriteria The elimination will not stop until there are not anymodels that reject the null hypothesis in the setM In the endthe models in119872lowast1minus120572 are thought as surviving models

Meanwhile the MCS has two kinds of statistics that canbe defined as

119879119877 = max119895119896isin119872

1003816100381610038161003816100381611990511989511989610038161003816100381610038161003816 (33)

119879119878119876 = max119895119896isinM

1199052119895119896 (34)

119905119895119896 = 119889119895119896radicvar (119889119895119896) (35)

119889119895119896 = 1119879ℎ+119879sum119905=ℎ+1

119889119895119896119905 (36)

where T119877 and 119879119878119876 stand for the range statistics and thesemiquadratic statistic respectively and both statistics arebased on the t-statistics as shown in (35)-(36) These twostatistics (T119877 and 119879119878119876) are mainly to remove the model whosep-value is less than the significance level 120572 When the p-valueis greater than the significance level120572 themodels can surviveThe larger the p-value the more accurate the prediction ofthe modelWhen the p-value is equal to 1 it indicates that themodel is the optimal forecasting model

43 Parameter Settings To test the performance ofXGBOOST and CEEMDAN-XGBOOST we conducttwo groups of experiments single models that forecast crude

8 Complexity

Table 1 The ranges of the parameters for XGBOOST by grid search

Parameter Description rangeBooster Booster to use lsquogblinearrsquo lsquogbtreersquoN estimators Number of boosted trees 100200300400500Max depth Maximum tree depth for base learners 345678Min child weight Maximum delta step we allow each treersquos weight estimation to be 123456Gamma Minimum loss reduction required to make a further partition on a leaf node of the tree 001 005010203 Subsample Subsample ratio of the training instance 060708091Colsample Subsample ratio of columns when constructing each tree 060708091Reg alpha L1 regularization term on weights 00100501Reg lambda L2 regularization term on weights 00100501Learning rate Boosting learning rate 0010050070102

oil prices with original sequence and ensemble models thatforecast crude oil prices based on the ldquodecomposition andensemblerdquo framework

For single models we compare XGBOOST with onestatistical model ARIMA and two widely used AI-modelsSVR and FNN Since the existing research has shownthat EEMD significantly outperforms EMD in forecastingcrude oil prices [24 31] in the experiments we onlycompare CEEMDAN with EEMD Therefore we com-pare the proposed CEEMDAN-XGBOOST with EEMD-SVR EEMD-FNN EEMD-XGBOOST CEEMDAN-SVRand CEEMDAN-FNN

For ARIMA we use the Akaike information criterion(AIC) [46] to select the parameters (p-d-q) For SVR weuse RBF as kernel function and use grid search to opti-mize C and gamma in the ranges of 2012345678 and2minus9minus8minus7minus6minus5minus4minus3minus2minus10 respectively We use one hiddenlayer with 20 nodes for FNNWe use a grid search to optimizethe parameters for XGBOOST the search ranges of theoptimized parameters are shown in Table 1

We set 002 and 005 as the standard deviation of theadded white noise and set 250 and 500 as the numberof realizations of EEMD and CEEMDAN respectively Thedecomposition results of the original crude oil prices byEEMD and CEEMDAN are shown in Figures 4 and 5respectively

It can be seen from Figure 4 that among the componentsdecomposed byEEMD the first six IMFs show characteristicsof high frequency while the remaining six components showcharacteristics of low frequency However regarding thecomponents by CEEMDAN the first seven ones show clearhigh-frequency and the last four show low-frequency asshown in Figure 5

The experiments were conducted with Python 27 andMATLAB 86 on a 64-bit Windows 7 with 34 GHz I7 CPUand 32 GB memory Specifically we run FNN and MCSwith MATLAB and for the remaining work we use PythonRegarding XGBoost we used a widely used Python pack-age (httpsxgboostreadthedocsioenlatestpython) in theexperiments

Table 2The RMSE MAE and Dstat by single models with horizon= 1 3 and 6

Horizon Model RMSE MAE Dstat

1

XGBOOST 12640 09481 04827SVR 12899 09651 04826FNN 13439 09994 04837

ARIMA 12692 09520 04883

3

XGBOOST 20963 16159 04839SVR 22444 17258 05080FNN 21503 16512 04837

ARIMA 21056 16177 04901

6

XGBOOST 29269 22945 05158SVR 31048 24308 05183FNN 30803 24008 05028

ARIMA 29320 22912 05151

44 Experimental Results In this subsection we use a fixedvalue 6 as the lag order andwe forecast crude oil prices with 1-step-ahead 3-step-ahead and 6-step-ahead forecasting thatis to say the horizons for these three forecasting tasks are 1 3and 6 respectively

441 Experimental Results of Single Models For single mod-els we compare XGBOOST with state-of-the-art SVR FNNand ARIMA and the results are shown in Table 2

It can be seen from Table 2 that XGBOOST outperformsother models in terms of RMSE and MAE with horizons 1and 3 For horizon 6 XGBOOST achieves the best RMSE andthe second best MAE which is slightly worse than that byARIMA For horizon 1 FNNachieves theworst results amongthe four models however for horizons 3 and 6 SVR achievesthe worst results Regarding Dstat none of the models canalways outperform others and the best result of Dstat isachieved by SVR with horizon 6 It can be found that theRMSE and MAE values gradually increase with the increaseof horizon However the Dstat values do not show suchdiscipline All the values of Dstat are around 05 ie from

Complexity 9

minus505

IMF1

minus4minus2

024

IMF2

minus505

IMF3

minus505

IMF4

minus4minus2

024

IMF5

minus505

IMF6

minus200

20

IMF7

minus100

1020

IMF8

minus100

10

IMF9

minus20minus10

010

IMF1

0

minus200

20

IMF1

1

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 201820406080

Resid

ue

Figure 4 The IMFs and residue of WTI crude oil prices by EEMD

04826 to 05183 which are very similar to the results ofrandom guesses showing that it is very difficult to accuratelyforecast the direction of the movement of crude oil priceswith raw crude oil prices directly

To further verify the advantages of XGBOOST over othermodels we report the results by WSRT and MCS as shownin Tables 3 and 4 respectively As for WSRT the p-valuebetween XGBOOST and other models except ARIMA isbelow 005 which means that there is a significant differenceamong the forecasting results of XGBOOST SVR and FNNin population mean ranks Besides the results of MCS showthat the p-value of119879119877 and119879119878119876 of XGBOOST is always equal to1000 and prove that XGBOOST is the optimal model among

Table 3Wilcoxon signed rank test between XGBOOST SVR FNNand ARIMA

XGBOOST SVR FNN ARIMAXGBOOST 1 40378e-06 22539e-35 57146e-01SVR 40378e-06 1 46786e-33 07006FNN 22539e-35 46786e-33 1 69095e-02ARIMA 57146e-01 07006 69095e-02 1

all the models in terms of global errors and most local errorsof different samples obtained by bootstrap methods in MCSAccording to MCS the p-values of 119879119877 and 119879119878119876 of SVR onthe horizon of 3 are greater than 02 so SVR becomes the

10 Complexity

minus505

IMF1

minus505

IMF2

minus505

IMF3

minus505

IMF4

minus505

IMF5

minus100

10

IMF6

minus100

10

IMF7

minus20

0

20

IMF8

minus200

20

IMF9

minus200

20

IMF1

0

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 201820406080

Resid

ue

Figure 5 The IMFs and residue of WTI crude oil prices by CEEMDAN

Table 4 MCS between XGBOOST SVR FNN and ARIMA

HORIZON=1 HORIZON=3 HORIZON=6119879119877 119879119878119876 119879119877 119879119878119876 119879119877 119879119878119876XGBOOST 10000 10000 10000 10000 10000 10000SVR 00004 00004 04132 04132 00200 00200FNN 00002 00002 00248 00538 00016 00022ARIMA 00000 00000 00000 00000 00000 00000

survival and the second best model on this horizon Whenit comes to ARIMA ARIMA almost performs as good asXGBOOST in terms of evaluation criteria of global errorsbut it does not pass the MCS It indicates that ARIMA doesnot perform better than other models in most local errors ofdifferent samples

442 Experimental Results of Ensemble Models With EEMDor CEEMDAN the results of forecasting the crude oil pricesby XGBOOST SVR and FNN with horizons 1 3 and 6 areshown in Table 5

It can be seen from Table 5 that the RMSE and MAEvalues of the CEEMDAN-XGBOOST are the lowest ones

Complexity 11

Table 5 The RMSE MAE and Dstat by the models with EEMD orCEEMDAN

Horizon Model RMSE MAE Dstat

1

CEEMDAN-XGBOOST 04151 03023 08783EEMD-XGBOOST 09941 07685 07109CEEMDAN-SVR 08477 07594 09054

EEMD-SVR 11796 09879 08727CEEMDAN-FNN 12574 10118 07597

EEMD-FNN 26835 19932 07361

3

CEEMDAN-XGBOOST 08373 06187 06914EEMD-XGBOOST 14007 10876 06320CEEMDAN-SVR 12399 10156 07092

EEMD-SVR 12366 10275 07092CEEMDAN-FNN 12520 09662 07061

EEMD-FNN 12046 08637 06959

6

CEEMDAN-XGBOOST 12882 09831 06196EEMD-XGBOOST 17719 13765 06165CEEMDAN-SVR 13453 10296 06683

EEMD-SVR 13730 11170 06485CEEMDAN-FNN 18024 13647 06422

EEMD-FNN 27786 20495 06337

among those by all methods with each horizon For examplewith horizon 1 the values of RMSE and MAE are 04151 and03023which are far less than the second values of RMSE andMAE ie 08477 and 07594 respectively With the horizonincreases the corresponding values of each model increasein terms of RMSE and MAE However the CEEMDAN-XGBOOST still achieves the lowest values of RMSE andMAEwith each horizon Regarding the values of Dstat all thevalues are far greater than those by random guesses showingthat the ldquodecomposition and ensemblerdquo framework is effectivefor directional forecasting Specifically the values of Dstatare in the range between 06165 and 09054 The best Dstatvalues in all horizons are achieved by CEEMDAN-SVR orEEMD-SVR showing that among the forecasters SVR is thebest one for directional forecasting although correspondingvalues of RMSE and MAE are not the best As for thedecomposition methods when the forecasters are fixedCEEMDAN outperforms EEMD in 8 8 and 8 out of 9 casesin terms of RMSE MAE and Dstat respectively showingthe advantages of CEEMDAN over EEMD Regarding theforecasters when combined with CEEMDAN XGBOOST isalways superior to other forecasters in terms of RMSE andMAE However when combined with EEMD XGBOOSToutperforms SVR and FNN with horizon 1 and FNN withhorizon 6 in terms of RMSE and MAE With horizons 1and 6 FNN achieves the worst results of RMSE and MAEThe results also show that good values of RMSE usually areassociated with good values of MAE However good valuesof RMSE or MAE do not always mean good Dstat directly

For the ensemble models we also took aWilcoxon signedrank test and an MCS test based on the errors of pairs ofmodels We set 02 as the level of significance in MCS and005 as the level of significance in WSRT The results areshown in Tables 6 and 7

From these two tables it can be seen that regardingthe results of WSRT the p-value between CEEMDAN-XGBOOST and any other models except EEMD-FNN arebelow 005 demonstrating that there is a significant differ-ence on the population mean ranks between CEEMDAN-XGBOOST and any other models except EEMD-FNNWhatis more the MCS shows that the p-value of 119879119877 and 119879119878119876of CEEMDAN-XGBOOST is always equal to 1000 anddemonstrates that CEEMDAN-XGBOOST is the optimalmodel among all models in terms of global errors and localerrorsMeanwhile the p-values of119879119877 and119879119878119876 of EEMD-FNNare greater than othermodels except CEEMDAN-XGBOOSTand become the second best model with horizons 3 and 6in MCS Meanwhile with the horizon 6 the CEEMDAN-SVR is also the second best model Besides the p-values of119879119877 and 119879119878119876 of EEMD-SVR and CEEMDAN-SVR are up to02 and they become the surviving models with horizon 6 inMCS

From the results of single models and ensemble modelswe can draw the following conclusions (1) single modelsusually cannot achieve satisfactory results due to the non-linearity and nonstationarity of raw crude oil prices Asa single forecaster XGBOOST can achieve slightly betterresults than some state-of-the-art algorithms (2) ensemblemodels can significantly improve the forecasting accuracy interms of several evaluation criteria following the ldquodecom-position and ensemblerdquo framework (3) as a decompositionmethod CEEMDAN outperforms EEMD in most cases (4)the extensive experiments demonstrate that the proposedCEEMDAN-XGBOOST is promising for forecasting crudeoil prices

45 Discussion In this subsection we will study the impactsof several parameters related to the proposed CEEMDAN-XGBOOST

451The Impact of the Number of Realizations in CEEMDANIn (2) it is shown that there are N realizations 119909119894[119905] inCEEMDAN We explore how the number of realizations inCEEMDAN can influence on the results of forecasting crudeoil prices by CEEMDAN-XGBOOST with horizon 1 and lag6 And we set the number of realizations in CEEMDAN inthe range of 102550751002505007501000 The resultsare shown in Figure 6

It can be seen from Figure 6 that for RMSE and MAEthe bigger the number of realizations is the more accurateresults the CEEMDAN-XGBOOST can achieve When thenumber of realizations is less than or equal to 500 the valuesof both RMSE and MAE decrease with increasing of thenumber of realizations However when the number is greaterthan 500 these two values are increasing slightly RegardingDstat when the number increases from 10 to 25 the value ofDstat increases rapidly and then it increases slowly with thenumber increasing from 25 to 500 After that Dstat decreasesslightly It is shown that the value of Dstat reaches the topvalues with the number of realizations 500 Therefore 500is the best value for the number of realizations in terms ofRMSE MAE and Dstat

12 Complexity

Table 6 Wilcoxon signed rank test between XGBOOST SVR and FNN with EEMD or CEEMDAN

CEEMDAN-XGBOOST

EEMD-XGBOOST

CEEMDAN-SVR

EEMD-SVR

CEEMDAN-FNN

EEMD-FNN

CEEMDAN-XGBOOST 1 35544e-05 18847e-50 0 0028 16039e-187 00726EEMD-XGBOOST 35544e-05 1 45857e-07 0 3604 82912e-82 00556CEEMDAN-SVR 18847e-50 45857e-07 1 49296e-09 57753e-155 86135e-09EEMD-SVR 00028 0 3604 49296e-09 1 25385e-129 00007

CEEMDAN-FNN 16039e-187 82912e-82 57753e-155 25385e-129 1 81427e-196

EEMD-FNN 00726 00556 86135e-09 00007 81427e-196 1

Table 7 MCS between XGBOOST SVR and FNN with EEMD or CEEMDAN

HORIZON=1 HORIZON=3 HORIZON=6119879119877 119879119878119876 119879119877 119879119878119876 119879119877 119879119878119876CEEMDAN-XGBOOST 1 1 1 1 1 1EEMD-XGBOOST 0 0 0 0 0 00030CEEMDAN-SVR 0 00002 00124 00162 0 8268 08092EEMD-SVR 0 0 00008 0004 07872 07926CEEMDAN-FNN 0 0 00338 00532 02924 03866EEMD-FNN 0 00002 04040 04040 0 8268 08092

452 The Impact of the Lag Orders In this section weexplore how the number of lag orders impacts the predictionaccuracy of CEEMDAN-XGBOOST on the horizon of 1 Inthis experiment we set the number of lag orders from 1 to 10and the results are shown in Figure 7

According to the empirical results shown in Figure 7 itcan be seen that as the lag order increases from 1 to 2 thevalues of RMSE andMAEdecrease sharply while that of Dstatincreases drastically After that the values of RMSE of MAEremain almost unchanged (or increase very slightly) withthe increasing of lag orders However for Dstat the valueincreases sharply from 1 to 2 and then decreases from 2 to3 After the lag order increases from 3 to 5 the Dstat staysalmost stationary Overall when the value of lag order is upto 5 it reaches a good tradeoff among the values of RMSEMAE and Dstat

453 The Impact of the Noise Strength in CEEMDAN Apartfrom the number of realizations the noise strength inCEEMDAN which stands for the standard deviation of thewhite noise in CEEMDAN also affects the performance ofCEEMDAN-XGBOOST Thus we set the noise strength inthe set of 001 002 003 004 005 006 007 to explorehow the noise strength in CEEMDAN affects the predictionaccuracy of CEEMDAN-XGBOOST on a fixed horizon 1 anda fixed lag 6

As shown in Figure 8 when the noise strength in CEEM-DAN is equal to 005 the values of RMSE MAE and Dstatachieve the best results simultaneously When the noisestrength is less than or equal to 005 except 003 the valuesof RMSE MAE and Dstat become better and better with theincrease of the noise strength However when the strengthis greater than 005 the values of RMSE MAE and Dstat

become worse and worse The figure indicates that the noisestrength has a great impact on forecasting results and an idealrange for it is about 004-006

5 Conclusions

In this paper we propose a novelmodel namely CEEMDAN-XGBOOST to forecast crude oil prices At first CEEMDAN-XGBOOST decomposes the sequence of crude oil prices intoseveral IMFs and a residue with CEEMDAN Then it fore-casts the IMFs and the residue with XGBOOST individuallyFinally CEEMDAN-XGBOOST computes the sum of theprediction of the IMFs and the residue as the final forecastingresults The experimental results show that the proposedCEEMDAN-XGBOOST significantly outperforms the com-pared methods in terms of RMSE and MAE Although theperformance of the CEEMDAN-XGBOOST on forecastingthe direction of crude oil prices is not the best theMCS showsthat the CEEMDAN-XGBOOST is still the optimal modelMeanwhile it is proved that the number of the realizationslag and the noise strength for CEEMDAN are the vitalfactors which have great impacts on the performance of theCEEMDAN-XGBOOST

In the future we will study the performance of theCEEMDAN-XGBOOST on forecasting crude oil prices withdifferent periods We will also apply the proposed approachfor forecasting other energy time series such as wind speedelectricity load and carbon emissions prices

Data Availability

The data used to support the findings of this study areincluded within the article

Complexity 13

RMSEMAEDstat

Dsta

t

RMSE

MA

E

070

065

060

050

055

045

040

035

03010 25 50 500 750 1000 75 100 250

088

087

086

085

084

083

082

081

080

The number of realizations in CEEMDAN

Figure 6 The impact of the number of IMFs with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

RMSE

MA

E

2 4 6 8 10lag order

088

086

084

082

080

078

076

074

12

10

08

06

04

Figure 7 The impact of the number of lag orders with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

The white noise strength of CEEMDAN

088

086

084

082

080

078

076

074

10

08

09

06

07

04

05

03

001 002 003 004 005 006 007

RMSE

MA

E

Figure 8 The impact of the value of the white noise of CEEMDAN with CEEMDAN-XGBOOST

14 Complexity

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported by the Fundamental ResearchFunds for the Central Universities (Grants no JBK1902029no JBK1802073 and no JBK170505) the Natural ScienceFoundation of China (Grant no 71473201) and the ScientificResearch Fund of Sichuan Provincial Education Department(Grant no 17ZB0433)

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.
[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.
[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.
[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.
[5] H. Naser, "Estimating and forecasting the real prices of crude oil: a data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.
[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.
[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.
[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.
[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.
[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.
[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.
[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.
[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.
[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.
[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.
[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.
[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.
[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.
[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.
[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.
[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.
[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.
[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.
[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.
[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.
[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.
[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.
[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.
[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.
[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.
[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.
[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.
[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.
[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.
[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.
[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.
[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.
[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.
[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.
[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.
[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.
[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: a statistical view of boosting. Rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.
[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.
[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.
[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 7: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

Complexity 7

42 Evaluation Criteria When we evaluate the accuracy ofthe models we focus on not only the numerical accuracybut also the accuracy of forecasting direction Therefore weselect the root-mean-square error (RMSE) and the meanabsolute error (MAE) to evaluate the numerical accuracy ofthe models Besides we use directional statistic (Dstat) as thecriterion for evaluating the accuracy of forecasting directionThe RMSE MAE and Dstat are defined as

RMSE = radic 1119873119873sum119894=1

(119910119905 minus 119910119905)2 (26)

MAE = 1119873 ( 119873sum119894=1

1003816100381610038161003816119910119905 minus 1199101199051003816100381610038161003816) (27)

Dstat = sum119873119894=2 119889119894119873 minus 1 119889119894 =

1 (119910119905 minus 119910119905minus1) (119910119905 minus 119910119905minus1) gt 00 (119910119905 minus 119910119905minus1) (119910119905 minus 119910119905minus1) lt 0

(28)

where 119910119905 is the actual crude oil prices at the time t119910119905 is theprediction and N is the size of the test set

In addition we take the Wilcoxon signed rank test(WSRT) for proving the significant differences between theforecasts of the selectedmodels [44]TheWSRT is a nonpara-metric statistical hypothesis test that can be used to evaluatewhether the population mean ranks of two predictions fromdifferent models on the same sample differ Meanwhile it isa paired difference test which can be used as an alternative tothe paired Student t-test The null hypothesis of the WSRTis whether the median of the loss differential series d(t) =g(119890119886(t)) minus g(119890119887(t)) is equal to zero or not where 119890119886(t) and 119890119887(t)are the error series of model a and model b respectively andg() is a loss function If the p value of pairs of models is below005 the test rejects the null hypothesis (there is a significantdifference between the forecasts of this pair of models) underthe confidence level of 95 In this way we can prove thatthere is a significant difference between the optimal modeland the others

However the criteria defined above are global If somesingular points exist the optimal model chosen by thesecriteria may not be the best one Thus we make the modelconfidence set (MCS) [31 45] in order to choose the optimalmodel convincingly

In order to calculate the p-value of the statistics accuratelythe MCS performs bootstrap on the prediction series whichcan soften the impact of the singular points For the j-thmodel suppose that the size of a bootstrapped sample isT and the t-th bootstrapped sample has the loss functionsdefined as

119871119895119905 = 1119879ℎ+119879sum119905=ℎ+1

lfloor119910119905 minus 119910119905rfloor (29)

Suppose that a set M0=mi i = 1 2 3 n thatcontains n models for any two models j and k the relativevalues of the loss between these two models can be defined as

119889119895119896119905 = 119871119895119905 minus 119871119896119905 (30)

According to the above definitions the set of superiormodels can be defined as

119872lowast equiv 119898119895 isin M0 119864 (119889119895119896119905) le 0 forall119898119896 isin M0 (31)

where E() represents the average valueThe MCS repeatedly performs the significant test in

M0 At each time the worst prediction model in the set iseliminated In the test the hypothesis is the null hypothesisof equal predictive ability (EPA) defined as

1198670 119864 (119889119895119896119905) = 0 forall119898119895 119898119896 isin 119872 sub M0 (32)

The MCS mainly depends on the equivalence test andelimination criteria The specific process is as follows

Step 1 AssumingM=1198720 at the level of significance 120572 use theequivalence test to test the null hypothesis1198670119872Step 2 If it accepts the null hypothesis and then it defines119872lowast1minus120572 = 119872 otherwise it eliminates the model which rejectsthe null hypothesis from M according to the eliminationcriteria The elimination will not stop until there are not anymodels that reject the null hypothesis in the setM In the endthe models in119872lowast1minus120572 are thought as surviving models

Meanwhile the MCS has two kinds of statistics that canbe defined as

119879119877 = max119895119896isin119872

1003816100381610038161003816100381611990511989511989610038161003816100381610038161003816 (33)

119879119878119876 = max119895119896isinM

1199052119895119896 (34)

119905119895119896 = 119889119895119896radicvar (119889119895119896) (35)

119889119895119896 = 1119879ℎ+119879sum119905=ℎ+1

119889119895119896119905 (36)

where T119877 and 119879119878119876 stand for the range statistics and thesemiquadratic statistic respectively and both statistics arebased on the t-statistics as shown in (35)-(36) These twostatistics (T119877 and 119879119878119876) are mainly to remove the model whosep-value is less than the significance level 120572 When the p-valueis greater than the significance level120572 themodels can surviveThe larger the p-value the more accurate the prediction ofthe modelWhen the p-value is equal to 1 it indicates that themodel is the optimal forecasting model

43 Parameter Settings To test the performance ofXGBOOST and CEEMDAN-XGBOOST we conducttwo groups of experiments single models that forecast crude

8 Complexity

Table 1 The ranges of the parameters for XGBOOST by grid search

Parameter Description rangeBooster Booster to use lsquogblinearrsquo lsquogbtreersquoN estimators Number of boosted trees 100200300400500Max depth Maximum tree depth for base learners 345678Min child weight Maximum delta step we allow each treersquos weight estimation to be 123456Gamma Minimum loss reduction required to make a further partition on a leaf node of the tree 001 005010203 Subsample Subsample ratio of the training instance 060708091Colsample Subsample ratio of columns when constructing each tree 060708091Reg alpha L1 regularization term on weights 00100501Reg lambda L2 regularization term on weights 00100501Learning rate Boosting learning rate 0010050070102

oil prices with original sequence and ensemble models thatforecast crude oil prices based on the ldquodecomposition andensemblerdquo framework

For single models we compare XGBOOST with onestatistical model ARIMA and two widely used AI-modelsSVR and FNN Since the existing research has shownthat EEMD significantly outperforms EMD in forecastingcrude oil prices [24 31] in the experiments we onlycompare CEEMDAN with EEMD Therefore we com-pare the proposed CEEMDAN-XGBOOST with EEMD-SVR EEMD-FNN EEMD-XGBOOST CEEMDAN-SVRand CEEMDAN-FNN

For ARIMA we use the Akaike information criterion(AIC) [46] to select the parameters (p-d-q) For SVR weuse RBF as kernel function and use grid search to opti-mize C and gamma in the ranges of 2012345678 and2minus9minus8minus7minus6minus5minus4minus3minus2minus10 respectively We use one hiddenlayer with 20 nodes for FNNWe use a grid search to optimizethe parameters for XGBOOST the search ranges of theoptimized parameters are shown in Table 1

We set 002 and 005 as the standard deviation of theadded white noise and set 250 and 500 as the numberof realizations of EEMD and CEEMDAN respectively Thedecomposition results of the original crude oil prices byEEMD and CEEMDAN are shown in Figures 4 and 5respectively

It can be seen from Figure 4 that among the componentsdecomposed byEEMD the first six IMFs show characteristicsof high frequency while the remaining six components showcharacteristics of low frequency However regarding thecomponents by CEEMDAN the first seven ones show clearhigh-frequency and the last four show low-frequency asshown in Figure 5

The experiments were conducted with Python 27 andMATLAB 86 on a 64-bit Windows 7 with 34 GHz I7 CPUand 32 GB memory Specifically we run FNN and MCSwith MATLAB and for the remaining work we use PythonRegarding XGBoost we used a widely used Python pack-age (httpsxgboostreadthedocsioenlatestpython) in theexperiments

Table 2The RMSE MAE and Dstat by single models with horizon= 1 3 and 6

Horizon Model RMSE MAE Dstat

1

XGBOOST 12640 09481 04827SVR 12899 09651 04826FNN 13439 09994 04837

ARIMA 12692 09520 04883

3

XGBOOST 20963 16159 04839SVR 22444 17258 05080FNN 21503 16512 04837

ARIMA 21056 16177 04901

6

XGBOOST 29269 22945 05158SVR 31048 24308 05183FNN 30803 24008 05028

ARIMA 29320 22912 05151

44 Experimental Results In this subsection we use a fixedvalue 6 as the lag order andwe forecast crude oil prices with 1-step-ahead 3-step-ahead and 6-step-ahead forecasting thatis to say the horizons for these three forecasting tasks are 1 3and 6 respectively

441 Experimental Results of Single Models For single mod-els we compare XGBOOST with state-of-the-art SVR FNNand ARIMA and the results are shown in Table 2

It can be seen from Table 2 that XGBOOST outperformsother models in terms of RMSE and MAE with horizons 1and 3 For horizon 6 XGBOOST achieves the best RMSE andthe second best MAE which is slightly worse than that byARIMA For horizon 1 FNNachieves theworst results amongthe four models however for horizons 3 and 6 SVR achievesthe worst results Regarding Dstat none of the models canalways outperform others and the best result of Dstat isachieved by SVR with horizon 6 It can be found that theRMSE and MAE values gradually increase with the increaseof horizon However the Dstat values do not show suchdiscipline All the values of Dstat are around 05 ie from

Complexity 9

minus505

IMF1

minus4minus2

024

IMF2

minus505

IMF3

minus505

IMF4

minus4minus2

024

IMF5

minus505

IMF6

minus200

20

IMF7

minus100

1020

IMF8

minus100

10

IMF9

minus20minus10

010

IMF1

0

minus200

20

IMF1

1

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 201820406080

Resid

ue

Figure 4 The IMFs and residue of WTI crude oil prices by EEMD

04826 to 05183 which are very similar to the results ofrandom guesses showing that it is very difficult to accuratelyforecast the direction of the movement of crude oil priceswith raw crude oil prices directly

To further verify the advantages of XGBOOST over othermodels we report the results by WSRT and MCS as shownin Tables 3 and 4 respectively As for WSRT the p-valuebetween XGBOOST and other models except ARIMA isbelow 005 which means that there is a significant differenceamong the forecasting results of XGBOOST SVR and FNNin population mean ranks Besides the results of MCS showthat the p-value of119879119877 and119879119878119876 of XGBOOST is always equal to1000 and prove that XGBOOST is the optimal model among

Table 3Wilcoxon signed rank test between XGBOOST SVR FNNand ARIMA

XGBOOST SVR FNN ARIMAXGBOOST 1 40378e-06 22539e-35 57146e-01SVR 40378e-06 1 46786e-33 07006FNN 22539e-35 46786e-33 1 69095e-02ARIMA 57146e-01 07006 69095e-02 1

all the models in terms of global errors and most local errorsof different samples obtained by bootstrap methods in MCSAccording to MCS the p-values of 119879119877 and 119879119878119876 of SVR onthe horizon of 3 are greater than 02 so SVR becomes the

10 Complexity

minus505

IMF1

minus505

IMF2

minus505

IMF3

minus505

IMF4

minus505

IMF5

minus100

10

IMF6

minus100

10

IMF7

minus20

0

20

IMF8

minus200

20

IMF9

minus200

20

IMF1

0

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 201820406080

Resid

ue

Figure 5 The IMFs and residue of WTI crude oil prices by CEEMDAN

Table 4 MCS between XGBOOST SVR FNN and ARIMA

HORIZON=1 HORIZON=3 HORIZON=6119879119877 119879119878119876 119879119877 119879119878119876 119879119877 119879119878119876XGBOOST 10000 10000 10000 10000 10000 10000SVR 00004 00004 04132 04132 00200 00200FNN 00002 00002 00248 00538 00016 00022ARIMA 00000 00000 00000 00000 00000 00000

survival and the second best model on this horizon Whenit comes to ARIMA ARIMA almost performs as good asXGBOOST in terms of evaluation criteria of global errorsbut it does not pass the MCS It indicates that ARIMA doesnot perform better than other models in most local errors ofdifferent samples

442 Experimental Results of Ensemble Models With EEMDor CEEMDAN the results of forecasting the crude oil pricesby XGBOOST SVR and FNN with horizons 1 3 and 6 areshown in Table 5

It can be seen from Table 5 that the RMSE and MAEvalues of the CEEMDAN-XGBOOST are the lowest ones

Complexity 11

Table 5 The RMSE MAE and Dstat by the models with EEMD orCEEMDAN

Horizon Model RMSE MAE Dstat

1

CEEMDAN-XGBOOST 04151 03023 08783EEMD-XGBOOST 09941 07685 07109CEEMDAN-SVR 08477 07594 09054

EEMD-SVR 11796 09879 08727CEEMDAN-FNN 12574 10118 07597

EEMD-FNN 26835 19932 07361

3

CEEMDAN-XGBOOST 08373 06187 06914EEMD-XGBOOST 14007 10876 06320CEEMDAN-SVR 12399 10156 07092

EEMD-SVR 12366 10275 07092CEEMDAN-FNN 12520 09662 07061

EEMD-FNN 12046 08637 06959

6

CEEMDAN-XGBOOST 12882 09831 06196EEMD-XGBOOST 17719 13765 06165CEEMDAN-SVR 13453 10296 06683

EEMD-SVR 13730 11170 06485CEEMDAN-FNN 18024 13647 06422

EEMD-FNN 27786 20495 06337

among those by all methods with each horizon For examplewith horizon 1 the values of RMSE and MAE are 04151 and03023which are far less than the second values of RMSE andMAE ie 08477 and 07594 respectively With the horizonincreases the corresponding values of each model increasein terms of RMSE and MAE However the CEEMDAN-XGBOOST still achieves the lowest values of RMSE andMAEwith each horizon Regarding the values of Dstat all thevalues are far greater than those by random guesses showingthat the ldquodecomposition and ensemblerdquo framework is effectivefor directional forecasting Specifically the values of Dstatare in the range between 06165 and 09054 The best Dstatvalues in all horizons are achieved by CEEMDAN-SVR orEEMD-SVR showing that among the forecasters SVR is thebest one for directional forecasting although correspondingvalues of RMSE and MAE are not the best As for thedecomposition methods when the forecasters are fixedCEEMDAN outperforms EEMD in 8 8 and 8 out of 9 casesin terms of RMSE MAE and Dstat respectively showingthe advantages of CEEMDAN over EEMD Regarding theforecasters when combined with CEEMDAN XGBOOST isalways superior to other forecasters in terms of RMSE andMAE However when combined with EEMD XGBOOSToutperforms SVR and FNN with horizon 1 and FNN withhorizon 6 in terms of RMSE and MAE With horizons 1and 6 FNN achieves the worst results of RMSE and MAEThe results also show that good values of RMSE usually areassociated with good values of MAE However good valuesof RMSE or MAE do not always mean good Dstat directly

For the ensemble models we also took aWilcoxon signedrank test and an MCS test based on the errors of pairs ofmodels We set 02 as the level of significance in MCS and005 as the level of significance in WSRT The results areshown in Tables 6 and 7

From these two tables it can be seen that regardingthe results of WSRT the p-value between CEEMDAN-XGBOOST and any other models except EEMD-FNN arebelow 005 demonstrating that there is a significant differ-ence on the population mean ranks between CEEMDAN-XGBOOST and any other models except EEMD-FNNWhatis more the MCS shows that the p-value of 119879119877 and 119879119878119876of CEEMDAN-XGBOOST is always equal to 1000 anddemonstrates that CEEMDAN-XGBOOST is the optimalmodel among all models in terms of global errors and localerrorsMeanwhile the p-values of119879119877 and119879119878119876 of EEMD-FNNare greater than othermodels except CEEMDAN-XGBOOSTand become the second best model with horizons 3 and 6in MCS Meanwhile with the horizon 6 the CEEMDAN-SVR is also the second best model Besides the p-values of119879119877 and 119879119878119876 of EEMD-SVR and CEEMDAN-SVR are up to02 and they become the surviving models with horizon 6 inMCS

From the results of single models and ensemble modelswe can draw the following conclusions (1) single modelsusually cannot achieve satisfactory results due to the non-linearity and nonstationarity of raw crude oil prices Asa single forecaster XGBOOST can achieve slightly betterresults than some state-of-the-art algorithms (2) ensemblemodels can significantly improve the forecasting accuracy interms of several evaluation criteria following the ldquodecom-position and ensemblerdquo framework (3) as a decompositionmethod CEEMDAN outperforms EEMD in most cases (4)the extensive experiments demonstrate that the proposedCEEMDAN-XGBOOST is promising for forecasting crudeoil prices

45 Discussion In this subsection we will study the impactsof several parameters related to the proposed CEEMDAN-XGBOOST

451The Impact of the Number of Realizations in CEEMDANIn (2) it is shown that there are N realizations 119909119894[119905] inCEEMDAN We explore how the number of realizations inCEEMDAN can influence on the results of forecasting crudeoil prices by CEEMDAN-XGBOOST with horizon 1 and lag6 And we set the number of realizations in CEEMDAN inthe range of 102550751002505007501000 The resultsare shown in Figure 6

It can be seen from Figure 6 that for RMSE and MAEthe bigger the number of realizations is the more accurateresults the CEEMDAN-XGBOOST can achieve When thenumber of realizations is less than or equal to 500 the valuesof both RMSE and MAE decrease with increasing of thenumber of realizations However when the number is greaterthan 500 these two values are increasing slightly RegardingDstat when the number increases from 10 to 25 the value ofDstat increases rapidly and then it increases slowly with thenumber increasing from 25 to 500 After that Dstat decreasesslightly It is shown that the value of Dstat reaches the topvalues with the number of realizations 500 Therefore 500is the best value for the number of realizations in terms ofRMSE MAE and Dstat

12 Complexity

Table 6 Wilcoxon signed rank test between XGBOOST SVR and FNN with EEMD or CEEMDAN

CEEMDAN-XGBOOST

EEMD-XGBOOST

CEEMDAN-SVR

EEMD-SVR

CEEMDAN-FNN

EEMD-FNN

CEEMDAN-XGBOOST 1 35544e-05 18847e-50 0 0028 16039e-187 00726EEMD-XGBOOST 35544e-05 1 45857e-07 0 3604 82912e-82 00556CEEMDAN-SVR 18847e-50 45857e-07 1 49296e-09 57753e-155 86135e-09EEMD-SVR 00028 0 3604 49296e-09 1 25385e-129 00007

CEEMDAN-FNN 16039e-187 82912e-82 57753e-155 25385e-129 1 81427e-196

EEMD-FNN 00726 00556 86135e-09 00007 81427e-196 1

Table 7 MCS between XGBOOST SVR and FNN with EEMD or CEEMDAN

HORIZON=1 HORIZON=3 HORIZON=6119879119877 119879119878119876 119879119877 119879119878119876 119879119877 119879119878119876CEEMDAN-XGBOOST 1 1 1 1 1 1EEMD-XGBOOST 0 0 0 0 0 00030CEEMDAN-SVR 0 00002 00124 00162 0 8268 08092EEMD-SVR 0 0 00008 0004 07872 07926CEEMDAN-FNN 0 0 00338 00532 02924 03866EEMD-FNN 0 00002 04040 04040 0 8268 08092

452 The Impact of the Lag Orders In this section weexplore how the number of lag orders impacts the predictionaccuracy of CEEMDAN-XGBOOST on the horizon of 1 Inthis experiment we set the number of lag orders from 1 to 10and the results are shown in Figure 7

According to the empirical results shown in Figure 7 itcan be seen that as the lag order increases from 1 to 2 thevalues of RMSE andMAEdecrease sharply while that of Dstatincreases drastically After that the values of RMSE of MAEremain almost unchanged (or increase very slightly) withthe increasing of lag orders However for Dstat the valueincreases sharply from 1 to 2 and then decreases from 2 to3 After the lag order increases from 3 to 5 the Dstat staysalmost stationary Overall when the value of lag order is upto 5 it reaches a good tradeoff among the values of RMSEMAE and Dstat

453 The Impact of the Noise Strength in CEEMDAN Apartfrom the number of realizations the noise strength inCEEMDAN which stands for the standard deviation of thewhite noise in CEEMDAN also affects the performance ofCEEMDAN-XGBOOST Thus we set the noise strength inthe set of 001 002 003 004 005 006 007 to explorehow the noise strength in CEEMDAN affects the predictionaccuracy of CEEMDAN-XGBOOST on a fixed horizon 1 anda fixed lag 6

As shown in Figure 8 when the noise strength in CEEM-DAN is equal to 005 the values of RMSE MAE and Dstatachieve the best results simultaneously When the noisestrength is less than or equal to 005 except 003 the valuesof RMSE MAE and Dstat become better and better with theincrease of the noise strength However when the strengthis greater than 005 the values of RMSE MAE and Dstat

become worse and worse The figure indicates that the noisestrength has a great impact on forecasting results and an idealrange for it is about 004-006

5 Conclusions

In this paper we propose a novelmodel namely CEEMDAN-XGBOOST to forecast crude oil prices At first CEEMDAN-XGBOOST decomposes the sequence of crude oil prices intoseveral IMFs and a residue with CEEMDAN Then it fore-casts the IMFs and the residue with XGBOOST individuallyFinally CEEMDAN-XGBOOST computes the sum of theprediction of the IMFs and the residue as the final forecastingresults The experimental results show that the proposedCEEMDAN-XGBOOST significantly outperforms the com-pared methods in terms of RMSE and MAE Although theperformance of the CEEMDAN-XGBOOST on forecastingthe direction of crude oil prices is not the best theMCS showsthat the CEEMDAN-XGBOOST is still the optimal modelMeanwhile it is proved that the number of the realizationslag and the noise strength for CEEMDAN are the vitalfactors which have great impacts on the performance of theCEEMDAN-XGBOOST

In the future we will study the performance of theCEEMDAN-XGBOOST on forecasting crude oil prices withdifferent periods We will also apply the proposed approachfor forecasting other energy time series such as wind speedelectricity load and carbon emissions prices

Data Availability

The data used to support the findings of this study areincluded within the article

Complexity 13

RMSEMAEDstat

Dsta

t

RMSE

MA

E

070

065

060

050

055

045

040

035

03010 25 50 500 750 1000 75 100 250

088

087

086

085

084

083

082

081

080

The number of realizations in CEEMDAN

Figure 6 The impact of the number of IMFs with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

RMSE

MA

E

2 4 6 8 10lag order

088

086

084

082

080

078

076

074

12

10

08

06

04

Figure 7 The impact of the number of lag orders with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

The white noise strength of CEEMDAN

088

086

084

082

080

078

076

074

10

08

09

06

07

04

05

03

001 002 003 004 005 006 007

RMSE

MA

E

Figure 8 The impact of the value of the white noise of CEEMDAN with CEEMDAN-XGBOOST

14 Complexity

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported by the Fundamental ResearchFunds for the Central Universities (Grants no JBK1902029no JBK1802073 and no JBK170505) the Natural ScienceFoundation of China (Grant no 71473201) and the ScientificResearch Fund of Sichuan Provincial Education Department(Grant no 17ZB0433)

References

[1] H Miao S Ramchander T Wang and D Yang ldquoInfluentialfactors in crude oil price forecastingrdquo Energy Economics vol 68pp 77ndash88 2017

[2] L Yu S Wang and K K Lai ldquoForecasting crude oil price withan EMD-based neural network ensemble learning paradigmrdquoEnergy Economics vol 30 no 5 pp 2623ndash2635 2008

[3] M Ye J Zyren C J Blumberg and J Shore ldquoA short-run crudeoil price forecast model with ratchet effectrdquo Atlantic EconomicJournal vol 37 no 1 pp 37ndash50 2009

[4] C Morana ldquoA semiparametric approach to short-term oil priceforecastingrdquo Energy Economics vol 23 no 3 pp 325ndash338 2001

[5] H Naser ldquoEstimating and forecasting the real prices of crudeoil A data richmodel using a dynamic model averaging (DMA)approachrdquo Energy Economics vol 56 pp 75ndash87 2016

[6] XGong andB Lin ldquoForecasting the goodandbaduncertaintiesof crude oil prices using a HAR frameworkrdquo Energy Economicsvol 67 pp 315ndash327 2017

[7] F Wen X Gong and S Cai ldquoForecasting the volatility of crudeoil futures using HAR-type models with structural breaksrdquoEnergy Economics vol 59 pp 400ndash413 2016

[8] H Chiroma S Abdul-Kareem A Shukri Mohd Noor et alldquoA Review on Artificial Intelligence Methodologies for theForecasting of Crude Oil Pricerdquo Intelligent Automation and SoftComputing vol 22 no 3 pp 449ndash462 2016

[9] S Wang L Yu and K K Lai ldquoA novel hybrid AI systemframework for crude oil price forecastingrdquo in Data Miningand Knowledge Management Y Shi W Xu and Z Chen Edsvol 3327 of Lecture Notes in Computer Science pp 233ndash242Springer Berlin Germany 2004

[10] J Barunık and B Malinska ldquoForecasting the term structure ofcrude oil futures prices with neural networksrdquo Applied Energyvol 164 pp 366ndash379 2016

[11] Y Chen K He and G K F Tso ldquoForecasting crude oil prices-a deep learning based modelrdquo Procedia Computer Science vol122 pp 300ndash307 2017

[12] R Tehrani and F Khodayar ldquoA hybrid optimized artificialintelligent model to forecast crude oil using genetic algorithmrdquoAfrican Journal of Business Management vol 5 no 34 pp13130ndash13135 2011

[13] L Yu Y Zhao and L Tang ldquoA compressed sensing basedAI learning paradigm for crude oil price forecastingrdquo EnergyEconomics vol 46 no C pp 236ndash245 2014

[14] L Yu H Xu and L Tang ldquoLSSVR ensemble learning withuncertain parameters for crude oil price forecastingrdquo AppliedSoft Computing vol 56 pp 692ndash701 2017

[15] Y-L Qi and W-J Zhang ldquoThe improved SVM method forforecasting the fluctuation of international crude oil pricerdquo inProceedings of the 2009 International Conference on ElectronicCommerce and Business Intelligence (ECBI rsquo09) pp 269ndash271IEEE China June 2009

[16] M S AL-Musaylh R C Deo Y Li and J F AdamowskildquoTwo-phase particle swarm optimized-support vector regres-sion hybrid model integrated with improved empirical modedecomposition with adaptive noise for multiple-horizon elec-tricity demand forecastingrdquo Applied Energy vol 217 pp 422ndash439 2018

[17] LHuang and JWang ldquoForecasting energy fluctuationmodel bywavelet decomposition and stochastic recurrent wavelet neuralnetworkrdquo Neurocomputing vol 309 pp 70ndash82 2018

[18] H Zhao M Sun W Deng and X Yang ldquoA new featureextraction method based on EEMD and multi-scale fuzzyentropy for motor bearingrdquo Entropy vol 19 no 1 Article ID14 2017

[19] W Deng S Zhang H Zhao and X Yang ldquoA Novel FaultDiagnosis Method Based on Integrating Empirical WaveletTransform and Fuzzy Entropy for Motor Bearingrdquo IEEE Accessvol 6 pp 35042ndash35056 2018

[20] Q Fu B Jing P He S Si and Y Wang ldquoFault FeatureSelection and Diagnosis of Rolling Bearings Based on EEMDand Optimized Elman AdaBoost Algorithmrdquo IEEE SensorsJournal vol 18 no 12 pp 5024ndash5034 2018

[21] T Li and M Zhou ldquoECG classification using wavelet packetentropy and random forestsrdquo Entropy vol 18 no 8 p 285 2016

[22] M Blanco-Velasco B Weng and K E Barner ldquoECG signaldenoising and baseline wander correction based on the empir-ical mode decompositionrdquo Computers in Biology and Medicinevol 38 no 1 pp 1ndash13 2008

[23] J Lee D D McManus S Merchant and K H ChonldquoAutomatic motion and noise artifact detection in holterECG data using empirical mode decomposition and statisticalapproachesrdquo IEEE Transactions on Biomedical Engineering vol59 no 6 pp 1499ndash1506 2012

[24] L Fan S Pan Z Li and H Li ldquoAn ICA-based supportvector regression scheme for forecasting crude oil pricesrdquoTechnological Forecasting amp Social Change vol 112 pp 245ndash2532016

[25] E Jianwei Y Bao and J Ye ldquoCrude oil price analysis andforecasting based on variationalmode decomposition and inde-pendent component analysisrdquo Physica A Statistical Mechanicsand Its Applications vol 484 pp 412ndash427 2017

[26] N E Huang Z Shen S R Long et al ldquoThe empiricalmode decomposition and the Hilbert spectrum for nonlinearand non-stationary time series analysisrdquo Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 pp 903ndash995 1998

[27] Z Wu and N E Huang ldquoEnsemble empirical mode decom-position a noise-assisted data analysis methodrdquo Advances inAdaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[28] L Yu W Dai L Tang and J Wu ldquoA hybrid grid-GA-basedLSSVR learning paradigm for crude oil price forecastingrdquoNeural Computing andApplications vol 27 no 8 pp 2193ndash22152016

[29] L Tang W Dai L Yu and S Wang ldquoA novel CEEMD-based eelm ensemble learning paradigm for crude oil priceforecastingrdquo International Journal of Information Technology ampDecision Making vol 14 no 1 pp 141ndash169 2015

Complexity 15

[30] T Li M Zhou C Guo et al ldquoForecasting crude oil price usingEEMD and RVM with adaptive PSO-based kernelsrdquo Energiesvol 9 no 12 p 1014 2016

[31] T Li Z Hu Y Jia J Wu and Y Zhou ldquoForecasting CrudeOil PricesUsing Ensemble EmpiricalModeDecomposition andSparse Bayesian Learningrdquo Energies vol 11 no 7 2018

[32] M E TorresMA Colominas G Schlotthauer andP FlandrinldquoA complete ensemble empirical mode decomposition withadaptive noiserdquo in Proceedings of the 36th IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo11) pp 4144ndash4147 IEEE Prague Czech Republic May 2011

[33] M A Colominas G Schlotthauer andM E Torres ldquoImprovedcomplete ensemble EMD a suitable tool for biomedical signalprocessingrdquo Biomedical Signal Processing and Control vol 14no 1 pp 19ndash29 2014

[34] T Peng J Zhou C Zhang and Y Zheng ldquoMulti-step aheadwind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learn-ing machinerdquo Energy Conversion and Management vol 153 pp589ndash602 2017

[35] S Dai D Niu and Y Li ldquoDaily peak load forecasting basedon complete ensemble empirical mode decomposition withadaptive noise and support vector machine optimized bymodified grey Wolf optimization algorithmrdquo Energies vol 11no 1 2018

[36] Y Lv R Yuan TWangH Li andG Song ldquoHealthDegradationMonitoring and Early Fault Diagnosis of a Rolling BearingBased on CEEMDAN and Improved MMSErdquo Materials vol11 no 6 p 1009 2018

[37] R Abdelkader A Kaddour A Bendiabdellah and Z Der-ouiche ldquoRolling Bearing FaultDiagnosis Based on an ImprovedDenoising Method Using the Complete Ensemble EmpiricalMode Decomposition and the OptimizedThresholding Opera-tionrdquo IEEE Sensors Journal vol 18 no 17 pp 7166ndash7172 2018

[38] Y Lei Z Liu J Ouazri and J Lin ldquoA fault diagnosis methodof rolling element bearings based on CEEMDANrdquo Proceedingsof the Institution of Mechanical Engineers Part C Journal ofMechanical Engineering Science vol 231 no 10 pp 1804ndash18152017

[39] T Chen and C Guestrin ldquoXGBoost a scalable tree boostingsystemrdquo in Proceedings of the 22nd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining KDD2016 pp 785ndash794 ACM New York NY USA August 2016

[40] B Zhai and J Chen ldquoDevelopment of a stacked ensemblemodelfor forecasting and analyzing daily average PM25 concentra-tions in Beijing Chinardquo Science of the Total Environment vol635 pp 644ndash658 2018

[41] S P Chatzis V Siakoulis A Petropoulos E Stavroulakis andN Vlachogiannakis ldquoForecasting stock market crisis eventsusing deep and statistical machine learning techniquesrdquo ExpertSystems with Applications vol 112 pp 353ndash371 2018

[42] J Ke H Zheng H Yang and X Chen ldquoShort-term forecastingof passenger demand under on-demand ride services A spatio-temporal deep learning approachrdquoTransportation Research PartC Emerging Technologies vol 85 pp 591ndash608 2017

[43] J Friedman T Hastie and R Tibshirani ldquoSpecial Invited PaperAdditive Logistic Regression A Statistical View of BoostingRejoinderdquo The Annals of Statistics vol 28 no 2 pp 400ndash4072000

[44] J D Gibbons and S Chakraborti ldquoNonparametric statisticalinferencerdquo in International Encyclopedia of Statistical ScienceM Lovric Ed pp 977ndash979 Springer Berlin Germany 2011

[45] P RHansen A Lunde and JMNason ldquoThemodel confidencesetrdquo Econometrica vol 79 no 2 pp 453ndash497 2011

[46] H Liu H Q Tian and Y F Li ldquoComparison of two newARIMA-ANN and ARIMA-Kalman hybrid methods for windspeed predictionrdquo Applied Energy vol 98 no 1 pp 415ndash4242012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 8: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

8 Complexity

Table 1 The ranges of the parameters for XGBOOST by grid search

Parameter Description rangeBooster Booster to use lsquogblinearrsquo lsquogbtreersquoN estimators Number of boosted trees 100200300400500Max depth Maximum tree depth for base learners 345678Min child weight Maximum delta step we allow each treersquos weight estimation to be 123456Gamma Minimum loss reduction required to make a further partition on a leaf node of the tree 001 005010203 Subsample Subsample ratio of the training instance 060708091Colsample Subsample ratio of columns when constructing each tree 060708091Reg alpha L1 regularization term on weights 00100501Reg lambda L2 regularization term on weights 00100501Learning rate Boosting learning rate 0010050070102

oil prices with original sequence and ensemble models thatforecast crude oil prices based on the ldquodecomposition andensemblerdquo framework

For single models we compare XGBOOST with onestatistical model ARIMA and two widely used AI-modelsSVR and FNN Since the existing research has shownthat EEMD significantly outperforms EMD in forecastingcrude oil prices [24 31] in the experiments we onlycompare CEEMDAN with EEMD Therefore we com-pare the proposed CEEMDAN-XGBOOST with EEMD-SVR EEMD-FNN EEMD-XGBOOST CEEMDAN-SVRand CEEMDAN-FNN

For ARIMA we use the Akaike information criterion(AIC) [46] to select the parameters (p-d-q) For SVR weuse RBF as kernel function and use grid search to opti-mize C and gamma in the ranges of 2012345678 and2minus9minus8minus7minus6minus5minus4minus3minus2minus10 respectively We use one hiddenlayer with 20 nodes for FNNWe use a grid search to optimizethe parameters for XGBOOST the search ranges of theoptimized parameters are shown in Table 1

We set 002 and 005 as the standard deviation of theadded white noise and set 250 and 500 as the numberof realizations of EEMD and CEEMDAN respectively Thedecomposition results of the original crude oil prices byEEMD and CEEMDAN are shown in Figures 4 and 5respectively

It can be seen from Figure 4 that among the componentsdecomposed byEEMD the first six IMFs show characteristicsof high frequency while the remaining six components showcharacteristics of low frequency However regarding thecomponents by CEEMDAN the first seven ones show clearhigh-frequency and the last four show low-frequency asshown in Figure 5

The experiments were conducted with Python 27 andMATLAB 86 on a 64-bit Windows 7 with 34 GHz I7 CPUand 32 GB memory Specifically we run FNN and MCSwith MATLAB and for the remaining work we use PythonRegarding XGBoost we used a widely used Python pack-age (httpsxgboostreadthedocsioenlatestpython) in theexperiments

Table 2The RMSE MAE and Dstat by single models with horizon= 1 3 and 6

Horizon Model RMSE MAE Dstat

1

XGBOOST 12640 09481 04827SVR 12899 09651 04826FNN 13439 09994 04837

ARIMA 12692 09520 04883

3

XGBOOST 20963 16159 04839SVR 22444 17258 05080FNN 21503 16512 04837

ARIMA 21056 16177 04901

6

XGBOOST 29269 22945 05158SVR 31048 24308 05183FNN 30803 24008 05028

ARIMA 29320 22912 05151

44 Experimental Results In this subsection we use a fixedvalue 6 as the lag order andwe forecast crude oil prices with 1-step-ahead 3-step-ahead and 6-step-ahead forecasting thatis to say the horizons for these three forecasting tasks are 1 3and 6 respectively

441 Experimental Results of Single Models For single mod-els we compare XGBOOST with state-of-the-art SVR FNNand ARIMA and the results are shown in Table 2

It can be seen from Table 2 that XGBOOST outperformsother models in terms of RMSE and MAE with horizons 1and 3 For horizon 6 XGBOOST achieves the best RMSE andthe second best MAE which is slightly worse than that byARIMA For horizon 1 FNNachieves theworst results amongthe four models however for horizons 3 and 6 SVR achievesthe worst results Regarding Dstat none of the models canalways outperform others and the best result of Dstat isachieved by SVR with horizon 6 It can be found that theRMSE and MAE values gradually increase with the increaseof horizon However the Dstat values do not show suchdiscipline All the values of Dstat are around 05 ie from

Complexity 9

minus505

IMF1

minus4minus2

024

IMF2

minus505

IMF3

minus505

IMF4

minus4minus2

024

IMF5

minus505

IMF6

minus200

20

IMF7

minus100

1020

IMF8

minus100

10

IMF9

minus20minus10

010

IMF1

0

minus200

20

IMF1

1

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 201820406080

Resid

ue

Figure 4 The IMFs and residue of WTI crude oil prices by EEMD

04826 to 05183 which are very similar to the results ofrandom guesses showing that it is very difficult to accuratelyforecast the direction of the movement of crude oil priceswith raw crude oil prices directly

To further verify the advantages of XGBOOST over othermodels we report the results by WSRT and MCS as shownin Tables 3 and 4 respectively As for WSRT the p-valuebetween XGBOOST and other models except ARIMA isbelow 005 which means that there is a significant differenceamong the forecasting results of XGBOOST SVR and FNNin population mean ranks Besides the results of MCS showthat the p-value of119879119877 and119879119878119876 of XGBOOST is always equal to1000 and prove that XGBOOST is the optimal model among

Table 3Wilcoxon signed rank test between XGBOOST SVR FNNand ARIMA

XGBOOST SVR FNN ARIMAXGBOOST 1 40378e-06 22539e-35 57146e-01SVR 40378e-06 1 46786e-33 07006FNN 22539e-35 46786e-33 1 69095e-02ARIMA 57146e-01 07006 69095e-02 1

all the models in terms of global errors and most local errorsof different samples obtained by bootstrap methods in MCSAccording to MCS the p-values of 119879119877 and 119879119878119876 of SVR onthe horizon of 3 are greater than 02 so SVR becomes the

10 Complexity

minus505

IMF1

minus505

IMF2

minus505

IMF3

minus505

IMF4

minus505

IMF5

minus100

10

IMF6

minus100

10

IMF7

minus20

0

20

IMF8

minus200

20

IMF9

minus200

20

IMF1

0

Jan 02 1986 Dec 22 1989 Dec 16 1993 Dec 26 1997 Jan 15 2002 Feb 07 2006 Feb 22 2010 Mar 03 2014 Mar 19 201820406080

Resid

ue

Figure 5 The IMFs and residue of WTI crude oil prices by CEEMDAN

Table 4 MCS between XGBOOST SVR FNN and ARIMA

HORIZON=1 HORIZON=3 HORIZON=6119879119877 119879119878119876 119879119877 119879119878119876 119879119877 119879119878119876XGBOOST 10000 10000 10000 10000 10000 10000SVR 00004 00004 04132 04132 00200 00200FNN 00002 00002 00248 00538 00016 00022ARIMA 00000 00000 00000 00000 00000 00000

survival and the second best model on this horizon Whenit comes to ARIMA ARIMA almost performs as good asXGBOOST in terms of evaluation criteria of global errorsbut it does not pass the MCS It indicates that ARIMA doesnot perform better than other models in most local errors ofdifferent samples

442 Experimental Results of Ensemble Models With EEMDor CEEMDAN the results of forecasting the crude oil pricesby XGBOOST SVR and FNN with horizons 1 3 and 6 areshown in Table 5

It can be seen from Table 5 that the RMSE and MAEvalues of the CEEMDAN-XGBOOST are the lowest ones

Complexity 11

Table 5 The RMSE MAE and Dstat by the models with EEMD orCEEMDAN

Horizon Model RMSE MAE Dstat

1

CEEMDAN-XGBOOST 04151 03023 08783EEMD-XGBOOST 09941 07685 07109CEEMDAN-SVR 08477 07594 09054

EEMD-SVR 11796 09879 08727CEEMDAN-FNN 12574 10118 07597

EEMD-FNN 26835 19932 07361

3

CEEMDAN-XGBOOST 08373 06187 06914EEMD-XGBOOST 14007 10876 06320CEEMDAN-SVR 12399 10156 07092

EEMD-SVR 12366 10275 07092CEEMDAN-FNN 12520 09662 07061

EEMD-FNN 12046 08637 06959

6

CEEMDAN-XGBOOST 12882 09831 06196EEMD-XGBOOST 17719 13765 06165CEEMDAN-SVR 13453 10296 06683

EEMD-SVR 13730 11170 06485CEEMDAN-FNN 18024 13647 06422

EEMD-FNN 27786 20495 06337

among those by all methods with each horizon For examplewith horizon 1 the values of RMSE and MAE are 04151 and03023which are far less than the second values of RMSE andMAE ie 08477 and 07594 respectively With the horizonincreases the corresponding values of each model increasein terms of RMSE and MAE However the CEEMDAN-XGBOOST still achieves the lowest values of RMSE andMAEwith each horizon Regarding the values of Dstat all thevalues are far greater than those by random guesses showingthat the ldquodecomposition and ensemblerdquo framework is effectivefor directional forecasting Specifically the values of Dstatare in the range between 06165 and 09054 The best Dstatvalues in all horizons are achieved by CEEMDAN-SVR orEEMD-SVR showing that among the forecasters SVR is thebest one for directional forecasting although correspondingvalues of RMSE and MAE are not the best As for thedecomposition methods when the forecasters are fixedCEEMDAN outperforms EEMD in 8 8 and 8 out of 9 casesin terms of RMSE MAE and Dstat respectively showingthe advantages of CEEMDAN over EEMD Regarding theforecasters when combined with CEEMDAN XGBOOST isalways superior to other forecasters in terms of RMSE andMAE However when combined with EEMD XGBOOSToutperforms SVR and FNN with horizon 1 and FNN withhorizon 6 in terms of RMSE and MAE With horizons 1and 6 FNN achieves the worst results of RMSE and MAEThe results also show that good values of RMSE usually areassociated with good values of MAE However good valuesof RMSE or MAE do not always mean good Dstat directly

For the ensemble models we also took aWilcoxon signedrank test and an MCS test based on the errors of pairs ofmodels We set 02 as the level of significance in MCS and005 as the level of significance in WSRT The results areshown in Tables 6 and 7

From these two tables it can be seen that regardingthe results of WSRT the p-value between CEEMDAN-XGBOOST and any other models except EEMD-FNN arebelow 005 demonstrating that there is a significant differ-ence on the population mean ranks between CEEMDAN-XGBOOST and any other models except EEMD-FNNWhatis more the MCS shows that the p-value of 119879119877 and 119879119878119876of CEEMDAN-XGBOOST is always equal to 1000 anddemonstrates that CEEMDAN-XGBOOST is the optimalmodel among all models in terms of global errors and localerrorsMeanwhile the p-values of119879119877 and119879119878119876 of EEMD-FNNare greater than othermodels except CEEMDAN-XGBOOSTand become the second best model with horizons 3 and 6in MCS Meanwhile with the horizon 6 the CEEMDAN-SVR is also the second best model Besides the p-values of119879119877 and 119879119878119876 of EEMD-SVR and CEEMDAN-SVR are up to02 and they become the surviving models with horizon 6 inMCS

From the results of single models and ensemble modelswe can draw the following conclusions (1) single modelsusually cannot achieve satisfactory results due to the non-linearity and nonstationarity of raw crude oil prices Asa single forecaster XGBOOST can achieve slightly betterresults than some state-of-the-art algorithms (2) ensemblemodels can significantly improve the forecasting accuracy interms of several evaluation criteria following the ldquodecom-position and ensemblerdquo framework (3) as a decompositionmethod CEEMDAN outperforms EEMD in most cases (4)the extensive experiments demonstrate that the proposedCEEMDAN-XGBOOST is promising for forecasting crudeoil prices

45 Discussion In this subsection we will study the impactsof several parameters related to the proposed CEEMDAN-XGBOOST

451The Impact of the Number of Realizations in CEEMDANIn (2) it is shown that there are N realizations 119909119894[119905] inCEEMDAN We explore how the number of realizations inCEEMDAN can influence on the results of forecasting crudeoil prices by CEEMDAN-XGBOOST with horizon 1 and lag6 And we set the number of realizations in CEEMDAN inthe range of 102550751002505007501000 The resultsare shown in Figure 6

It can be seen from Figure 6 that for RMSE and MAEthe bigger the number of realizations is the more accurateresults the CEEMDAN-XGBOOST can achieve When thenumber of realizations is less than or equal to 500 the valuesof both RMSE and MAE decrease with increasing of thenumber of realizations However when the number is greaterthan 500 these two values are increasing slightly RegardingDstat when the number increases from 10 to 25 the value ofDstat increases rapidly and then it increases slowly with thenumber increasing from 25 to 500 After that Dstat decreasesslightly It is shown that the value of Dstat reaches the topvalues with the number of realizations 500 Therefore 500is the best value for the number of realizations in terms ofRMSE MAE and Dstat


Table 6: Wilcoxon signed rank test (p-values) between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

Model              CEEMDAN-XGBOOST  EEMD-XGBOOST  CEEMDAN-SVR  EEMD-SVR     CEEMDAN-FNN  EEMD-FNN
CEEMDAN-XGBOOST    1                3.5544e-05    1.8847e-50   0.0028       1.6039e-187  0.0726
EEMD-XGBOOST       3.5544e-05       1             4.5857e-07   0.3604       8.2912e-82   0.0556
CEEMDAN-SVR        1.8847e-50       4.5857e-07    1            4.9296e-09   5.7753e-155  8.6135e-09
EEMD-SVR           0.0028           0.3604        4.9296e-09   1            2.5385e-129  0.0007
CEEMDAN-FNN        1.6039e-187      8.2912e-82    5.7753e-155  2.5385e-129  1            8.1427e-196
EEMD-FNN           0.0726           0.0556        8.6135e-09   0.0007       8.1427e-196  1

Table 7: MCS (p-values) between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

                   Horizon=1         Horizon=3         Horizon=6
Model              T_R      T_SQ     T_R      T_SQ     T_R      T_SQ
CEEMDAN-XGBOOST    1        1        1        1        1        1
EEMD-XGBOOST       0        0        0        0        0        0.0030
CEEMDAN-SVR        0        0.0002   0.0124   0.0162   0.8268   0.8092
EEMD-SVR           0        0        0.0008   0.004    0.7872   0.7926
CEEMDAN-FNN        0        0        0.0338   0.0532   0.2924   0.3866
EEMD-FNN           0        0.0002   0.4040   0.4040   0.8268   0.8092

4.5.2. The Impact of the Lag Orders. In this section, we explore how the lag order impacts the prediction accuracy of CEEMDAN-XGBOOST at horizon 1. In this experiment, we vary the lag order from 1 to 10; the results are shown in Figure 7. A sketch of how a lag order turns the price series into supervised training pairs follows.
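The helper below is a hedged illustration (the function name and layout are ours): each sample predicts the next value of the series from its most recent `lag` observations.

```python
import numpy as np

def make_lagged_pairs(series: np.ndarray, lag: int):
    """X[t] holds the `lag` values preceding y[t], so each sample
    predicts the next price from its most recent history."""
    X = np.stack([series[i:len(series) - lag + i] for i in range(lag)], axis=1)
    y = series[lag:]
    return X, y

# Example: with lag=2, series [1, 2, 3, 4] yields X=[[1, 2], [2, 3]], y=[3, 4].
```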

According to the empirical results shown in Figure 7, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply while Dstat increases drastically. After that, RMSE and MAE remain almost unchanged (or increase very slightly) as the lag order grows. Dstat, however, decreases from 2 to 3 and then stays almost stationary as the lag order rises from 3 to 5. Overall, a lag order of 5 reaches a good tradeoff among the values of RMSE, MAE, and Dstat.

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the added white noise, also affects the performance of CEEMDAN-XGBOOST. Thus, we select the noise strength from the set {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how it affects the prediction accuracy of CEEMDAN-XGBOOST at a fixed horizon of 1 and a fixed lag of 6.
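In PyEMD's CEEMDAN the analogous knob is, to the best of our knowledge, the `epsilon` argument, which scales the added white noise; a minimal sketch of the sweep, under that assumption:

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

prices = np.loadtxt("wti_prices.txt")  # hypothetical WTI closing-price series

for noise_strength in (0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07):
    decomposer = CEEMDAN(trials=500, epsilon=noise_strength)
    imfs = decomposer.ceemdan(prices)
    # ... rebuild the forecasting pipeline and record RMSE/MAE/Dstat ...
```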

As shown in Figure 8, when the noise strength in CEEMDAN equals 0.05, the values of RMSE, MAE, and Dstat achieve their best results simultaneously. When the noise strength is less than or equal to 0.05 (except at 0.03), the three metrics improve as the noise strength increases; when the strength exceeds 0.05, they become worse and worse. The figure indicates that the noise strength has a great impact on the forecasting results and that an ideal range for it is about 0.04-0.06.

5. Conclusions

In this paper, we propose a novel model, namely, CEEMDAN-XGBOOST, to forecast crude oil prices. First, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, it takes the sum of the predictions of the IMFs and the residue as the final forecasting result. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although CEEMDAN-XGBOOST is not the best at forecasting the direction of crude oil prices, the MCS shows that it is still the optimal model overall. Meanwhile, the experiments show that the number of realizations, the lag order, and the noise strength of CEEMDAN are vital factors with great impact on the performance of CEEMDAN-XGBOOST.
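To make the three steps concrete, here is a compact, hedged sketch of the whole pipeline, written with the scikit-learn-style XGBoost interface; the hyperparameter values are placeholders rather than the tuned settings of the experiments:

```python
import numpy as np
import xgboost as xgb
from PyEMD import CEEMDAN  # pip install EMD-signal xgboost

def ceemdan_xgboost_one_step(prices: np.ndarray, lag: int = 6,
                             trials: int = 500, epsilon: float = 0.05) -> float:
    """Decompose, forecast each IMF and the residue one step ahead,
    and sum the component forecasts into the final price forecast."""
    imfs = CEEMDAN(trials=trials, epsilon=epsilon).ceemdan(prices)
    residue = prices - imfs.sum(axis=0)
    forecast = 0.0
    for component in list(imfs) + [residue]:
        # lagged supervised pairs: lag past values predict the next one
        X = np.stack([component[i:len(component) - lag + i]
                      for i in range(lag)], axis=1)
        y = component[lag:]
        model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
        model.fit(X, y)
        # predict this component's next value from its latest lag window
        forecast += float(model.predict(component[-lag:].reshape(1, -1))[0])
    return forecast
```

In the actual experiments, a separate model per horizon and the tuned XGBOOST settings would replace these placeholder hyperparameters.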

In the future, we will study the performance of CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emission prices.

Data Availability

The data used to support the findings of this study are included within the article.

Figure 6: The impact of the number of realizations in CEEMDAN with CEEMDAN-XGBOOST (RMSE, MAE, and Dstat plotted against the number of realizations, 10-1000).

Figure 7: The impact of the number of lag orders with CEEMDAN-XGBOOST (RMSE, MAE, and Dstat plotted against lag orders 1-10).

Figure 8: The impact of the white noise strength of CEEMDAN with CEEMDAN-XGBOOST (RMSE, MAE, and Dstat plotted against noise strengths 0.01-0.07).


Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).




[38] Y Lei Z Liu J Ouazri and J Lin ldquoA fault diagnosis methodof rolling element bearings based on CEEMDANrdquo Proceedingsof the Institution of Mechanical Engineers Part C Journal ofMechanical Engineering Science vol 231 no 10 pp 1804ndash18152017

[39] T Chen and C Guestrin ldquoXGBoost a scalable tree boostingsystemrdquo in Proceedings of the 22nd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining KDD2016 pp 785ndash794 ACM New York NY USA August 2016

[40] B Zhai and J Chen ldquoDevelopment of a stacked ensemblemodelfor forecasting and analyzing daily average PM25 concentra-tions in Beijing Chinardquo Science of the Total Environment vol635 pp 644ndash658 2018

[41] S P Chatzis V Siakoulis A Petropoulos E Stavroulakis andN Vlachogiannakis ldquoForecasting stock market crisis eventsusing deep and statistical machine learning techniquesrdquo ExpertSystems with Applications vol 112 pp 353ndash371 2018

[42] J Ke H Zheng H Yang and X Chen ldquoShort-term forecastingof passenger demand under on-demand ride services A spatio-temporal deep learning approachrdquoTransportation Research PartC Emerging Technologies vol 85 pp 591ndash608 2017

[43] J Friedman T Hastie and R Tibshirani ldquoSpecial Invited PaperAdditive Logistic Regression A Statistical View of BoostingRejoinderdquo The Annals of Statistics vol 28 no 2 pp 400ndash4072000

[44] J D Gibbons and S Chakraborti ldquoNonparametric statisticalinferencerdquo in International Encyclopedia of Statistical ScienceM Lovric Ed pp 977ndash979 Springer Berlin Germany 2011

[45] P RHansen A Lunde and JMNason ldquoThemodel confidencesetrdquo Econometrica vol 79 no 2 pp 453ndash497 2011

[46] H Liu H Q Tian and Y F Li ldquoComparison of two newARIMA-ANN and ARIMA-Kalman hybrid methods for windspeed predictionrdquo Applied Energy vol 98 no 1 pp 415ndash4242012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

Complexity 11

Table 5: The RMSE, MAE, and Dstat of the models with EEMD or CEEMDAN.

Horizon  Model            RMSE    MAE     Dstat
1        CEEMDAN-XGBOOST  0.4151  0.3023  0.8783
         EEMD-XGBOOST     0.9941  0.7685  0.7109
         CEEMDAN-SVR      0.8477  0.7594  0.9054
         EEMD-SVR         1.1796  0.9879  0.8727
         CEEMDAN-FNN      1.2574  1.0118  0.7597
         EEMD-FNN         2.6835  1.9932  0.7361
3        CEEMDAN-XGBOOST  0.8373  0.6187  0.6914
         EEMD-XGBOOST     1.4007  1.0876  0.6320
         CEEMDAN-SVR      1.2399  1.0156  0.7092
         EEMD-SVR         1.2366  1.0275  0.7092
         CEEMDAN-FNN      1.2520  0.9662  0.7061
         EEMD-FNN         1.2046  0.8637  0.6959
6        CEEMDAN-XGBOOST  1.2882  0.9831  0.6196
         EEMD-XGBOOST     1.7719  1.3765  0.6165
         CEEMDAN-SVR      1.3453  1.0296  0.6683
         EEMD-SVR         1.3730  1.1170  0.6485
         CEEMDAN-FNN      1.8024  1.3647  0.6422
         EEMD-FNN         2.7786  2.0495  0.6337

It can be seen from Table 5 that CEEMDAN-XGBOOST achieves the lowest RMSE and MAE among all methods at each horizon. For example, with horizon 1, its RMSE and MAE are 0.4151 and 0.3023, far below the second-best values of 0.8477 and 0.7594, respectively. As the horizon increases, the RMSE and MAE of every model grow; nevertheless, CEEMDAN-XGBOOST still attains the lowest RMSE and MAE at each horizon. Regarding Dstat, all values are far greater than what random guessing would yield, showing that the "decomposition and ensemble" framework is effective for directional forecasting; specifically, the Dstat values lie between 0.6165 and 0.9054. The best Dstat at every horizon is achieved by CEEMDAN-SVR or EEMD-SVR, indicating that, among the forecasters, SVR is the best for directional forecasting even though its RMSE and MAE are not the best. As for the decomposition methods, with the forecaster fixed, CEEMDAN outperforms EEMD in 8 out of 9 cases in terms of each of RMSE, MAE, and Dstat, showing the advantage of CEEMDAN over EEMD. Regarding the forecasters, when combined with CEEMDAN, XGBOOST is always superior to the other forecasters in terms of RMSE and MAE; when combined with EEMD, however, XGBOOST outperforms both SVR and FNN only with horizon 1, and only FNN with horizon 6. With horizons 1 and 6, FNN yields the worst RMSE and MAE. The results also show that good RMSE values usually go together with good MAE values, whereas good RMSE or MAE does not necessarily imply good Dstat.
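To make the evaluation concrete, the three metrics can be computed as in the following minimal sketch. The Dstat definition used here (the fraction of correctly predicted directions of change, measured against the previous actual value) is one common formulation assumed for illustration, and the two arrays are hypothetical placeholders.

```python
# A minimal sketch of RMSE, MAE, and Dstat; the Dstat formulation below
# is an assumption, not quoted from the paper.
import numpy as np

def rmse(actual, pred):
    return np.sqrt(np.mean((pred - actual) ** 2))

def mae(actual, pred):
    return np.mean(np.abs(pred - actual))

def dstat(actual, pred):
    # Direction of change is measured against the previous *actual* value.
    true_dir = np.sign(actual[1:] - actual[:-1])
    pred_dir = np.sign(pred[1:] - actual[:-1])
    return np.mean(true_dir == pred_dir)

y_true = np.array([50.0, 51.2, 50.7, 52.0, 52.4])   # placeholder prices
y_pred = np.array([50.1, 51.0, 51.0, 51.8, 52.6])   # placeholder forecasts
print(rmse(y_true, y_pred), mae(y_true, y_pred), dstat(y_true, y_pred))
```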

For the ensemble models, we also conducted a Wilcoxon signed rank test (WSRT) and an MCS test based on the errors of each pair of models. We set 0.2 as the level of significance in the MCS and 0.05 as the level of significance in the WSRT. The results are shown in Tables 6 and 7.

From these two tables it can be seen that, regarding the WSRT results, the p-value between CEEMDAN-XGBOOST and any other model except EEMD-FNN is below 0.05, demonstrating a significant difference in population mean ranks between CEEMDAN-XGBOOST and every other model except EEMD-FNN. What is more, the MCS shows that the p-values of T_R and T_SQ of CEEMDAN-XGBOOST are always equal to 1.0000, demonstrating that CEEMDAN-XGBOOST is the optimal model among all models in terms of both global and local errors. Meanwhile, the p-values of T_R and T_SQ of EEMD-FNN are greater than those of all other models except CEEMDAN-XGBOOST, making it the second-best model with horizons 3 and 6 in the MCS; with horizon 6, CEEMDAN-SVR is also a second-best model. Besides, the p-values of T_R and T_SQ of EEMD-SVR and CEEMDAN-SVR exceed 0.2 with horizon 6, so both become surviving models in the MCS at that horizon.
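The two tests can be reproduced along the lines of the following sketch, assuming SciPy for the Wilcoxon signed rank test and the third-party arch package for the model confidence set; the error arrays are hypothetical placeholders rather than the experimental errors.

```python
# A minimal sketch of the pairwise significance tests; `err_a` and
# `err_b` are placeholder error series from two competing models.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
err_a = np.abs(rng.normal(0.4, 0.1, 200))   # placeholder errors, model A
err_b = np.abs(rng.normal(0.5, 0.1, 200))   # placeholder errors, model B

# WSRT: a p-value below 0.05 signals a significant difference in ranks.
stat, p_value = wilcoxon(err_a, err_b)
print(f"WSRT statistic={stat:.1f}, p-value={p_value:.4f}")

# MCS: models whose p-values stay above the 0.2 significance level
# survive in the confidence set (the arch package's API is assumed here).
from arch.bootstrap import MCS

losses = np.column_stack([err_a ** 2, err_b ** 2])   # squared-error losses
mcs = MCS(losses, size=0.2)
mcs.compute()
print(mcs.pvalues)
```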

From the results of the single models and the ensemble models, we can draw the following conclusions: (1) single models usually cannot achieve satisfactory results, owing to the nonlinearity and nonstationarity of raw crude oil prices, although as a single forecaster XGBOOST achieves slightly better results than some state-of-the-art algorithms; (2) ensemble models following the "decomposition and ensemble" framework significantly improve the forecasting accuracy in terms of several evaluation criteria; (3) as a decomposition method, CEEMDAN outperforms EEMD in most cases; (4) the extensive experiments demonstrate that the proposed CEEMDAN-XGBOOST is promising for forecasting crude oil prices.

4.5. Discussion. In this subsection, we will study the impacts of several parameters related to the proposed CEEMDAN-XGBOOST.

4.5.1. The Impact of the Number of Realizations in CEEMDAN. Equation (2) shows that CEEMDAN uses N realizations x_i[t]. We explore how the number of realizations in CEEMDAN influences the results of forecasting crude oil prices by CEEMDAN-XGBOOST with horizon 1 and lag 6, setting the number of realizations to each value in {10, 25, 50, 75, 100, 250, 500, 750, 1000}. The results are shown in Figure 6.

It can be seen from Figure 6 that, for RMSE and MAE, a larger number of realizations generally yields more accurate results. When the number of realizations is less than or equal to 500, both RMSE and MAE decrease as the number of realizations grows; beyond 500, the two values increase slightly. Regarding Dstat, the value rises rapidly as the number increases from 10 to 25, climbs slowly from 25 to 500, and then declines slightly; it peaks at 500 realizations. Therefore, 500 is the best number of realizations in terms of RMSE, MAE, and Dstat.
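A sweep over the number of realizations can be sketched as follows, assuming the PyEMD package (distributed as "EMD-signal"), whose trials argument plays the role of N; the synthetic random walk is a stand-in for the WTI price series, not the paper's data.

```python
# A minimal sketch of sweeping the number of realizations N in CEEMDAN,
# assuming PyEMD; `trials` is PyEMD's name for the number of noise
# realizations, and `epsilon` scales the added white noise.
import numpy as np
from PyEMD import CEEMDAN

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 500)) + 50.0   # synthetic stand-in series

for n_realizations in [10, 25, 50, 75, 100, 250, 500, 750, 1000]:
    decomposer = CEEMDAN(trials=n_realizations, epsilon=0.05)
    imfs = decomposer(prices)               # one IMF per row
    residue = prices - imfs.sum(axis=0)     # residue by subtraction
    print(n_realizations, imfs.shape[0])    # number of IMFs extracted
```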

Table 6: Wilcoxon signed rank test (p-values) between XGBOOST, SVR, and FNN with EEMD or CEEMDAN.

                   CEEMDAN-XGBOOST  EEMD-XGBOOST  CEEMDAN-SVR  EEMD-SVR     CEEMDAN-FNN  EEMD-FNN
CEEMDAN-XGBOOST    1                3.5544e-05    1.8847e-50   0.0028       1.6039e-187  0.0726
EEMD-XGBOOST       3.5544e-05       1             4.5857e-07   0.3604       8.2912e-82   0.0556
CEEMDAN-SVR        1.8847e-50       4.5857e-07    1            4.9296e-09   5.7753e-155  8.6135e-09
EEMD-SVR           0.0028           0.3604        4.9296e-09   1            2.5385e-129  0.0007
CEEMDAN-FNN        1.6039e-187      8.2912e-82    5.7753e-155  2.5385e-129  1            8.1427e-196
EEMD-FNN           0.0726           0.0556        8.6135e-09   0.0007       8.1427e-196  1

Table 7: MCS between XGBOOST, SVR, and FNN with EEMD or CEEMDAN (p-values of T_R and T_SQ).

                   HORIZON=1        HORIZON=3        HORIZON=6
Model              T_R     T_SQ     T_R     T_SQ     T_R     T_SQ
CEEMDAN-XGBOOST    1       1        1       1        1       1
EEMD-XGBOOST       0       0        0       0        0       0.0030
CEEMDAN-SVR        0       0.0002   0.0124  0.0162   0.8268  0.8092
EEMD-SVR           0       0        0.0008  0.0040   0.7872  0.7926
CEEMDAN-FNN        0       0        0.0338  0.0532   0.2924  0.3866
EEMD-FNN           0       0.0002   0.4040  0.4040   0.8268  0.8092

4.5.2. The Impact of the Lag Orders. In this section, we explore how the number of lag orders impacts the prediction accuracy of CEEMDAN-XGBOOST with horizon 1. In this experiment, we set the number of lag orders from 1 to 10, and the results are shown in Figure 7.

According to the empirical results shown in Figure 7, as the lag order increases from 1 to 2, the values of RMSE and MAE decrease sharply while Dstat rises drastically. After that, RMSE and MAE remain almost unchanged (or increase very slightly) as the lag order grows. Dstat, however, decreases from lag 2 to 3 and then stays almost stationary as the lag order increases from 3 to 5. Overall, a lag order of 5 reaches a good tradeoff among the values of RMSE, MAE, and Dstat.
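The lagged inputs underlying this experiment can be built as in the following sketch; make_lagged is a hypothetical helper written for illustration, not code from the paper.

```python
# A minimal sketch of building lagged inputs for one component:
# predict the value `horizon` steps ahead from the previous `lag` values.
import numpy as np

def make_lagged(component, lag=5, horizon=1):
    """Return (X, y): each row of X holds `lag` past values of the
    component; y holds the value `horizon` steps after the window."""
    X, y = [], []
    for t in range(lag, len(component) - horizon + 1):
        X.append(component[t - lag:t])
        y.append(component[t + horizon - 1])
    return np.asarray(X), np.asarray(y)

# Example: lag 5, one-step-ahead targets on a toy series.
X, y = make_lagged(np.arange(20, dtype=float), lag=5, horizon=1)
print(X.shape, y.shape)   # (15, 5) (15,)
```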

4.5.3. The Impact of the Noise Strength in CEEMDAN. Apart from the number of realizations, the noise strength in CEEMDAN, which stands for the standard deviation of the white noise added by CEEMDAN, also affects the performance of CEEMDAN-XGBOOST. Thus, we set the noise strength to each value in {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07} to explore how it affects the prediction accuracy of CEEMDAN-XGBOOST with a fixed horizon of 1 and a fixed lag of 6.

As shown in Figure 8, when the noise strength in CEEMDAN equals 0.05, RMSE, MAE, and Dstat simultaneously achieve their best values. For noise strengths at or below 0.05 (with the exception of 0.03), all three metrics improve as the noise strength increases; beyond 0.05, all three deteriorate. The figure indicates that the noise strength has a great impact on the forecasting results and that an ideal range for it is about 0.04-0.06.
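The corresponding noise-strength sweep can be sketched as follows, again assuming PyEMD, whose epsilon argument scales the standard deviation of the white noise added in CEEMDAN relative to the signal.

```python
# A minimal sketch of the noise-strength sweep, assuming PyEMD; the
# random walk below is a stand-in for the crude oil price series.
import numpy as np
from PyEMD import CEEMDAN

rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0, 1, 500)) + 50.0

for eps in [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]:
    decomposer = CEEMDAN(trials=500, epsilon=eps)
    imfs = decomposer(prices)
    # ...forecast each component and score RMSE/MAE/Dstat here...
    print(f"epsilon={eps}: {imfs.shape[0]} IMFs")
```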

5. Conclusions

In this paper, we propose a novel model, namely, CEEMDAN-XGBOOST, to forecast crude oil prices. First, CEEMDAN-XGBOOST decomposes the sequence of crude oil prices into several IMFs and a residue with CEEMDAN. Then, it forecasts the IMFs and the residue with XGBOOST individually. Finally, it takes the sum of the predictions of the IMFs and the residue as the final forecasting result. The experimental results show that the proposed CEEMDAN-XGBOOST significantly outperforms the compared methods in terms of RMSE and MAE. Although the performance of CEEMDAN-XGBOOST on forecasting the direction of crude oil prices is not the best, the MCS shows that it is still the optimal model. Meanwhile, the experiments demonstrate that the number of realizations, the lag order, and the noise strength of CEEMDAN are vital factors with great impact on the performance of CEEMDAN-XGBOOST.
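For concreteness, the overall decompose-forecast-aggregate pipeline can be sketched as follows, assuming PyEMD for CEEMDAN and the scikit-learn wrapper of the xgboost package; the synthetic series, train/test split, and hyperparameters are placeholders rather than the settings used in the experiments.

```python
# A minimal sketch of the decompose/forecast/aggregate idea under the
# stated assumptions; it is an illustration, not the paper's code.
import numpy as np
from PyEMD import CEEMDAN
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
prices = np.cumsum(rng.normal(0, 1, 600)) + 50.0   # synthetic stand-in series
LAG, SPLIT = 6, 500                                 # placeholder settings

def make_lagged(series, lag):
    X = np.stack([series[i:len(series) - lag + i] for i in range(lag)], axis=1)
    return X, series[lag:]

# Step 1: decompose the series into IMFs plus a residue.
decomposer = CEEMDAN(trials=500, epsilon=0.05)
imfs = decomposer(prices)
components = np.vstack([imfs, prices - imfs.sum(axis=0)])  # append residue

# Steps 2-3: forecast each component with its own XGBOOST, then sum.
forecast = np.zeros(len(prices) - SPLIT)
for comp in components:
    X, y = make_lagged(comp, LAG)
    model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X[:SPLIT - LAG], y[:SPLIT - LAG])
    forecast += model.predict(X[SPLIT - LAG:])

actual = prices[SPLIT:]
print("RMSE:", np.sqrt(np.mean((forecast - actual) ** 2)))
```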

In the future, we will study the performance of CEEMDAN-XGBOOST on forecasting crude oil prices over different periods. We will also apply the proposed approach to forecasting other energy time series, such as wind speed, electricity load, and carbon emission prices.

Data Availability

The data used to support the findings of this study are included within the article.

Figure 6: The impact of the number of realizations in CEEMDAN on CEEMDAN-XGBOOST (RMSE, MAE, and Dstat versus the number of realizations in CEEMDAN).

Figure 7: The impact of the number of lag orders on CEEMDAN-XGBOOST (RMSE, MAE, and Dstat versus lag order).

Figure 8: The impact of the white noise strength of CEEMDAN on CEEMDAN-XGBOOST (RMSE, MAE, and Dstat versus the white noise strength of CEEMDAN).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.
[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.
[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.
[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.
[5] H. Naser, "Estimating and forecasting the real prices of crude oil: a data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.
[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.
[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.
[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.
[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.
[10] J. Baruník and B. Malinska, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.
[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.
[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.
[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.
[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.
[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.
[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.
[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.
[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.
[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.
[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.
[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.
[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.
[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.
[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.
[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.
[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.
[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.
[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.
[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.
[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.
[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.
[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.
[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.
[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.
[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.
[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.
[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.
[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.
[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.
[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.
[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.
[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.
[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: a statistical view of boosting: rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.
[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.
[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.
[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

12 Complexity

Table 6 Wilcoxon signed rank test between XGBOOST SVR and FNN with EEMD or CEEMDAN

CEEMDAN-XGBOOST

EEMD-XGBOOST

CEEMDAN-SVR

EEMD-SVR

CEEMDAN-FNN

EEMD-FNN

CEEMDAN-XGBOOST 1 35544e-05 18847e-50 0 0028 16039e-187 00726EEMD-XGBOOST 35544e-05 1 45857e-07 0 3604 82912e-82 00556CEEMDAN-SVR 18847e-50 45857e-07 1 49296e-09 57753e-155 86135e-09EEMD-SVR 00028 0 3604 49296e-09 1 25385e-129 00007

CEEMDAN-FNN 16039e-187 82912e-82 57753e-155 25385e-129 1 81427e-196

EEMD-FNN 00726 00556 86135e-09 00007 81427e-196 1

Table 7 MCS between XGBOOST SVR and FNN with EEMD or CEEMDAN

HORIZON=1 HORIZON=3 HORIZON=6119879119877 119879119878119876 119879119877 119879119878119876 119879119877 119879119878119876CEEMDAN-XGBOOST 1 1 1 1 1 1EEMD-XGBOOST 0 0 0 0 0 00030CEEMDAN-SVR 0 00002 00124 00162 0 8268 08092EEMD-SVR 0 0 00008 0004 07872 07926CEEMDAN-FNN 0 0 00338 00532 02924 03866EEMD-FNN 0 00002 04040 04040 0 8268 08092

452 The Impact of the Lag Orders In this section weexplore how the number of lag orders impacts the predictionaccuracy of CEEMDAN-XGBOOST on the horizon of 1 Inthis experiment we set the number of lag orders from 1 to 10and the results are shown in Figure 7

According to the empirical results shown in Figure 7 itcan be seen that as the lag order increases from 1 to 2 thevalues of RMSE andMAEdecrease sharply while that of Dstatincreases drastically After that the values of RMSE of MAEremain almost unchanged (or increase very slightly) withthe increasing of lag orders However for Dstat the valueincreases sharply from 1 to 2 and then decreases from 2 to3 After the lag order increases from 3 to 5 the Dstat staysalmost stationary Overall when the value of lag order is upto 5 it reaches a good tradeoff among the values of RMSEMAE and Dstat

453 The Impact of the Noise Strength in CEEMDAN Apartfrom the number of realizations the noise strength inCEEMDAN which stands for the standard deviation of thewhite noise in CEEMDAN also affects the performance ofCEEMDAN-XGBOOST Thus we set the noise strength inthe set of 001 002 003 004 005 006 007 to explorehow the noise strength in CEEMDAN affects the predictionaccuracy of CEEMDAN-XGBOOST on a fixed horizon 1 anda fixed lag 6

As shown in Figure 8 when the noise strength in CEEM-DAN is equal to 005 the values of RMSE MAE and Dstatachieve the best results simultaneously When the noisestrength is less than or equal to 005 except 003 the valuesof RMSE MAE and Dstat become better and better with theincrease of the noise strength However when the strengthis greater than 005 the values of RMSE MAE and Dstat

become worse and worse The figure indicates that the noisestrength has a great impact on forecasting results and an idealrange for it is about 004-006

5 Conclusions

In this paper we propose a novelmodel namely CEEMDAN-XGBOOST to forecast crude oil prices At first CEEMDAN-XGBOOST decomposes the sequence of crude oil prices intoseveral IMFs and a residue with CEEMDAN Then it fore-casts the IMFs and the residue with XGBOOST individuallyFinally CEEMDAN-XGBOOST computes the sum of theprediction of the IMFs and the residue as the final forecastingresults The experimental results show that the proposedCEEMDAN-XGBOOST significantly outperforms the com-pared methods in terms of RMSE and MAE Although theperformance of the CEEMDAN-XGBOOST on forecastingthe direction of crude oil prices is not the best theMCS showsthat the CEEMDAN-XGBOOST is still the optimal modelMeanwhile it is proved that the number of the realizationslag and the noise strength for CEEMDAN are the vitalfactors which have great impacts on the performance of theCEEMDAN-XGBOOST

In the future we will study the performance of theCEEMDAN-XGBOOST on forecasting crude oil prices withdifferent periods We will also apply the proposed approachfor forecasting other energy time series such as wind speedelectricity load and carbon emissions prices

Data Availability

The data used to support the findings of this study areincluded within the article

Complexity 13

RMSEMAEDstat

Dsta

t

RMSE

MA

E

070

065

060

050

055

045

040

035

03010 25 50 500 750 1000 75 100 250

088

087

086

085

084

083

082

081

080

The number of realizations in CEEMDAN

Figure 6 The impact of the number of IMFs with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

RMSE

MA

E

2 4 6 8 10lag order

088

086

084

082

080

078

076

074

12

10

08

06

04

Figure 7 The impact of the number of lag orders with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

The white noise strength of CEEMDAN

088

086

084

082

080

078

076

074

10

08

09

06

07

04

05

03

001 002 003 004 005 006 007

RMSE

MA

E

Figure 8 The impact of the value of the white noise of CEEMDAN with CEEMDAN-XGBOOST

14 Complexity

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported by the Fundamental ResearchFunds for the Central Universities (Grants no JBK1902029no JBK1802073 and no JBK170505) the Natural ScienceFoundation of China (Grant no 71473201) and the ScientificResearch Fund of Sichuan Provincial Education Department(Grant no 17ZB0433)

References

[1] H Miao S Ramchander T Wang and D Yang ldquoInfluentialfactors in crude oil price forecastingrdquo Energy Economics vol 68pp 77ndash88 2017

[2] L Yu S Wang and K K Lai ldquoForecasting crude oil price withan EMD-based neural network ensemble learning paradigmrdquoEnergy Economics vol 30 no 5 pp 2623ndash2635 2008

[3] M Ye J Zyren C J Blumberg and J Shore ldquoA short-run crudeoil price forecast model with ratchet effectrdquo Atlantic EconomicJournal vol 37 no 1 pp 37ndash50 2009

[4] C Morana ldquoA semiparametric approach to short-term oil priceforecastingrdquo Energy Economics vol 23 no 3 pp 325ndash338 2001

[5] H Naser ldquoEstimating and forecasting the real prices of crudeoil A data richmodel using a dynamic model averaging (DMA)approachrdquo Energy Economics vol 56 pp 75ndash87 2016

[6] XGong andB Lin ldquoForecasting the goodandbaduncertaintiesof crude oil prices using a HAR frameworkrdquo Energy Economicsvol 67 pp 315ndash327 2017

[7] F Wen X Gong and S Cai ldquoForecasting the volatility of crudeoil futures using HAR-type models with structural breaksrdquoEnergy Economics vol 59 pp 400ndash413 2016

[8] H Chiroma S Abdul-Kareem A Shukri Mohd Noor et alldquoA Review on Artificial Intelligence Methodologies for theForecasting of Crude Oil Pricerdquo Intelligent Automation and SoftComputing vol 22 no 3 pp 449ndash462 2016

[9] S Wang L Yu and K K Lai ldquoA novel hybrid AI systemframework for crude oil price forecastingrdquo in Data Miningand Knowledge Management Y Shi W Xu and Z Chen Edsvol 3327 of Lecture Notes in Computer Science pp 233ndash242Springer Berlin Germany 2004

[10] J Barunık and B Malinska ldquoForecasting the term structure ofcrude oil futures prices with neural networksrdquo Applied Energyvol 164 pp 366ndash379 2016

[11] Y Chen K He and G K F Tso ldquoForecasting crude oil prices-a deep learning based modelrdquo Procedia Computer Science vol122 pp 300ndash307 2017

[12] R Tehrani and F Khodayar ldquoA hybrid optimized artificialintelligent model to forecast crude oil using genetic algorithmrdquoAfrican Journal of Business Management vol 5 no 34 pp13130ndash13135 2011

[13] L Yu Y Zhao and L Tang ldquoA compressed sensing basedAI learning paradigm for crude oil price forecastingrdquo EnergyEconomics vol 46 no C pp 236ndash245 2014

[14] L Yu H Xu and L Tang ldquoLSSVR ensemble learning withuncertain parameters for crude oil price forecastingrdquo AppliedSoft Computing vol 56 pp 692ndash701 2017

[15] Y-L Qi and W-J Zhang ldquoThe improved SVM method forforecasting the fluctuation of international crude oil pricerdquo inProceedings of the 2009 International Conference on ElectronicCommerce and Business Intelligence (ECBI rsquo09) pp 269ndash271IEEE China June 2009

[16] M S AL-Musaylh R C Deo Y Li and J F AdamowskildquoTwo-phase particle swarm optimized-support vector regres-sion hybrid model integrated with improved empirical modedecomposition with adaptive noise for multiple-horizon elec-tricity demand forecastingrdquo Applied Energy vol 217 pp 422ndash439 2018

[17] LHuang and JWang ldquoForecasting energy fluctuationmodel bywavelet decomposition and stochastic recurrent wavelet neuralnetworkrdquo Neurocomputing vol 309 pp 70ndash82 2018

[18] H Zhao M Sun W Deng and X Yang ldquoA new featureextraction method based on EEMD and multi-scale fuzzyentropy for motor bearingrdquo Entropy vol 19 no 1 Article ID14 2017

[19] W Deng S Zhang H Zhao and X Yang ldquoA Novel FaultDiagnosis Method Based on Integrating Empirical WaveletTransform and Fuzzy Entropy for Motor Bearingrdquo IEEE Accessvol 6 pp 35042ndash35056 2018

[20] Q Fu B Jing P He S Si and Y Wang ldquoFault FeatureSelection and Diagnosis of Rolling Bearings Based on EEMDand Optimized Elman AdaBoost Algorithmrdquo IEEE SensorsJournal vol 18 no 12 pp 5024ndash5034 2018

[21] T Li and M Zhou ldquoECG classification using wavelet packetentropy and random forestsrdquo Entropy vol 18 no 8 p 285 2016

[22] M Blanco-Velasco B Weng and K E Barner ldquoECG signaldenoising and baseline wander correction based on the empir-ical mode decompositionrdquo Computers in Biology and Medicinevol 38 no 1 pp 1ndash13 2008

[23] J Lee D D McManus S Merchant and K H ChonldquoAutomatic motion and noise artifact detection in holterECG data using empirical mode decomposition and statisticalapproachesrdquo IEEE Transactions on Biomedical Engineering vol59 no 6 pp 1499ndash1506 2012

[24] L Fan S Pan Z Li and H Li ldquoAn ICA-based supportvector regression scheme for forecasting crude oil pricesrdquoTechnological Forecasting amp Social Change vol 112 pp 245ndash2532016

[25] E Jianwei Y Bao and J Ye ldquoCrude oil price analysis andforecasting based on variationalmode decomposition and inde-pendent component analysisrdquo Physica A Statistical Mechanicsand Its Applications vol 484 pp 412ndash427 2017

[26] N E Huang Z Shen S R Long et al ldquoThe empiricalmode decomposition and the Hilbert spectrum for nonlinearand non-stationary time series analysisrdquo Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 pp 903ndash995 1998

[27] Z Wu and N E Huang ldquoEnsemble empirical mode decom-position a noise-assisted data analysis methodrdquo Advances inAdaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[28] L Yu W Dai L Tang and J Wu ldquoA hybrid grid-GA-basedLSSVR learning paradigm for crude oil price forecastingrdquoNeural Computing andApplications vol 27 no 8 pp 2193ndash22152016

[29] L Tang W Dai L Yu and S Wang ldquoA novel CEEMD-based eelm ensemble learning paradigm for crude oil priceforecastingrdquo International Journal of Information Technology ampDecision Making vol 14 no 1 pp 141ndash169 2015

Complexity 15

[30] T Li M Zhou C Guo et al ldquoForecasting crude oil price usingEEMD and RVM with adaptive PSO-based kernelsrdquo Energiesvol 9 no 12 p 1014 2016

[31] T Li Z Hu Y Jia J Wu and Y Zhou ldquoForecasting CrudeOil PricesUsing Ensemble EmpiricalModeDecomposition andSparse Bayesian Learningrdquo Energies vol 11 no 7 2018

[32] M E TorresMA Colominas G Schlotthauer andP FlandrinldquoA complete ensemble empirical mode decomposition withadaptive noiserdquo in Proceedings of the 36th IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo11) pp 4144ndash4147 IEEE Prague Czech Republic May 2011

[33] M A Colominas G Schlotthauer andM E Torres ldquoImprovedcomplete ensemble EMD a suitable tool for biomedical signalprocessingrdquo Biomedical Signal Processing and Control vol 14no 1 pp 19ndash29 2014

[34] T Peng J Zhou C Zhang and Y Zheng ldquoMulti-step aheadwind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learn-ing machinerdquo Energy Conversion and Management vol 153 pp589ndash602 2017

[35] S Dai D Niu and Y Li ldquoDaily peak load forecasting basedon complete ensemble empirical mode decomposition withadaptive noise and support vector machine optimized bymodified grey Wolf optimization algorithmrdquo Energies vol 11no 1 2018

[36] Y Lv R Yuan TWangH Li andG Song ldquoHealthDegradationMonitoring and Early Fault Diagnosis of a Rolling BearingBased on CEEMDAN and Improved MMSErdquo Materials vol11 no 6 p 1009 2018

[37] R Abdelkader A Kaddour A Bendiabdellah and Z Der-ouiche ldquoRolling Bearing FaultDiagnosis Based on an ImprovedDenoising Method Using the Complete Ensemble EmpiricalMode Decomposition and the OptimizedThresholding Opera-tionrdquo IEEE Sensors Journal vol 18 no 17 pp 7166ndash7172 2018

[38] Y Lei Z Liu J Ouazri and J Lin ldquoA fault diagnosis methodof rolling element bearings based on CEEMDANrdquo Proceedingsof the Institution of Mechanical Engineers Part C Journal ofMechanical Engineering Science vol 231 no 10 pp 1804ndash18152017

[39] T Chen and C Guestrin ldquoXGBoost a scalable tree boostingsystemrdquo in Proceedings of the 22nd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining KDD2016 pp 785ndash794 ACM New York NY USA August 2016

[40] B Zhai and J Chen ldquoDevelopment of a stacked ensemblemodelfor forecasting and analyzing daily average PM25 concentra-tions in Beijing Chinardquo Science of the Total Environment vol635 pp 644ndash658 2018

[41] S P Chatzis V Siakoulis A Petropoulos E Stavroulakis andN Vlachogiannakis ldquoForecasting stock market crisis eventsusing deep and statistical machine learning techniquesrdquo ExpertSystems with Applications vol 112 pp 353ndash371 2018

[42] J Ke H Zheng H Yang and X Chen ldquoShort-term forecastingof passenger demand under on-demand ride services A spatio-temporal deep learning approachrdquoTransportation Research PartC Emerging Technologies vol 85 pp 591ndash608 2017

[43] J Friedman T Hastie and R Tibshirani ldquoSpecial Invited PaperAdditive Logistic Regression A Statistical View of BoostingRejoinderdquo The Annals of Statistics vol 28 no 2 pp 400ndash4072000

[44] J D Gibbons and S Chakraborti ldquoNonparametric statisticalinferencerdquo in International Encyclopedia of Statistical ScienceM Lovric Ed pp 977ndash979 Springer Berlin Germany 2011

[45] P RHansen A Lunde and JMNason ldquoThemodel confidencesetrdquo Econometrica vol 79 no 2 pp 453ndash497 2011

[46] H Liu H Q Tian and Y F Li ldquoComparison of two newARIMA-ANN and ARIMA-Kalman hybrid methods for windspeed predictionrdquo Applied Energy vol 98 no 1 pp 415ndash4242012

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 13: A CEEMDAN and XGBOOST-Based Approach to Forecast Crude …downloads.hindawi.com/journals/complexity/2019/4392785.pdf · XGBo XGBo XGBo XGBoost Decompositio Individual orecasting Esemb

Complexity 13

RMSEMAEDstat

Dsta

t

RMSE

MA

E

070

065

060

050

055

045

040

035

03010 25 50 500 750 1000 75 100 250

088

087

086

085

084

083

082

081

080

The number of realizations in CEEMDAN

Figure 6 The impact of the number of IMFs with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

RMSE

MA

E

2 4 6 8 10lag order

088

086

084

082

080

078

076

074

12

10

08

06

04

Figure 7 The impact of the number of lag orders with CEEMDAN-XGBOOST

RMSEMAEDstat

Dsta

t

The white noise strength of CEEMDAN

088

086

084

082

080

078

076

074

10

08

09

06

07

04

05

03

001 002 003 004 005 006 007

RMSE

MA

E

Figure 8 The impact of the value of the white noise of CEEMDAN with CEEMDAN-XGBOOST

14 Complexity

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work was supported by the Fundamental ResearchFunds for the Central Universities (Grants no JBK1902029no JBK1802073 and no JBK170505) the Natural ScienceFoundation of China (Grant no 71473201) and the ScientificResearch Fund of Sichuan Provincial Education Department(Grant no 17ZB0433)

References

[1] H Miao S Ramchander T Wang and D Yang ldquoInfluentialfactors in crude oil price forecastingrdquo Energy Economics vol 68pp 77ndash88 2017

[2] L Yu S Wang and K K Lai ldquoForecasting crude oil price withan EMD-based neural network ensemble learning paradigmrdquoEnergy Economics vol 30 no 5 pp 2623ndash2635 2008

[3] M Ye J Zyren C J Blumberg and J Shore ldquoA short-run crudeoil price forecast model with ratchet effectrdquo Atlantic EconomicJournal vol 37 no 1 pp 37ndash50 2009

[4] C Morana ldquoA semiparametric approach to short-term oil priceforecastingrdquo Energy Economics vol 23 no 3 pp 325ndash338 2001

[5] H Naser ldquoEstimating and forecasting the real prices of crudeoil A data richmodel using a dynamic model averaging (DMA)approachrdquo Energy Economics vol 56 pp 75ndash87 2016

[6] XGong andB Lin ldquoForecasting the goodandbaduncertaintiesof crude oil prices using a HAR frameworkrdquo Energy Economicsvol 67 pp 315ndash327 2017

[7] F Wen X Gong and S Cai ldquoForecasting the volatility of crudeoil futures using HAR-type models with structural breaksrdquoEnergy Economics vol 59 pp 400ndash413 2016

[8] H Chiroma S Abdul-Kareem A Shukri Mohd Noor et alldquoA Review on Artificial Intelligence Methodologies for theForecasting of Crude Oil Pricerdquo Intelligent Automation and SoftComputing vol 22 no 3 pp 449ndash462 2016

[9] S Wang L Yu and K K Lai ldquoA novel hybrid AI systemframework for crude oil price forecastingrdquo in Data Miningand Knowledge Management Y Shi W Xu and Z Chen Edsvol 3327 of Lecture Notes in Computer Science pp 233ndash242Springer Berlin Germany 2004

[10] J Barunık and B Malinska ldquoForecasting the term structure ofcrude oil futures prices with neural networksrdquo Applied Energyvol 164 pp 366ndash379 2016

[11] Y Chen K He and G K F Tso ldquoForecasting crude oil prices-a deep learning based modelrdquo Procedia Computer Science vol122 pp 300ndash307 2017

[12] R Tehrani and F Khodayar ldquoA hybrid optimized artificialintelligent model to forecast crude oil using genetic algorithmrdquoAfrican Journal of Business Management vol 5 no 34 pp13130ndash13135 2011

[13] L Yu Y Zhao and L Tang ldquoA compressed sensing basedAI learning paradigm for crude oil price forecastingrdquo EnergyEconomics vol 46 no C pp 236ndash245 2014

[14] L Yu H Xu and L Tang ldquoLSSVR ensemble learning withuncertain parameters for crude oil price forecastingrdquo AppliedSoft Computing vol 56 pp 692ndash701 2017

[15] Y-L Qi and W-J Zhang ldquoThe improved SVM method forforecasting the fluctuation of international crude oil pricerdquo inProceedings of the 2009 International Conference on ElectronicCommerce and Business Intelligence (ECBI rsquo09) pp 269ndash271IEEE China June 2009

[16] M S AL-Musaylh R C Deo Y Li and J F AdamowskildquoTwo-phase particle swarm optimized-support vector regres-sion hybrid model integrated with improved empirical modedecomposition with adaptive noise for multiple-horizon elec-tricity demand forecastingrdquo Applied Energy vol 217 pp 422ndash439 2018

[17] LHuang and JWang ldquoForecasting energy fluctuationmodel bywavelet decomposition and stochastic recurrent wavelet neuralnetworkrdquo Neurocomputing vol 309 pp 70ndash82 2018

[18] H Zhao M Sun W Deng and X Yang ldquoA new featureextraction method based on EEMD and multi-scale fuzzyentropy for motor bearingrdquo Entropy vol 19 no 1 Article ID14 2017

[19] W Deng S Zhang H Zhao and X Yang ldquoA Novel FaultDiagnosis Method Based on Integrating Empirical WaveletTransform and Fuzzy Entropy for Motor Bearingrdquo IEEE Accessvol 6 pp 35042ndash35056 2018

[20] Q Fu B Jing P He S Si and Y Wang ldquoFault FeatureSelection and Diagnosis of Rolling Bearings Based on EEMDand Optimized Elman AdaBoost Algorithmrdquo IEEE SensorsJournal vol 18 no 12 pp 5024ndash5034 2018

[21] T Li and M Zhou ldquoECG classification using wavelet packetentropy and random forestsrdquo Entropy vol 18 no 8 p 285 2016

[22] M Blanco-Velasco B Weng and K E Barner ldquoECG signaldenoising and baseline wander correction based on the empir-ical mode decompositionrdquo Computers in Biology and Medicinevol 38 no 1 pp 1ndash13 2008

[23] J Lee D D McManus S Merchant and K H ChonldquoAutomatic motion and noise artifact detection in holterECG data using empirical mode decomposition and statisticalapproachesrdquo IEEE Transactions on Biomedical Engineering vol59 no 6 pp 1499ndash1506 2012

[24] L Fan S Pan Z Li and H Li ldquoAn ICA-based supportvector regression scheme for forecasting crude oil pricesrdquoTechnological Forecasting amp Social Change vol 112 pp 245ndash2532016

[25] E Jianwei Y Bao and J Ye ldquoCrude oil price analysis andforecasting based on variationalmode decomposition and inde-pendent component analysisrdquo Physica A Statistical Mechanicsand Its Applications vol 484 pp 412ndash427 2017

[26] N E Huang Z Shen S R Long et al ldquoThe empiricalmode decomposition and the Hilbert spectrum for nonlinearand non-stationary time series analysisrdquo Proceedings of theRoyal Society of London Series A Mathematical Physical andEngineering Sciences vol 454 pp 903ndash995 1998

[27] Z Wu and N E Huang ldquoEnsemble empirical mode decom-position a noise-assisted data analysis methodrdquo Advances inAdaptive Data Analysis vol 1 no 1 pp 1ndash41 2009

[28] L Yu W Dai L Tang and J Wu ldquoA hybrid grid-GA-basedLSSVR learning paradigm for crude oil price forecastingrdquoNeural Computing andApplications vol 27 no 8 pp 2193ndash22152016

[29] L Tang W Dai L Yu and S Wang ldquoA novel CEEMD-based eelm ensemble learning paradigm for crude oil priceforecastingrdquo International Journal of Information Technology ampDecision Making vol 14 no 1 pp 141ndash169 2015

Complexity 15

[30] T Li M Zhou C Guo et al ldquoForecasting crude oil price usingEEMD and RVM with adaptive PSO-based kernelsrdquo Energiesvol 9 no 12 p 1014 2016

[31] T Li Z Hu Y Jia J Wu and Y Zhou ldquoForecasting CrudeOil PricesUsing Ensemble EmpiricalModeDecomposition andSparse Bayesian Learningrdquo Energies vol 11 no 7 2018

[32] M E TorresMA Colominas G Schlotthauer andP FlandrinldquoA complete ensemble empirical mode decomposition withadaptive noiserdquo in Proceedings of the 36th IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo11) pp 4144ndash4147 IEEE Prague Czech Republic May 2011

[33] M A Colominas G Schlotthauer andM E Torres ldquoImprovedcomplete ensemble EMD a suitable tool for biomedical signalprocessingrdquo Biomedical Signal Processing and Control vol 14no 1 pp 19ndash29 2014

[34] T Peng J Zhou C Zhang and Y Zheng ldquoMulti-step aheadwind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learn-ing machinerdquo Energy Conversion and Management vol 153 pp589ndash602 2017

[35] S Dai D Niu and Y Li ldquoDaily peak load forecasting basedon complete ensemble empirical mode decomposition withadaptive noise and support vector machine optimized bymodified grey Wolf optimization algorithmrdquo Energies vol 11no 1 2018

[36] Y Lv R Yuan TWangH Li andG Song ldquoHealthDegradationMonitoring and Early Fault Diagnosis of a Rolling BearingBased on CEEMDAN and Improved MMSErdquo Materials vol11 no 6 p 1009 2018

[37] R Abdelkader A Kaddour A Bendiabdellah and Z Der-ouiche ldquoRolling Bearing FaultDiagnosis Based on an ImprovedDenoising Method Using the Complete Ensemble EmpiricalMode Decomposition and the OptimizedThresholding Opera-tionrdquo IEEE Sensors Journal vol 18 no 17 pp 7166ndash7172 2018

[38] Y Lei Z Liu J Ouazri and J Lin ldquoA fault diagnosis methodof rolling element bearings based on CEEMDANrdquo Proceedingsof the Institution of Mechanical Engineers Part C Journal ofMechanical Engineering Science vol 231 no 10 pp 1804ndash18152017

[39] T Chen and C Guestrin ldquoXGBoost a scalable tree boostingsystemrdquo in Proceedings of the 22nd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining KDD2016 pp 785ndash794 ACM New York NY USA August 2016

[40] B Zhai and J Chen ldquoDevelopment of a stacked ensemblemodelfor forecasting and analyzing daily average PM25 concentra-tions in Beijing Chinardquo Science of the Total Environment vol635 pp 644ndash658 2018

[41] S P Chatzis V Siakoulis A Petropoulos E Stavroulakis andN Vlachogiannakis ldquoForecasting stock market crisis eventsusing deep and statistical machine learning techniquesrdquo ExpertSystems with Applications vol 112 pp 353ndash371 2018

[42] J Ke H Zheng H Yang and X Chen ldquoShort-term forecastingof passenger demand under on-demand ride services A spatio-temporal deep learning approachrdquoTransportation Research PartC Emerging Technologies vol 85 pp 591ndash608 2017

[43] J Friedman T Hastie and R Tibshirani ldquoSpecial Invited PaperAdditive Logistic Regression A Statistical View of BoostingRejoinderdquo The Annals of Statistics vol 28 no 2 pp 400ndash4072000

[44] J D Gibbons and S Chakraborti ldquoNonparametric statisticalinferencerdquo in International Encyclopedia of Statistical ScienceM Lovric Ed pp 977ndash979 Springer Berlin Germany 2011

[45] P RHansen A Lunde and JMNason ldquoThemodel confidencesetrdquo Econometrica vol 79 no 2 pp 453ndash497 2011

[46] H Liu H Q Tian and Y F Li ldquoComparison of two newARIMA-ANN and ARIMA-Kalman hybrid methods for windspeed predictionrdquo Applied Energy vol 98 no 1 pp 415ndash4242012


Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (Grants no. JBK1902029, no. JBK1802073, and no. JBK170505), the Natural Science Foundation of China (Grant no. 71473201), and the Scientific Research Fund of Sichuan Provincial Education Department (Grant no. 17ZB0433).

References

[1] H. Miao, S. Ramchander, T. Wang, and D. Yang, "Influential factors in crude oil price forecasting," Energy Economics, vol. 68, pp. 77–88, 2017.

[2] L. Yu, S. Wang, and K. K. Lai, "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm," Energy Economics, vol. 30, no. 5, pp. 2623–2635, 2008.

[3] M. Ye, J. Zyren, C. J. Blumberg, and J. Shore, "A short-run crude oil price forecast model with ratchet effect," Atlantic Economic Journal, vol. 37, no. 1, pp. 37–50, 2009.

[4] C. Morana, "A semiparametric approach to short-term oil price forecasting," Energy Economics, vol. 23, no. 3, pp. 325–338, 2001.

[5] H. Naser, "Estimating and forecasting the real prices of crude oil: a data rich model using a dynamic model averaging (DMA) approach," Energy Economics, vol. 56, pp. 75–87, 2016.

[6] X. Gong and B. Lin, "Forecasting the good and bad uncertainties of crude oil prices using a HAR framework," Energy Economics, vol. 67, pp. 315–327, 2017.

[7] F. Wen, X. Gong, and S. Cai, "Forecasting the volatility of crude oil futures using HAR-type models with structural breaks," Energy Economics, vol. 59, pp. 400–413, 2016.

[8] H. Chiroma, S. Abdul-Kareem, A. Shukri Mohd Noor et al., "A review on artificial intelligence methodologies for the forecasting of crude oil price," Intelligent Automation and Soft Computing, vol. 22, no. 3, pp. 449–462, 2016.

[9] S. Wang, L. Yu, and K. K. Lai, "A novel hybrid AI system framework for crude oil price forecasting," in Data Mining and Knowledge Management, Y. Shi, W. Xu, and Z. Chen, Eds., vol. 3327 of Lecture Notes in Computer Science, pp. 233–242, Springer, Berlin, Germany, 2004.

[10] J. Baruník and B. Malinská, "Forecasting the term structure of crude oil futures prices with neural networks," Applied Energy, vol. 164, pp. 366–379, 2016.

[11] Y. Chen, K. He, and G. K. F. Tso, "Forecasting crude oil prices: a deep learning based model," Procedia Computer Science, vol. 122, pp. 300–307, 2017.

[12] R. Tehrani and F. Khodayar, "A hybrid optimized artificial intelligent model to forecast crude oil using genetic algorithm," African Journal of Business Management, vol. 5, no. 34, pp. 13130–13135, 2011.

[13] L. Yu, Y. Zhao, and L. Tang, "A compressed sensing based AI learning paradigm for crude oil price forecasting," Energy Economics, vol. 46, no. C, pp. 236–245, 2014.

[14] L. Yu, H. Xu, and L. Tang, "LSSVR ensemble learning with uncertain parameters for crude oil price forecasting," Applied Soft Computing, vol. 56, pp. 692–701, 2017.

[15] Y.-L. Qi and W.-J. Zhang, "The improved SVM method for forecasting the fluctuation of international crude oil price," in Proceedings of the 2009 International Conference on Electronic Commerce and Business Intelligence (ECBI '09), pp. 269–271, IEEE, China, June 2009.

[16] M. S. AL-Musaylh, R. C. Deo, Y. Li, and J. F. Adamowski, "Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting," Applied Energy, vol. 217, pp. 422–439, 2018.

[17] L. Huang and J. Wang, "Forecasting energy fluctuation model by wavelet decomposition and stochastic recurrent wavelet neural network," Neurocomputing, vol. 309, pp. 70–82, 2018.

[18] H. Zhao, M. Sun, W. Deng, and X. Yang, "A new feature extraction method based on EEMD and multi-scale fuzzy entropy for motor bearing," Entropy, vol. 19, no. 1, Article ID 14, 2017.

[19] W. Deng, S. Zhang, H. Zhao, and X. Yang, "A novel fault diagnosis method based on integrating empirical wavelet transform and fuzzy entropy for motor bearing," IEEE Access, vol. 6, pp. 35042–35056, 2018.

[20] Q. Fu, B. Jing, P. He, S. Si, and Y. Wang, "Fault feature selection and diagnosis of rolling bearings based on EEMD and optimized Elman AdaBoost algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5024–5034, 2018.

[21] T. Li and M. Zhou, "ECG classification using wavelet packet entropy and random forests," Entropy, vol. 18, no. 8, p. 285, 2016.

[22] M. Blanco-Velasco, B. Weng, and K. E. Barner, "ECG signal denoising and baseline wander correction based on the empirical mode decomposition," Computers in Biology and Medicine, vol. 38, no. 1, pp. 1–13, 2008.

[23] J. Lee, D. D. McManus, S. Merchant, and K. H. Chon, "Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches," IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1499–1506, 2012.

[24] L. Fan, S. Pan, Z. Li, and H. Li, "An ICA-based support vector regression scheme for forecasting crude oil prices," Technological Forecasting & Social Change, vol. 112, pp. 245–253, 2016.

[25] E. Jianwei, Y. Bao, and J. Ye, "Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis," Physica A: Statistical Mechanics and Its Applications, vol. 484, pp. 412–427, 2017.

[26] N. E. Huang, Z. Shen, S. R. Long et al., "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.

[27] Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.

[28] L. Yu, W. Dai, L. Tang, and J. Wu, "A hybrid grid-GA-based LSSVR learning paradigm for crude oil price forecasting," Neural Computing and Applications, vol. 27, no. 8, pp. 2193–2215, 2016.

[29] L. Tang, W. Dai, L. Yu, and S. Wang, "A novel CEEMD-based EELM ensemble learning paradigm for crude oil price forecasting," International Journal of Information Technology & Decision Making, vol. 14, no. 1, pp. 141–169, 2015.


[30] T. Li, M. Zhou, C. Guo et al., "Forecasting crude oil price using EEMD and RVM with adaptive PSO-based kernels," Energies, vol. 9, no. 12, p. 1014, 2016.

[31] T. Li, Z. Hu, Y. Jia, J. Wu, and Y. Zhou, "Forecasting crude oil prices using ensemble empirical mode decomposition and sparse Bayesian learning," Energies, vol. 11, no. 7, 2018.

[32] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," in Proceedings of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '11), pp. 4144–4147, IEEE, Prague, Czech Republic, May 2011.

[33] M. A. Colominas, G. Schlotthauer, and M. E. Torres, "Improved complete ensemble EMD: a suitable tool for biomedical signal processing," Biomedical Signal Processing and Control, vol. 14, no. 1, pp. 19–29, 2014.

[34] T. Peng, J. Zhou, C. Zhang, and Y. Zheng, "Multi-step ahead wind speed forecasting using a hybrid model based on two-stage decomposition technique and AdaBoost-extreme learning machine," Energy Conversion and Management, vol. 153, pp. 589–602, 2017.

[35] S. Dai, D. Niu, and Y. Li, "Daily peak load forecasting based on complete ensemble empirical mode decomposition with adaptive noise and support vector machine optimized by modified grey wolf optimization algorithm," Energies, vol. 11, no. 1, 2018.

[36] Y. Lv, R. Yuan, T. Wang, H. Li, and G. Song, "Health degradation monitoring and early fault diagnosis of a rolling bearing based on CEEMDAN and improved MMSE," Materials, vol. 11, no. 6, p. 1009, 2018.

[37] R. Abdelkader, A. Kaddour, A. Bendiabdellah, and Z. Derouiche, "Rolling bearing fault diagnosis based on an improved denoising method using the complete ensemble empirical mode decomposition and the optimized thresholding operation," IEEE Sensors Journal, vol. 18, no. 17, pp. 7166–7172, 2018.

[38] Y. Lei, Z. Liu, J. Ouazri, and J. Lin, "A fault diagnosis method of rolling element bearings based on CEEMDAN," Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, vol. 231, no. 10, pp. 1804–1815, 2017.

[39] T. Chen and C. Guestrin, "XGBoost: a scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794, ACM, New York, NY, USA, August 2016.

[40] B. Zhai and J. Chen, "Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China," Science of the Total Environment, vol. 635, pp. 644–658, 2018.

[41] S. P. Chatzis, V. Siakoulis, A. Petropoulos, E. Stavroulakis, and N. Vlachogiannakis, "Forecasting stock market crisis events using deep and statistical machine learning techniques," Expert Systems with Applications, vol. 112, pp. 353–371, 2018.

[42] J. Ke, H. Zheng, H. Yang, and X. Chen, "Short-term forecasting of passenger demand under on-demand ride services: a spatio-temporal deep learning approach," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 591–608, 2017.

[43] J. Friedman, T. Hastie, and R. Tibshirani, "Special invited paper. Additive logistic regression: a statistical view of boosting. Rejoinder," The Annals of Statistics, vol. 28, no. 2, pp. 400–407, 2000.

[44] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., pp. 977–979, Springer, Berlin, Germany, 2011.

[45] P. R. Hansen, A. Lunde, and J. M. Nason, "The model confidence set," Econometrica, vol. 79, no. 2, pp. 453–497, 2011.

[46] H. Liu, H. Q. Tian, and Y. F. Li, "Comparison of two new ARIMA-ANN and ARIMA-Kalman hybrid methods for wind speed prediction," Applied Energy, vol. 98, no. 1, pp. 415–424, 2012.


