Combining LSTM Network Ensemble via Adaptive Weighting …downloads.hindawi.com/journals/mpe/2018/2470171.pdfApr 06, 2018 · forecasting instability. To overcome the above challenges,

Research ArticleCombining LSTM Network Ensemble via Adaptive Weighting forImproved Time Series Forecasting

Jae Young Choi1 and Bumshik Lee 2

1Division of Computer amp Electronic Systems Engineering Hankuk University of Foreign Studies Yongin-si Republic of Korea2Department of Information and Communications Engineering Chosun University Gwangju Republic of Korea

Correspondence should be addressed to Bumshik Lee bsleechosunackr

Received 6 April 2018 Revised 16 July 2018 Accepted 26 July 2018 Published 5 August 2018

Academic Editor Wlodzimierz Ogryczak

Copyright copy 2018 Jae Young Choi and Bumshik Lee This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Time series forecasting is essential for various engineering applications in finance geology and information technology etcLong Short-Term Memory (LSTM) networks are nowadays gaining renewed interest and they are replacing many practicalimplementations of the time series forecasting systems This paper presents a novel LSTM ensemble forecasting algorithm thateffectively combines multiple forecast (prediction) results from a set of individual LSTM networks The main advantages of ourLSTM ensemble method over other state-of-the-art ensemble techniques are summarized as follows (1) we develop a novel wayof dynamically adjusting the combining weights that are used for combining multiple LSTM models to produce the compositeprediction output for this ourmethod is devised for updating combining weights at each time step in an adaptive and recursivewayby using both past prediction errors and forgetting weight factor (2) our method is capable of well capturing nonlinear statisticalproperties in the time series which considerably improves the forecasting accuracy (3) ourmethod is straightforward to implementand computationally efficient when it comes to runtime performance because it does not require the complex optimization in theprocess of finding combining weights Comparative experiments demonstrate that our proposed LSTM ensemble method achievesstate-of-the-art forecasting performance on four real-life time series datasets publicly available

1 Introduction

Time series is a set of values wherein all values of one indexare arranged in chronological order The objective of timeseries forecasting is to estimate the next value of a sequencegiven a number of previously observed values To this endforecast (prediction) models are needed to predict the futurebased on historical data [1] The traditional mathematical(statistical) models such as Least Square Regression (LSR)[2] Autoregressive Moving Average [3ndash5] and Neural Net-works [6] were widely used and reported in literature fortheir utility in practical time series forecasting

Time series forecasting has fundamental importance innumerous practical engineering fields such as energy financegeology and information technology [7ndash12] For instanceforecasting of electricity consumption is of great importancein deregulated electricity markets for all of the stakeholdersenergy wholesalers traders retailers and consumers [10]

The ability to accurately forecast the future electricity con-sumption will allow them to perform effective planning andefficient operations leading to ultimate financial profits forthem Moreover energy-related time series forecasting playsan important role in the planning and working of the powergrid system [7 8] for instance accurate and stablewind speedforecast has primary importance in the wind power industryand make an influence on power-system management andthe stability of market economics [11]

However the time series forecasting in the aforemen-tioned applications is an inherently challenging problem dueto the characteristics of dynamicity and nonstationarity [16 13 14] Additionally any data volatility leads to increasedforecasting instability To overcome the above challengesthere is a growing consensus that ensemble forecasting [3ndash6 13 14] ie forecasting model combination has advantageover using a single individual model in terms of enhanc-ing forecasting accuracy The most common approach of

HindawiMathematical Problems in EngineeringVolume 2018 Article ID 2470171 8 pageshttpsdoiorg10115520182470171

2 Mathematical Problems in Engineering

Validation time-series data

Training time-series data

Learning

Sequence length lsquo1rsquo


Learning


Learning

Sequence length lsquoMrsquo

LSTM 2

LSTM 1

LSTM 3

LSTM M

LSTM Ensemble Training

Weighted combination of forecast output

EnsembleForecast results

Adaptive weight determination based on

past prediction error and forgetting factor

LSTM Ensemble Forecasting

Weights

Learning

1ＭＮ LSTM network



MＭＮ LSTM network

Figure 1 Overall framework of the proposed LSTM ensemble method for the purpose of time series forecasting

ensemble forecasting is simple averaging that assigns equalweights to all forecasting component models [3ndash5] The sim-ple averaging approach is sensitive to extreme values (ie out-liers) and unreliable for skewed distributions [6 14] To copewith this limitation weighted combination schemes havebeen proposedThe authors in [2] proposed the Least SquareRegression (LSR) that attempts to find the optimal weightsby minimizing the Sum of Squared Error (SSE) The authorsin [15] adopted the Average of In-sample Weights (AIW)schemewhere each weight is simply computed as the normal-ized inverse absolute forecasting error of an individual model[16]The authors in [6] developed a so-called Neural NetworkBased Linear Ensemble (NNLE) method that determines thecombining weights through a neural network structure

Recently a class of mathematical models called Recur-rent Neural Networks (RNNs) [17] are nowadays gainingrenew interest among researchers and they are replacingmany kinds of practical implementation of the forecastingsystems previously based on statistical models [1] In par-ticular Long Short-TermMemory (LSTM) networksmdashwhichare a variation of RNNmdashhave proven to be one of the mostpowerful RNN models for time series forecasting and otherrelated applications [1 15] The LSTM networks can be con-structed in such a way that they are able to remember long-term relationships in the data The LSTM networks have beenshown to model temporal sequences and their long-rangedependencies more accurately than original RNNmodel [1]However despite the recent popularity of the LSTM net-works their applicability in the context of ensemble forecast-ing has not been investigated yet Hence to our knowledgehow to best combine multiple forecast results of individualLSTMmodels still remains a challenging and open question

In this paper we present a novel LSTM ensemble forecast-ingmethod for improved time series forecasting which effec-tively combines multiple forecasts (predictions)(throughout

the remainder of this paper both terms of ldquoforecastrdquo andldquopredictionrdquo will be used interchangeably) inferred from thedifferent and diverse LSTM models Especially we develop anovel way of dynamically adjusting the so-called combiningweights that are used for combining multiple LSTM modelsto produce the composite prediction output The main ideaof our proposed method is to update combining weightsat each time step in an adaptive and recursive way Forthis the weights can be determined by using both pastprediction errors (measured up to the current time step)and forgetting weight factor The weights are assigned toindividual LSTM models which improve the forecastingaccuracy to a large extent The overall framework of ourproposed method is illustrated in Figure 1 Results showthat the proposed LSTM ensemble achieves state-of-the-artforecasting performance on real-world time series datasetpublicly available and it is considerably better than otherrecently developed ensemble forecasting methods as it will beshown in Section 4

The rest of this paper is organized as follows Section 2describes our proposed approach for building an ensembleof LSTMs that is well-suited for use in time series forecastingThen we discuss how to find combining weights for thepurpose of adaptively combining LSTM models Section 4presents and discusses our comparative experimental resultsFinally the conclusion is given in Section 5

2 Building LSTM Ensemble forTime Series Forecasting

In the proposed method an ensemble of LSTM networksshould be first constructed in an effective way of maximizinga complementary effect during the combination of multipleLSTM forecast results aiming to improve forecasting accu-racy Before explaining our LSTM ensemble construction

Mathematical Problems in Engineering 3

we present a brief review on LSTM network for the sake ofcompleteness as follows LSTM networks and their variantshave been successfully applied to time series forecasting [1]LSTM networks are applied on sequential data as inputwhich without loss of generality means data samples thatchange over time Input into LSTM networks involves aso-called sequence length parameter (ie the number oftime steps) that is defined by the sample values over afinite time window [19] Thus sequence length is how werepresent the change in the input vector over time this isthe time series aspect to the input data The architectureof LSTM networks is generally composed of units calledmemory blocks The memory block contains memory cellsand three controlling gates ie input gate forget gate andoutput gate The memory cell is designed for preservingthe knowledge of previous step with self-connections thatcan store (remember) the temporal state of the networkwhile the gates control the update and flow direction ofinformation

For time series prediction of the input 119909119905 the LSTMupdates the memory cell 119888119905 and outputs a hidden state ℎ119905according to the following calculations which are performedat each time step 119905 The below equations give the fullmechanism for a modern LSTM with forget gates [15]

119894119905 = 120590 (W119909119894119909119905 +Wℎ119894119909119905minus1 + 119887119894) 119891119905 = 120590 (W119909119891119909119905 +Wℎ119891119909119905minus1 + 119887119891) 119900119905 = 120590 (W119909119900119909119905 +Wℎ119900119909119905minus1 + 119887119900) 119888119905 = 119891119905 otimes 119888119905minus1 + 119894119905 otimes 120601 (W119909119888119909119905 +Wℎ119888ℎ119905minus1 + 119887119888) ℎ119905 = 119900119905 otimes 120601 (119888119905)

(1)

In (1) W119898119899 denotes the weight of the connection from gate119898 and gate n and 119887119899 is bias parameter to learn where119898 isin 119909 ℎ and 119899 isin 119894 119891 119900 119888 In addition otimes represents theelement-wise multiplication (Hadamard product) 120590 standsfor the standard logistic sigmoid function and 120601 denotes thetanh function 120590(119909) = 1(1 + 119890minus119909) and 120601(119909) = (119890119909 minus 119890minus119909)(119890119909 +119890minus119909)The input forget and output gates are denoted by 119894119905 119891119905and 119900119905 respectively while 119888119905 represents the internal state of thememory cell 119888 at time 119905 The value of the hidden layer of theLSTM at time 119905 is the vector ℎ119905 while ℎ119905minus1 is the values outputby each memory cell in the hidden layer at the previous time

The underlying mechanism behind LSTM model (usedfor building our LSTM ensemble) mainly comes from theldquogaterdquo components (shown in (1)) that are designed forlearning when to let prediction error in and when to let itout As such LSTM gates provide an effective mechanism interms of quickly modifying the memory content of the cellsand the internal dynamics in a cooperative way [20 21] Inthis sense the LSTM may have a superior ability to learnnonlinear statistical dependencies of real-world time seriesdata in comparison to conventional forecasting models

In the proposed ensemble forecasting method each of theindividual LSTM networks is used as ldquobase (component) pre-dictor modelrdquo [6] Note that ensemble forecasting approachcan be generally effective when there is considerable extent of

diversities among individual base models namely ensemblemembers [6 13 14] In light of this fact we vary the LSTMmodel parameter [15] namely sequence length to increasediversity among individual LSTM networks as ensemblemembers For this we learn on each LSTM network for aparticular sequence length Underlying idea for using differentsequence length parameters is to inject diversity duringthe generation of individual LSTM models This approachmay be beneficial for effectively modelling highly nonlin-ear statistical dependencies since using multiple sequencelength values allows for creating multiple LSTMmodels withvarious number of memory cells which is likely to providecomplementary effect on learning the internal dynamics andcharacteristics of time series data to be predicted In this wayan ensemble of LSTM models with varying sequence lengthis capable of handling the dynamics and nonstationarities ofreal-world time series

Assuming that a total of119872 LSTMmodels in an ensembleare provided their ensemble forecast result for time seriesdenoted as (119910(1) 119910(2) 119910(119873)) with 119873 observations is givenby

119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 for 119896 = 1 119873 (2)

where 119910(119896)119898 denotes the forecast output (at the 119896th time step)obtained using the119898th LSTMmodel and119908119898 is the associatedcombining weight In (2) each weight 119908119898 is assigned to acorresponding LSTM modelrsquos forecast output Note that 0 le119908119898 le 1 and sum119872119898=1119908119898 = 13 Finding Adaptive Weights forCombining LSTMModels

An important factor that affects the forecasting performanceof our LSTM ensemble is a set of combining weights119908119898119898 =1 119872 We develop a novel weight determination schemewhich accounts for capturing the time varying dynamics ofthe underlying time series in an adaptive manner

We now describe the details of our weight determinationsolution which has been developed based on the followingproperty if the regression errors of individual estimatorsare uncorrelated and zero mean their weighted averaging(shown in (2)) has minimum variance when the weightsare inversely proportional to variances of the estimators[22] This property provides the theoretical foundation fordeveloping our weight determination solution for detailson this proof please refer to the Appendix section Thecombining weights are computed in the following recursiveway

119908(119896+1)119898 = 119908(119896)119898 + 120582Δ119908(119896)119898 for 119898 = 1 119872 (3)

where we suggest 120582 = 03Note that Δ119908(119896)119898 is computed basedon the inverse prediction error of the respective LSTM basemodel as follows

Δ119908(119896)119898 = 1120576(119896)1198981120576(1)119898 + 1120576(2)119898 + sdot sdot sdot + 1120576(119896)119898 (4)


Table 1 Description of the four different time series datasets [18] used in our experimentation

Time Series Type Total size Training size Validation size Testing sizeRiver flow Stationary nonseasonal 600 360 90 150Vehicles Nonstationary nonseasonal 252 151 38 63Wine Monthly seasonal 187 112 29 46Airline Monthly seasonal 144 86 22 36

The 120576(119896)119898 is related to past prediction error measured up to the119896th time step in the following way

120576(119896)119898 = 119896sum119905=119896minusV+1

120574119896minus119905119890(119905)119898 (5)

where 0 lt 120574 le 1 1 le V le 119896 and 119890(119905)119898 is the predictionerror at each time step 119905 of the 119898th LSTM model In (5)120576(119896)119898 is calculated by defining a sliding time window formedby the last V prediction errors Herein the forgetting factor120574 is devised for reducing the influence of old predictionerrors By performing weight update in time as described in(3) we can find the weights by analyzing intrinsic patternsin successive data forecasting trials this would be benefi-cial for coping with nonstationary properties in the timeseries or avoiding complex optimization for finding adaptiveweights

Using (3) (4) and (5) the weights are updated overtime series with 119879 time steps (119896 = 1 119879) leading to thefinal weights 119908(119879)119898 (119898 = 1 119872) Finally the weight to beassigned to each LSTMmodel in an ensemble is computed asfollows

119908119898 = 119908(119879)119898sum119872119899=1119908(119879)119899 for 119898 = 1 119872 (6)

Note that the weights computed using (6) satisfy the con-straints such that 0 le 119908119898 le 1 and sum119872119898=1119908119898 = 14 Experimentation

41 Experimental Setup and Condition The proposed LSTMensemble method was implemented using TensorFlow [23]andwas trainedwith the stochastic gradient descent (SGD)Atotal of ten LSTM base models (ie ensemble members) wereused to build our LSTM ensemble For each LSTM networkwe used one hidden layer and the same number of memorycells as in the assigned sequence length value (eg LSTMwithsequence length ldquo5rdquo has five memory cells) We set the 120574 andV parameters [shown in (5)] as 085 and 4 respectively Alsothe initial value of 119908(119896)119898 in (3) was set to zero

We tested our proposed LSTM ensemble on four discretetime series datasets (please refer to Table 1) namely ldquoRiverflowrdquo ldquoVehiclesrdquo ldquoWinerdquo and ldquoAirlinerdquo representing real-world phenomena which are publicly available at the TimeSeriesData Library (TSDL) [18] For each time series we usedaround 60 as training dataset the successive 15 as vali-dation dataset and the remaining 25 as test dataset Notethat validation dataset was used for finding the combining

Table 2 Demonstrating effectiveness of our proposed LSTMensemble on improving forecasting performance (a) MAE and (b)MSE

(a)

Time series Mean plusmn Std Best individualLSTM

Proposed LSTMensemble

River Flow 067 plusmn 005 064 053Vehicles 255 plusmn 032 222 185Wine 127 plusmn 029 109 094Airline 822 plusmn 181 643 558

(b)




Note MeanplusmnStd indicates the average and standard deviation of MAE (orMSE) measures computed over all of the individual LSTM base models inan ensemble

weights and all the results reported here were obtained usingtest dataset

In our experimental study we used the following twoerror measures used to evaluate the forecasting accuracynamely mean absolute error (MAE) and the mean squareerror (MSE) [1 6] as follows

MAE = sum119873119896=1 10038161003816100381610038161003816119910(119896) minus 119910(119896)10038161003816100381610038161003816119873 (7)

MSE = sum119873119896=1 (119910(119896) minus 119910(119896))2119873 (8)

where 119910(119896) and 119910(119896) are the target and forecast values(outcomes) respectively of time series with a total of 119873observations Both MAE and MSE performances of theforecasting algorithms considered were computed based onone-step-ahead forecasts as suggested in [1 6 13]

42 Experimental Results

421 Effectiveness of the Proposed LSTM Ensemble Forecast-ing The results in Table 2 demonstrate the effectiveness ofour LSTM ensemble algorithm on improving forecasting


0 5 10 15 20 25 30

Actual (ground-truth) time seriesForecast (prediction) time series

300350400450500550600650

No

of p

asse

nger

s

(a)

0 5 10 15 20 25 30


0

50

100

150

200

Mon

thly

rive

r flow

(b)

0 2 4 6 8 10 12 14 16 18


400600800

10001200140016001800

Sale

of v

ehic

les

(c)

0 5 10 15 20 25 30 35


times104

115

225

335

445

Mon

thly

sale

of r

ed w

ine

(d)

Figure 2 Time series forecasts of (a) Airline (b) River flow (c) Vehicles and (d) Wine obtained using our proposed LSTM ensembleforecasting algorithm

performance (in terms of both MAE and MSE) against thecase of using only a single individual LSTM base model Wecan see that our LSTM ensemble greatly outperforms all theindividual LSTM base models in an ensemble To supportthis compared to the best individual LSTMmodel predictionerror ldquoMSErdquo can be significantly reduced using our ensemblemethod of up to about 22 29 46 and 21 for respectiveldquoRiver flowrdquo ldquoVehiclesrdquo ldquoWinerdquo and ldquoAirlinerdquo time seriesLikewise for using MAE prediction errors can be reducedas much as 16 16 14 and 13 in the same order ofthe aforementioned time series by using the proposed LSTMensemble method

Figure 2 depicts the actual and predicted time seriesobtained using our LSTM ensemble forecasting method Itis seen that in each plot the closeness between the actualobservations and their forecasts is clearly evident The resultsshown in Table 2 and Figure 2 confirm the advantage ofour proposed LSTM ensemble method for notably improvingforecasting accuracy compared to the approach of using asingle individual LSTM network

422 Comparison with Other State-of-the-Art EnsembleForecasting Methods We compared our proposed methodwith other state-of-the-art ensemble forecasting algorithmsincluding simple averaging (Avg) [3ndash5] Median [15] LeastSquare Regression (LSR) [2] Average of In-sample Weights

(AIW) [16] and Neural Network Based Linear Ensemble(NNLE) [6] Table 3 presents comparative results Notethat all the results for comparison were directly cited fromcorresponding papers recently published (for details pleaserefer to [6]) In Table 3 the proposed LSTM ensembleachieves the lowest prediction errors namely the highestprediction accuracy for both MAE and MSE measuresIn particular from Table 3 it is obvious that our LSTMensemble approach can achieve the best performance forchallenging time series ldquoAirlinerdquo and ldquoVehiclesrdquo each ofwhich is composed of nonstationary and quite irregularpatterns (movements)

To further guarantee stable and robust comparison withother ensemble forecasting methods we calculate the so-called their worth values [1 6] Note that the worth valuesare computed as the average percentage reductions in theforecasting errors of the worst one of all ensemble forecastingmethods (under consideration) over all four time seriesdatasets used This measure shows the extent to which anensemble forecasting method performs better than the worstensemble and hence represents the overall ldquogoodnessrdquo ofthe ensemble forecasting method Let us denote by ldquoerror119894119895rdquothe forecasting error (calculated via MAE or MSE) obtainedfor the 119895th ensemble method for the 119894th time series and byldquomax error119894rdquo the maximum (or worst) error obtained by aparticular method for the 119894th time series at handThen worth


Table 3 Comparison with other state-of-the-art ensemble forecasting methods (a) MAE and (b) MSE(a)

Time series Ensemble forecasting methodsAvg[3ndash5] Median[15] LSR[2] AIW[16] NNLE[6] Proposed

River Flow 0751 0676 0736 0749 0638 0536Vehicles 2087 2139 2059 2071 2001 1851Wine 2075 2173 2466 2372 1923 0937Airline 1163 1173 1068 1022 7434 5582

(b)



Note Bold values denote the best results of forecasting each time series

Table 4 Comparison of the worth values obtained for differentensemble forecasting methods

Methods Worth values ()MAE MSE

Avg [3ndash5] 47848 82202Median [15] 54671 56206LSR [2] 36722 31835AIW [16] 50325 136540NNLE [6] 200354 313260Proposed LSTM Ensemble 391271 495803

value denoted as worth119895 for the 119895th ensemble forecastingmethod is defined as follows

worth119895 = 1119870119870sum119894=1

[(max error119894 minus error119894119895max error119894

) times 100]for 119895 = 1 119875

(9)

where 119870 and 119875 are the total number of time series datasetsand ensemble forecasting methods respectively used in ourexperiments In (9) worth119895 measures the percentage errorreduction of the 119895th ensemble method compared to the worstmethod The worth values of all the ensemble methods forboth MAE and MSE error measures are listed in Table 4From this table it can be seen that the proposed LSTMensemble method achieves the best worth values which areabout 39 and 49 for MAE and MSE respectively Thisresult indicates that for four different time series our LSTMensemble forecasting method can provide around 39 and49 more accurate results than a worst ensemble method interms of MAE andMSE respectively The comparison resultsinTables 3 and 4 validate the feasibility of our proposed LSTMensemble method with regard to improving state-of-the-artforecasting performance

423 Effect of LSTM Ensemble Size on Forecasting AccuracyIn the proposedmethod the size of LSTM network ensemble(ie the number of LSTM models within an ensemble) isone of the important factors for determining the forecastingaccuracy In this subsection we discuss experimental resultsby investigating the impact of our LSTM ensemble size onforecasting accuracy

Figures 3(a) and 3(b) show the training and testing fore-casting accuracy (in terms of MAE and MSE respectively)with respect to different number of ensemble members forthe ldquoWinerdquo dataset We can see that training forecastingaccuracy for both MAE and MSE continues to increase asthe size of LSTM ensemble becomes large and quickly levelsoff Considering testing forecasting accuracy it seems togenerally improve as the size of ensemble increases up toa particular number and repeat increasing and decreasingand finally converges to nearly the same constant value Itcan be also observed that in Figure 3 testing forecastingaccuracy for both datasets is the best when the numberof ensemble members is around 10 The above-mentionedobservations indicate that larger LSTM ensemble size doesnot always guarantee improved (generalization [24]) fore-casting accuracy Moreover the least number of ensemblemembers is more beneficial for reducing the computationalcosts required especiallywhenour proposed LSTMensembleis applied to forecast (predict) a large number of time serieswhich is common in energy- and finance-related forecastingapplications [7ndash9]

424 Runtime Performance Furthermore we assess theruntime performance of our LSTM ensemble forecastingframework using the ldquoRiver flowrdquo dataset Our hardwareconfiguration comprises a 33-GHz CPU and a 64G RAMwith the NVidia Titan X GPU It is found that the totaltime needed to train a single LSTM model is about 29minutes while the training time required to build ouroverall LSTM ensemble framework (for the case of tenensemble members) is about 283 minutes However it


2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)

MAE for training datasetMAE for testing dataset

Size of LSTM ensemble

(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

MSE for training datasetMSE for testing dataset


14

12

10

8

6

4

2

0

(b)

Figure 3 Impact of LSTM ensemble size in the proposed method on forecasting accuracy (a) MAE and (b) MSE

should be pointed out that the average time required toforecast across all time points (steps) is as low as 0003seconds It should be also noted that the time requiredto construct our LSTM ensemble framework should notbe considered in the measurement of the execution timesbecause this process can be executed offline in real-lifeforecasting applications In light of this fact the proposedLSTM ensemble method can be feasible for practical engi-neering applications by considering the balance betweenforecasting accuracy lower testing time and straightforwardimplementation

5 Conclusions

We have proposed a novel LSTM ensemble forecastingmethodWe have shown that our LSTM ensemble forecastingapproach is capable of well capturing the dynamic behaviorof real-world time series Comparative experiments on thefour challenging time series indicate that the proposedmethod achieves superior performance compared with otherpopular forecasting algorithms This can be achieved bydeveloping a novel scheme that can adjust combining weightsbased on time-dependent reasoning and self-adjustmentIt is also shown that our LSTM ensemble forecasting caneffectively model highly nonlinear statistical dependenciessince their gating mechanisms enable quickly modifyingthe memory content of the cells and the internal dynamicsin a cooperative way [20 21] In addition our complexityanalysis demonstrates that our LSTM ensemble is able tohave a runtime which is competitive with the approachto use only a single LSTM network Consequently ourproposed LSTM ensemble forecasting solution can be readilyapplied in time series forecasting (prediction) problemsboth in terms of forecasting accuracy and fast runtimeperformance

Appendix

Weighted averaging defined in (2) obtains the combinedoutput 119910(119896) by averaging the outputs 119910(119896)119898 119898 = 1 119872of individual estimator models with different weights If wedefine prediction (or regression) error as 119890(119896)119898 = 119910(119896)119898 minus 119910(119896) foreach estimator 119910(119896) can be expressed as

119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)

where the weights satisfy the constraint that 119908119898 ge 0 andsum119872119898=1119908119898 = 1 The mean square error (MSE) of combinedoutput 119910(119896) with respect to the target value 119910(119896) can be writtenas follows [22]

MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)

where 119862119898119899 stands for the symmetric covariance matrixdefined by 119862119898119899 = E[119890(119896)119898 119890(119896)119899 ] Our goal is to find the optimalweights that minimize the aforementioned MSE which canbe solved by applying the Lagrangemultiplier as follows [22]

120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)

By imposing the constraint sum119872119898=1119908119898 = 1 we find that

119908119898 = sum119872119899=1 119862minus1119898119899sum119872119896=1sum119872119899=1 119862minus1119896119899 (A4)


Under the assumption that the errors 119890(119896)119898 are uncorrelatedand zero mean ie 119862119898119899 = 0 forall 119898 = 119899 and the optimal 119908119898can be obtained [22]

119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)

where 1205902119898 = 119862119898119898 = E[(119890(119896)119898 )2] The result in (A5) shows thattheweights should be inversely proportional to variances (ieerrors) of the respective estimators In other words it meansthat the weights should be in proportion to the prediction(regression) performance of individual estimators

Data Availability

The datasets generated andor analyzed during the currentstudy are available from the corresponding author on reason-able request

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This research was supported by Basic Science ResearchProgram through theNational Research Foundation of Korea(NRF) funded by the Ministry of Science and ICT [NRF-2017R1C1B1008646] This work was supported by HankukUniversity of Foreign Studies Research Fund This work wasalso supported by Basic Science Research Program throughthe National Research Foundation of Korea (NRF) funded bythe Ministry of Education (no 2015R1D1A1A01057420)

References

[1] F M Bianchi E Maiorino M C Kampffmeyer A Rizzi andR Jenssen ldquoAn overview and comparative analysis of recurrentneural networks for short term load forecastingrdquo 2017

[2] R Adhikari and R K Agrawal ldquoPerformance evaluation ofweights selection schemes for linear combination of multipleforecastsrdquo Artificial Intelligence Review vol 42 no 4 pp 529ndash548 2012

[3] V R R Jose and R L Winkler ldquoSimple robust averagesof forecasts Some empirical resultsrdquo International Journal ofForecasting vol 24 no 1 pp 163ndash169 2008

[4] J G de Gooijer and R J Hyndman ldquo25 years of time seriesforecastingrdquo International Journal of Forecasting vol 22 no 3pp 443ndash473 2006

[5] S Makridakis and R L Winkler ldquoAverages of forecasts someempirical resultsrdquo Management Science vol 29 no 9 pp 987ndash996 1983

[6] R Adhikari ldquoA neural network based linear ensemble frame-work for time series forecastingrdquo Neurocomputing vol 157 pp231ndash242 2015

[7] R K Jain K M Smith P J Culligan and J E TaylorldquoForecasting energy consumption of multi-family residentialbuildings using support vector regression Investigating the

impact of temporal and spatial monitoring granularity onperformance accuracyrdquo Applied Energy vol 123 pp 168ndash1782014

[8] V P Utgikar and J P Scott ldquoEnergy forecasting Predictionsreality and analysis of causes of errorrdquo Energy Policy vol 34 no17 pp 3087ndash3092 2006

[9] W Huang K K Lai Y Nakamori S Wang and L Yu ldquoNeuralnetworks in finance and economics forecastingrdquo InternationalJournal of Information TechnologyampDecisionMaking vol 6 no1 pp 113ndash140 2007

[10] F Kaytez M C Taplamacioglu E Cam and F HardalacldquoForecasting electricity consumption a comparison of regres-sion analysis neural networks and least squares support vectormachinesrdquo International Journal of Electrical Power amp EnergySystems vol 67 pp 431ndash438 2015

[11] H Yang Z Jiang andH Lu ldquoAHybridWind Speed ForecastingSystem Based on a lsquoDecomposition and Ensemblersquo Strategy andFuzzy Time Seriesrdquo Energies vol 10 no 9 p 1422 2017

[12] J Kitchen and R Monaco ldquoReal-time forecasting in practiceThe US Treasury Staff rsquos Real-Time GDP Forecast SystemrdquoBusiness Economics The Journal of the National Association ofBusiness Economists vol 38 no 4 pp 10ndash19 October 2003

[13] N Kourentzes D K Barrow and S F Crone ldquoNeural networkensemble operators for time series forecastingrdquo Expert Systemswith Applications vol 41 no 9 pp 4235ndash4244 2014

[14] P S Freitas and A J Rodrigues ldquoModel combination in neural-based forecastingrdquo European Journal of Operational Researchvol 173 no 3 pp 801ndash814 2006

[15] S Hochreiter and J Schmidhuber ldquoLong short-termmemoryrdquoNeural Computation vol 9 no 8 pp 1735ndash1780 1997

[16] J S Armstrong Ed Principles of forecasting a handbookfor researchers and practitioners vol 30 Springer Science ampBusiness Media New York NY USA 2001

[17] T G Barbounis J B Theocharis M C Alexiadis and P SDokopoulos ldquoLong-term wind speed and power forecastingusing local recurrent neural network modelsrdquo IEEE Transac-tions on Energy Conversion vol 21 no 1 pp 273ndash284 2006

[18] ldquoTime Series Data Library (TSDL)rdquo httprobjhyndmancomTSDL

[19] Z C Lipton J Berkowitz and C Elkan ldquoA critical reviewof recurrent neural networks for sequence learningrdquo httpsarxivorgabs150600019

[20] V Veeriah N Zhuang and G-J Qi ldquoDifferential recurrentneural networks for action recognitionrdquo in Proceedings of the15th IEEE International Conference on Computer Vision ICCV2015 pp 4041ndash4049 Chile December 2015

[21] N Srivastava E Mansimov and R Salakhutdinov ldquoUnsu-pervised learning of video representations using LSTMsrdquo inProceedings of the 32nd International Conference on MachineLearning ICML 2015 pp 843ndash852 France July 2015

[22] M P Perrone and L N Cooper ldquoWhen networks disagreeEnsemble methods for hybrid neural networksrdquo in How WeLearn How We Remember Toward an Understanding of BrainandNeural Systems SelectedPapers of Leon vol 10 pp 342ndash358World Scientific 1995

[23] M Abadi A Agarwal P Barham et al Tensorflow Large-scalemachine learning on heterogeneous distributed systems 2016

[24] C M Bishop Pattern Recognition and Machine LearningSpringer New York NY USA 2006

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of


Mathematical Problems in Engineering

Applied MathematicsJournal of


Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of


Mathematical PhysicsAdvances in

Complex AnalysisJournal of


OptimizationJournal of



Engineering Mathematics

International Journal of


Operations ResearchAdvances in

Journal of


Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences


Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018


Decision SciencesAdvances in


AnalysisInternational Journal of


Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom


Validation time-series data

Training time-series data

Learning



Learning


Learning

Sequence length lsquoMrsquo

LSTM 2

LSTM 1

LSTM 3

LSTM M

LSTM Ensemble Training

Weighted combination of forecast output

EnsembleForecast results

Adaptive weight determination based on

past prediction error and forgetting factor

LSTM Ensemble Forecasting

Weights

Learning




MＭＮ LSTM network

Figure 1 Overall framework of the proposed LSTM ensemble method for the purpose of time series forecasting

ensemble forecasting is simple averaging that assigns equalweights to all forecasting component models [3ndash5] The sim-ple averaging approach is sensitive to extreme values (ie out-liers) and unreliable for skewed distributions [6 14] To copewith this limitation weighted combination schemes havebeen proposedThe authors in [2] proposed the Least SquareRegression (LSR) that attempts to find the optimal weightsby minimizing the Sum of Squared Error (SSE) The authorsin [15] adopted the Average of In-sample Weights (AIW)schemewhere each weight is simply computed as the normal-ized inverse absolute forecasting error of an individual model[16]The authors in [6] developed a so-called Neural NetworkBased Linear Ensemble (NNLE) method that determines thecombining weights through a neural network structure

Recently a class of mathematical models called Recur-rent Neural Networks (RNNs) [17] are nowadays gainingrenew interest among researchers and they are replacingmany kinds of practical implementation of the forecastingsystems previously based on statistical models [1] In par-ticular Long Short-TermMemory (LSTM) networksmdashwhichare a variation of RNNmdashhave proven to be one of the mostpowerful RNN models for time series forecasting and otherrelated applications [1 15] The LSTM networks can be con-structed in such a way that they are able to remember long-term relationships in the data The LSTM networks have beenshown to model temporal sequences and their long-rangedependencies more accurately than original RNNmodel [1]However despite the recent popularity of the LSTM net-works their applicability in the context of ensemble forecast-ing has not been investigated yet Hence to our knowledgehow to best combine multiple forecast results of individualLSTMmodels still remains a challenging and open question

In this paper we present a novel LSTM ensemble forecast-ingmethod for improved time series forecasting which effec-tively combines multiple forecasts (predictions)(throughout

the remainder of this paper both terms of ldquoforecastrdquo andldquopredictionrdquo will be used interchangeably) inferred from thedifferent and diverse LSTM models Especially we develop anovel way of dynamically adjusting the so-called combiningweights that are used for combining multiple LSTM modelsto produce the composite prediction output The main ideaof our proposed method is to update combining weightsat each time step in an adaptive and recursive way Forthis the weights can be determined by using both pastprediction errors (measured up to the current time step)and forgetting weight factor The weights are assigned toindividual LSTM models which improve the forecastingaccuracy to a large extent The overall framework of ourproposed method is illustrated in Figure 1 Results showthat the proposed LSTM ensemble achieves state-of-the-artforecasting performance on real-world time series datasetpublicly available and it is considerably better than otherrecently developed ensemble forecasting methods as it will beshown in Section 4

The rest of this paper is organized as follows Section 2describes our proposed approach for building an ensembleof LSTMs that is well-suited for use in time series forecastingThen we discuss how to find combining weights for thepurpose of adaptively combining LSTM models Section 4presents and discusses our comparative experimental resultsFinally the conclusion is given in Section 5

2 Building LSTM Ensemble forTime Series Forecasting

In the proposed method an ensemble of LSTM networksshould be first constructed in an effective way of maximizinga complementary effect during the combination of multipleLSTM forecast results aiming to improve forecasting accu-racy Before explaining our LSTM ensemble construction





(1)






119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 for 119896 = 1 119873 (2)




119908(119896+1)119898 = 119908(119896)119898 + 120582Δ119908(119896)119898 for 119898 = 1 119872 (3)


Δ119908(119896)119898 = 1120576(119896)1198981120576(1)119898 + 1120576(2)119898 + sdot sdot sdot + 1120576(119896)119898 (4)





120576(119896)119898 = 119896sum119905=119896minusV+1

120574119896minus119905119890(119905)119898 (5)



119908119898 = 119908(119879)119898sum119872119899=1119908(119879)119899 for 119898 = 1 119872 (6)





(a)




(b)







MAE = sum119873119896=1 10038161003816100381610038161003816119910(119896) minus 119910(119896)10038161003816100381610038161003816119873 (7)

MSE = sum119873119896=1 (119910(119896) minus 119910(119896))2119873 (8)





0 5 10 15 20 25 30


300350400450500550600650

No

of p

asse

nger

s

(a)

0 5 10 15 20 25 30


0

50

100

150

200

Mon

thly

rive

r flow

(b)

0 2 4 6 8 10 12 14 16 18


400600800

10001200140016001800

Sale

of v

ehic

les

(c)

0 5 10 15 20 25 30 35


times104

115

225

335

445

Mon

thly

sale

of r

ed w

ine

(d)











(b)








worth119895 = 1119870119870sum119894=1


) times 100]for 119895 = 1 119875

(9)






2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)



(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30



14

12

10

8

6

4

2

0

(b)



5 Conclusions


Appendix


119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)


MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)


120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)





119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018












(1)






119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 for 119896 = 1 119873 (2)




119908(119896+1)119898 = 119908(119896)119898 + 120582Δ119908(119896)119898 for 119898 = 1 119872 (3)


Δ119908(119896)119898 = 1120576(119896)1198981120576(1)119898 + 1120576(2)119898 + sdot sdot sdot + 1120576(119896)119898 (4)





120576(119896)119898 = 119896sum119905=119896minusV+1

120574119896minus119905119890(119905)119898 (5)



119908119898 = 119908(119879)119898sum119872119899=1119908(119879)119899 for 119898 = 1 119872 (6)





(a)




(b)







MAE = sum119873119896=1 10038161003816100381610038161003816119910(119896) minus 119910(119896)10038161003816100381610038161003816119873 (7)

MSE = sum119873119896=1 (119910(119896) minus 119910(119896))2119873 (8)





0 5 10 15 20 25 30


300350400450500550600650

No

of p

asse

nger

s

(a)

0 5 10 15 20 25 30


0

50

100

150

200

Mon

thly

rive

r flow

(b)

0 2 4 6 8 10 12 14 16 18


400600800

10001200140016001800

Sale

of v

ehic

les

(c)

0 5 10 15 20 25 30 35


times104

115

225

335

445

Mon

thly

sale

of r

ed w

ine

(d)











(b)








worth119895 = 1119870119870sum119894=1


) times 100]for 119895 = 1 119875

(9)






2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)



(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30



14

12

10

8

6

4

2

0

(b)



5 Conclusions


Appendix


119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)


MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)


120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)





119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018












120576(119896)119898 = 119896sum119905=119896minusV+1

120574119896minus119905119890(119905)119898 (5)



119908119898 = 119908(119879)119898sum119872119899=1119908(119879)119899 for 119898 = 1 119872 (6)





(a)




(b)







MAE = sum119873119896=1 10038161003816100381610038161003816119910(119896) minus 119910(119896)10038161003816100381610038161003816119873 (7)

MSE = sum119873119896=1 (119910(119896) minus 119910(119896))2119873 (8)





0 5 10 15 20 25 30


300350400450500550600650

No

of p

asse

nger

s

(a)

0 5 10 15 20 25 30


0

50

100

150

200

Mon

thly

rive

r flow

(b)

0 2 4 6 8 10 12 14 16 18


400600800

10001200140016001800

Sale

of v

ehic

les

(c)

0 5 10 15 20 25 30 35


times104

115

225

335

445

Mon

thly

sale

of r

ed w

ine

(d)











(b)








worth119895 = 1119870119870sum119894=1


) times 100]for 119895 = 1 119875

(9)






2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)



(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30



14

12

10

8

6

4

2

0

(b)



5 Conclusions


Appendix


119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)


MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)


120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)





119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018









0 5 10 15 20 25 30


300350400450500550600650

No

of p

asse

nger

s

(a)

0 5 10 15 20 25 30


0

50

100

150

200

Mon

thly

rive

r flow

(b)

0 2 4 6 8 10 12 14 16 18


400600800

10001200140016001800

Sale

of v

ehic

les

(c)

0 5 10 15 20 25 30 35


times104

115

225

335

445

Mon

thly

sale

of r

ed w

ine

(d)











(b)








worth119895 = 1119870119870sum119894=1


) times 100]for 119895 = 1 119875

(9)






2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)



(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30



14

12

10

8

6

4

2

0

(b)



5 Conclusions


Appendix


119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)


MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)


120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)





119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018












(b)








worth119895 = 1119870119870sum119894=1


) times 100]for 119895 = 1 119875

(9)






2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)



(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30



14

12

10

8

6

4

2

0

(b)



5 Conclusions


Appendix


119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)


MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)


120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)





119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018









2

18

16

14

12

1

08

06

04

02

02 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fore

casti

ng er

ror (

MA

E)



(a)

Fore

casti

ng er

ror (

MSE

)

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30



14

12

10

8

6

4

2

0

(b)



5 Conclusions


Appendix


119910(119896) = 119872sum119898=1

119908119898119910(119896)119898 = 119910(119896) + 119872sum119898=1

119908119898119890(119896)119898 (A1)


MSE [119910(119896)] = 119872sum119898=1

119872sum119899=1

119908119898119908119899119862119898119899 (A2)


120597120597119908 [119872sum119899=1

119908119898119908119899119862119898119899 minus 2120582( 119872sum119898=1

119908119898 minus 1)] = 0 (A3)





119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018










119908119898 = 120590minus2119898sum119872119899=1 120590minus2119899 (A5)


Data Availability




Acknowledgments


References

































Journal of












Journal of







Volume 2018






Volume 2018















Journal of












Journal of







Volume 2018






Volume 2018








Documents

Combining LSTM Network Ensemble via Adaptive Weighting …downloads.hindawi.com/journals/mpe/2018/2470171.pdfApr 06, 2018 · forecasting instability. To overcome the above challenges,