
Research Article
A Time-Series Water Level Forecasting Model Based on Imputation and Variable Selection Method

Jun-He Yang, Ching-Hsue Cheng, and Chia-Pan Chan

Department of Information Management, National Yunlin University of Science and Technology, Yunlin, Taiwan

Correspondence should be addressed to Ching-Hsue Cheng; chcheng@yuntech.edu.tw

Received 20 June 2017; Revised 18 September 2017; Accepted 27 September 2017; Published 9 November 2017

Academic Editor: Amparo Alonso-Betanzos

Copyright © 2017 Jun-He Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reservoirs are important for households and impact the national economy. This paper proposes a time-series forecasting model based on estimating missing values followed by variable selection to forecast the reservoir's water level. This study collected data from the Taiwan Shimen Reservoir as well as daily atmospheric data from 2008 to 2015. The two datasets were concatenated into an integrated research dataset based on the date. The proposed time-series forecasting model has three foci. First, this study compares five imputation methods against directly deleting the records with missing values. Second, we identified the key variables via factor analysis and then deleted the unimportant variables sequentially via the variable selection method. Finally, the proposed model uses a Random Forest to build the forecasting model of the reservoir's water level, which is compared with the listing methods under the forecasting error. The experimental results indicate that the Random Forest forecasting model, when applied to variable selection and with full variables, has better forecasting performance than the listing models. In addition, the experiments show that the proposed variable selection can help the five forecast methods used here improve their forecasting capability.

1. Introduction

Shimen Reservoir is located between Taoyuan City and Hsinchu County in Taiwan. The Shimen Reservoir offers irrigation, hydroelectricity, water supply, flood control, tourism, and so on. This reservoir is very important to the area and supports livelihood, agriculture, flood control, and economic development. Thus, the authorities should plan and manage water resources comprehensively via accurate forecasting.

Previous studies of reservoir water levels have identified three important problems:

(1) There are few studies of reservoir water levels: related studies [1–4] in the hydrological field use machine learning methods to forecast water levels. They focused on water level forecasting of the flood stages in pumping stations, reservoirs, lakes, basins, and so on. Most of the water level forecasting of these flood stages collected data about typhoons, specific climate, seasonal rainfall, or water levels.

(2) Only a few variables have been used in reservoir water level forecasting: the literature shows only a few related forecasting studies [5, 6]. These used water level as the dependent variable, and the independent variables comprised only rainfall, water level, and time lags of those two variables. Thus, few independent variables were selected, and it is difficult to determine the key variable set for the reservoir water level.

(3) No imputation method has been used on reservoir water level datasets: previous studies of water level forecasting in hydrological fields have shown that the collected data are noninterruptible and long-term, but most of them did not explain how to deal with missing values arising from human error or mechanical failure.

To address these problems, this study collected data on Taiwan Shimen Reservoir and the corresponding daily atmospheric datasets. The two datasets were concatenated into a single dataset based on the date. Next, this study imputed missing values and selected a better imputation method to further build forecast models. We then evaluated the variables based on different models.



This paper includes five sections. Section 2 is related work. Section 3 proposes the research methodology and introduces the concepts: imputation methods, variable selection, and the forecasting model. Section 4 verifies the proposed model and compares it with the listing models. Section 5 concludes.

2. Related Work

This section introduces machine learning forecast methods, imputation techniques, and variable selection.

2.1. Machine Learning Forecast (Regression)

2.1.1. RBF Network. Radial Basis Function Networks were proposed by Broomhead and Lowe in 1988 [7]. RBF is a simple supervised learning feedforward network that avoids iterative training processes and trains the data at one stage [8]. The RBF Network is a type of ANN for applications that solve problems of supervised learning, for example, regression, classification, and time-series prediction [9]. The RBF Network consists of three layers: input layer, hidden layer, and output layer. The input layer is the set of source nodes, the second layer is a hidden layer of high dimension, and the output layer gives the response of the network to the activation patterns applied to the input layer [10]. The advantages of the RBF approach are the (partial) linearity in the parameters and the availability of fast and efficient training methods [11]. The use of radial basis functions results from a number of different concepts, including function approximation, noisy interpolation, density estimation, and optimal classification theory [12].
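As an illustration of this three-layer structure, the following is a minimal sketch of an RBF Network regressor, not the paper's implementation: Gaussian basis functions centered by k-means form the hidden layer, and the output weights are solved in one stage by linear least squares (all settings here are assumptions).

import numpy as np
from sklearn.cluster import KMeans

class SimpleRBFNet:
    def __init__(self, n_centers=10, gamma=1.0):
        self.n_centers = n_centers  # hidden-layer size (number of basis functions)
        self.gamma = gamma          # width parameter of the Gaussian bases

    def _hidden(self, X):
        # Hidden layer: one Gaussian activation per center.
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        # Centers chosen by clustering, without iterative error backpropagation.
        self.centers_ = KMeans(self.n_centers, n_init=10).fit(X).cluster_centers_
        # Output layer: linear weights solved in a single least-squares stage.
        self.w_, *_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.w_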

2.1.2. Kstar. Kstar is an instance-based classifier that differs from other instance-based learners in that it uses an entropy-based distance function [13]. The Lazy family of data mining classifiers supports incremental learning; it contains classifiers such as Kstar, and it takes less time for training and more time for predicting [14]. It provides a consistent approach to handling symbolic attributes, real-valued attributes, and missing values [15]. Kstar uses an entropy-based distance function for instance-based regression; the predicted class value of a test instance comes from the values of training instances that are similar to it [16].

2.1.3. KNN. The k-Nearest-Neighbor classifier offers a good classification accuracy rate for activity classification [17]. The kNN algorithm is based on the notion that similar instances have similar behavior; thus, new input instances are predicted according to the stored most similar neighboring instances [18].
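A toy regression sketch of this idea (illustrative only): the prediction for a new instance is the average response of its k most similar stored instances.

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    dist = np.linalg.norm(X_train - x_new, axis=1)  # similarity as Euclidean distance
    nearest = np.argsort(dist)[:k]                  # the k most similar stored instances
    return y_train[nearest].mean()                  # predict from their behavior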

2.2. Random Forest. A Random Forest can be applied for classification, regression, and unsupervised learning [19]. It is similar to the bagging method. Random Forest is an ensemble learning method in which a decision tree represents the base classifier. Random Forest gets N outputs through N decision trees; these are forecast by voting over all of the predicted results. It can solve classification and regression problems. Random Forest is simple and easily parallelized [20].

2.3. Random Tree. Random Tree is an ensemble learning algorithm that generates many individual learners and employs a bagging idea to produce a random set of data in the construction of a decision tree [21]. Random Tree classifiers can deal with regression and classification problems. Random trees can be generated efficiently and can be combined into large sets of random trees; this generally leads to accurate models [22]. The Random Tree classifier takes the input feature vector and classifies it with every tree in the forest. It then outputs the class label that received the majority of the votes [23].
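For concreteness, here is a hedged sketch instantiating scikit-learn analogues of three of the learners above; Kstar has no direct scikit-learn counterpart, and a Random Tree is only approximated here by a single decision tree with random split selection.

from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    # Random Forest: N trees on bootstrap samples; the regression forecast
    # aggregates the N tree outputs.
    "Random Forest": RandomForestRegressor(n_estimators=100),
    # IBK (KNN): instance-based prediction from the most similar neighbors.
    "IBK (KNN)": KNeighborsRegressor(n_neighbors=3),
    # Rough stand-in for a Random Tree: random split selection in one tree.
    "Random Tree": DecisionTreeRegressor(splitter="random"),
}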

2.4. Imputation. The daily atmospheric data may have missing values due to human error or machine failure. Many previous studies have shown that statistical bias occurs when the missing values are directly deleted. Thus, imputing data can significantly improve the quality of the dataset; otherwise, biased results may cause poor performance in the ensuing constructs [24]. Single imputation methods have several advantages, such as a wider scope than multiple imputation methods; sometimes it is more important to fill in the missing values than to estimate the parameters [25]. The median of nearby points imputation method orders the nearby values and then selects the median to replace the missing value; its advantage is that the replacement is an actual value occurring in the data [26]. Series mean imputation replaces missing values directly with the average of the variable. Regression imputation uses simple linear regression to estimate missing values and replace them. The mean of nearby points imputation method uses the mean of nearby values; the number of nearby values can be set using a "span of nearby points" option [27]. Linear imputation is most readily applicable to continuous explanatory variables [28].
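The following pandas sketch illustrates the five single-imputation strategies on a toy series; the window of five for the "nearby points" methods and all data values are assumptions, not the paper's settings.

import numpy as np
import pandas as pd

s = pd.Series([244.1, None, 243.8, 243.6, None, 243.3])  # toy water levels with gaps

series_mean = s.fillna(s.mean())              # series mean imputation
linear      = s.interpolate(method="linear")  # linear imputation
# Mean / median of nearby points: a centered window around each gap.
near = s.rolling(5, center=True, min_periods=1)
near_mean   = s.fillna(near.mean())
near_median = s.fillna(near.median())
# Regression imputation: fit y = a*t + b on the observed points, predict the gaps.
t = s.index.to_numpy(dtype=float)
obs = s.notna()
a, b = np.polyfit(t[obs], s[obs], deg=1)
regression = s.fillna(pd.Series(a * t + b, index=s.index))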

2.5. Variable Selection. The variable selection method mainly identifies the key variables that actually influence the forecasting target from among several variables. It then deletes the unimportant variables to improve the model's efficiency. It can solve high-dimensional and complex problems. Previous studies in several fields have shown that variable selection can improve the forecasting efficiency of machine learning methods [29–32].

Variable selection is an important technique in data preprocessing. It removes irrelevant data and improves the accuracy and comprehensibility of the results [33]. Variable selection methods can be categorized into three classes: filter, wrapper, and embedded. Filter methods use statistical methods to select variables; they have better generalization ability and lower computational demands. Wrapper methods use classifiers to identify the best subset of components. Embedded methods have a deeper interaction between variable selection and construction of the classifier [34].

Filter models utilize statistical techniques such as principal component analysis (PCA), factor analysis (FA), independent component analysis, and discriminant analysis in the investigation of other indirect performance measures. These are mostly based on distance and information measures [35].


PCA transforms a set of feature columns in the dataset into a projection of the feature space with lower dimensionality. FA is a generalization of PCA; the main difference between PCA and FA is that FA allows the noise to have a nonspherical shape while transforming the data. The main goal of both PCA and FA is to transform the coordinate system such that correlation between system variables is minimized [36].
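This difference can be seen in scikit-learn, where FactorAnalysis estimates a separate noise variance per variable (nonspherical noise) while PCA does not; this sketch is illustrative and not part of the paper's pipeline.

import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))        # stand-in for the nine predictors

pca = PCA(n_components=3).fit(X)
fa = FactorAnalysis(n_components=3).fit(X)
print(pca.components_.shape)         # (3, 9): projection directions only
print(fa.noise_variance_)            # one estimated noise variance per variable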

There are several methods to decide how many factors have to be extracted. The most widely used method for determining the number of factors is retaining factors with eigenvalues greater than one [10].
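A minimal sketch of this eigenvalue-greater-than-one (Kaiser) rule, applied to the correlation matrix of the variables:

import numpy as np

def n_factors_kaiser(X):
    corr = np.corrcoef(X, rowvar=False)  # correlation matrix of the variables
    eigvals = np.linalg.eigvalsh(corr)   # its eigenvalues
    return int((eigvals > 1.0).sum())    # number of factors to extract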

3. Proposed Model

Reservoirs are important domestically as well as for national defense and economic development. Thus, reservoir water levels should be forecast over a long period of time, and water resources should be planned and managed comprehensively to reach great cost-effectiveness. This paper proposes a time-series forecasting model based on the imputation of missing values and variable selection. First, the proposed model used five imputation methods (i.e., median of nearby points, series mean, mean of nearby points, linear, and regression imputation) to estimate the missing values and compared these findings with a delete strategy. Second, to identify the key variables that influence the daily water levels, the proposed method ranked the importance of the atmospheric variables via factor analysis and then sequentially removed the unimportant variables. Finally, the proposed model uses a Random Forest machine learning method to build a forecasting model of the reservoir water level and compares it to other methods. The proposed model can be partitioned into four parts: data preprocessing, imputation and feature selection, model building, and accuracy evaluation. The procedure is shown in Figure 1.

3.1. Computational Step. To understand the proposed model more easily, this section partitions the proposed model into four steps.

Step 1 (data preprocessing). The related water level and atmospheric data were collected from the reservoir administration website and the Taoyuan weather station. The two collected datasets are concatenated into an integrated dataset based on the date. There are nine independent variables and one dependent variable in the integrated dataset. The variables of the integrated dataset are defined in Table 1.

Step 2 (imputation). After Step 1, we found some variables and records with missing values in the integrated dataset due to mechanical measurement failure or human operation error. Previous studies showed that deleting missing values directly will impact the results. To identify the best-fitting imputation method, this paper utilized five imputation methods to estimate the missing values and then compared them with no imputation, that is, directly deleting the missing values. The five imputation methods were the median of nearby points, series mean, mean of nearby points, linear imputation, and regression imputation.

Table 1: Description of variables in the research dataset.

Output: Shimen Reservoir daily discharge release
Input: Shimen Reservoir daily inflow discharge
Temperature: Daily temperature in Daxi, Taoyuan
Rainfall: The previous day's accumulated rainfall at Shimen Reservoir
Pressure: Daily barometric pressure in Daxi, Taoyuan
Relative Humidity: Daily relative humidity in Daxi, Taoyuan
Wind Speed: Daily wind speed in Daxi, Taoyuan
Direction: Daily wind direction in Daxi, Taoyuan
Rainfall Dasi: Daily rainfall in Daxi, Taoyuan

This study had six processed datasets after handling the missing values problem. The problem is then how to identify the imputation method that best fits the integrated dataset. To determine this, the following steps were followed.

(i) In order to rescale all numeric values into the range [0, 1], this step normalized each variable value by dividing it by the variable's maximal value, for the five imputed datasets and for the dataset with missing values deleted. All independent and dependent variables are positive values.

(ii) The six normalized datasets are partitioned into 66% training data and 34% testing data. We also employed a 10-fold cross-validation approach to identify the imputed dataset that has the best prediction performance.

(iii) We utilized five forecast methods, including Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree, evaluated via five indices: correlation coefficient (CC), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and root relative squared error (RRSE). We identified the normalized dataset with the smaller index values over the five evaluation indices as the better-fitting imputation (a sketch of this evaluation follows this list). Section 4 shows that the better imputation method is the mean of nearby points.
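A minimal sketch of this evaluation under stated assumptions: scikit-learn's RandomForestRegressor stands in for one of the five learners, the 66% split is taken in date order, and CC is computed as the Pearson correlation; none of these details is fixed by the text.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def evaluate_imputed(X, y):
    X = X / X.max(axis=0)          # rescale by the maximal value into [0, 1]
    y = y / y.max()
    cut = int(len(X) * 0.66)       # 66% training / 34% testing partition
    model = RandomForestRegressor(n_estimators=100).fit(X[:cut], y[:cut])
    pred = model.predict(X[cut:])
    cc = np.corrcoef(y[cut:], pred)[0, 1]
    rmse = np.sqrt(np.mean((y[cut:] - pred) ** 2))
    return cc, rmse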

Step 3 (variable selection and model building). Based on Step 2, this study has determined the better imputation method (i.e., mean of nearby points). The next important problem is to determine the key variables that influence the reservoir water level. Therefore, this step utilized factor analysis to rank the importance of the variables for building the best forecast model. Based on the ordering of variables, we first deleted the least important variable. We then built the forecast model via Random Forest, RBF Network, Kstar, KNN, and Random Tree. Next, we repeatedly deleted the next least important variable and rebuilt the forecast model until the RMSE could no longer improve. After many iterative experiments, this study used five evaluation indices to determine the best forecast method.


Figure 1: The procedure of the proposed model. (The figure outlines four stages: data preprocessing, in which Dataset 1 and Dataset 2 are concatenated into the research dataset; imputation, in which the dataset is imputed and normalized, partitioned into 66% training and 34% testing data, and the better imputation method is selected based on CC, RMSE, RRSE, MAE, and RAE; variable selection and model building, in which the imputed dataset is partitioned into 66% training and 34% testing data, the importance of variables is ranked, the least important variable is deleted, and the key variables are determined; and model evaluation and comparison using correlation coefficient, root mean squared error, root relative squared error, mean absolute error, and relative absolute error.)

This step can be introduced step-by-step as follows.

(i) The imputed integrated datasets are partitioned into 66% training data and 34% testing data.

(ii) Factor analysis ranked the importance of the variables.

(iii) The variable ranking of factor analysis was used to iteratively delete the least important variable. The remaining variables were studied with Random Forest, RBF Network, Kstar, KNN, and Random Tree until the RMSE could no longer improve.

(iv) Based on the previous substep, the key variables are found when the lowest RMSE is achieved.

(v) Concurrently, we used five evaluation indices (CC, RMSE, RRSE, MAE, and RAE) to determine which forecast method is a good forecasting model; a sketch of the deletion loop follows this list.
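A compact sketch of the deletion loop, under stated assumptions: X is a pandas DataFrame, ranked_cols is the factor-analysis ordering from most to least important, and evaluate is a scoring function such as evaluate_imputed from the Step 2 sketch; all names are illustrative.

def backward_select(X, y, ranked_cols, evaluate):
    keep = list(ranked_cols)
    _, best_rmse = evaluate(X[keep], y)
    while len(keep) > 1:
        trial = keep[:-1]               # delete the least important variable
        _, rmse = evaluate(X[trial], y)
        if rmse >= best_rmse:           # RMSE no longer improves: stop
            break
        keep, best_rmse = trial, rmse   # otherwise accept the smaller set
    return keep, best_rmse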

Step 4 (evaluation and comparison). To verify the performance of the reservoir water level forecasting model, this step uses the superior imputed datasets with different variables selected to evaluate the proposed method. It then compares the results with the listing methods. This study used CC, RMSE, MAE, RAE, and RRSE to evaluate the forecast model. The five criteria are given as equations (1)-(5).

Root Mean Squared Error (RMSE):

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}   (1)


Table 2: The partial collected data.

Date       Rainfall  Input   Output     Rainfall Dasi  Temperature  Wind Speed  Direction  Pressure  Relative Humidity  Water level
2008/1/1   0.1       83.6    (missing)  0              10.2         4.7         65         1001.5    56                 244.09
2008/1/2   0.1       96.08   286.24     0              10.4         6.3         38         1000.7    59                 243.93
2008/1/3   0         82.72   82.17      0              14.5         4.5         50         997.7     67                 243.81
2008/1/4   0         133.32  262.22     0              15.3         3.4         40         996.5     82                 243.78
2008/1/5   0         125.6   305.94     0              16           2.9         46         996.3     77                 243.55
2008/1/6   0.3       98.74   192.33     0              16.7         1.2         170        995.4     83                 243.32
2008/1/7   0         116.6   192.76     0              18.4         1.9         46         994.9     77                 243.34
2008/1/8   0         93.12   109.73     0              19.9         1.3         311        992.9     78                 243.33
2008/1/9   0         107.57  123.98     0              19.9         2.2         11         992.2     77                 243.23
2008/1/10  0         65.15   276.74     0              19.6         1.6         357        991.6     80                 243.2
2008/1/11  0         55.64   249.09     0              21.5         1.3         185        990.2     71                 242.78
2008/1/12  0         91.67   191.81     0              19.8         4.2         37         992.1     75                 242.74
2008/1/13  0.9       107.34  182.22     1.5            14.2         7.1         39         996.2     85                 242.53
2008/1/14  5.2       80.09   146.62     1              12.7         7.1         36         997.7     85                 242.49
2008/1/15  4         85.77   243.82     0              13.5         7.1         35         998.4     86                 242.38

where y_i is the actual observation value of the data, \hat{y}_i is the forecast value of the model, and n is the sample number.

Correlation Coefficient (CC):

CC = \frac{\sum_{i=1}^{N} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \cdot \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}   (2)

where x_i and y_i are the observed and predicted values, respectively, and \bar{x} and \bar{y} are the means of the observed and predicted values.

Root Relative Squared Error (RRSE):

RRSE = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}}   (3)

where x_i is the predicted value, y_i is the actual value, and \bar{y} is the mean of the actual values.

Mean Absolute Error (MAE):

MAE = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|   (4)

Here, n is the number of observations, \hat{y}_t is the forecast value at time t, and y_t is the actual value at time t.

Relative Absolute Error (RAE):

RAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\sum_{i=1}^{n} |y_i - \bar{y}|}   (5)

where y_i is the actual observation value of the data, \bar{y} is the mean of the y_i, \hat{y}_i is the forecast value of the model, and n is the sample number.
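For reference, the five indices transcribe directly into code; this numpy sketch mirrors equations (1)-(5), with y_true the observations and y_pred the forecasts (names are illustrative).

import numpy as np

def forecast_metrics(y_true, y_pred):
    err = y_true - y_pred
    dev = y_true - y_true.mean()
    rmse = np.sqrt(np.mean(err ** 2))                     # equation (1)
    cc = np.corrcoef(y_true, y_pred)[0, 1]                # equation (2)
    rrse = np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2))   # equation (3)
    mae = np.mean(np.abs(err))                            # equation (4)
    rae = np.sum(np.abs(err)) / np.sum(np.abs(dev))       # equation (5)
    return {"CC": cc, "RMSE": rmse, "RRSE": rrse, "MAE": mae, "RAE": rae}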

4. Experimental Results

This section verifies the performance of the proposed forecast model and compares the results with the listing methods. To determine which imputation method has the best performance on the collected dataset, this study collected daily atmospheric data from the monitoring station and the website of the Water Resources Agency for Taiwan's Shimen Reservoir. This work also compares the proposed model with the listing models with/without variable selection.

4.1. Experimental Data. The research dataset consisted of two historical datasets: one was collected from the website of the Taiwan Water Resources Agency, and the other was from the Dasi monitoring station nearest to the Shimen Reservoir. The two datasets were collected from January 1, 2008 to October 31, 2015. The two collected datasets are concatenated into an integrated dataset based on the date. There are nine independent variables and one dependent variable in the integrated dataset, which has 2854 daily records. The study mainly forecasts the water level of the reservoir; the water level is the dependent variable. The independent variables are Reservoir OUT, Reservoir IN, Temperature, Rainfall, Pressure, Relative Humidity, Wind Speed, Direction, and Rainfall Dasi, respectively. The detailed data types are shown in Table 2.

4.2. Forecast and Comparison. Based on the computational steps in Section 3, this section employs the practically collected dataset to illustrate the proposed model and compare it with the listing methods. A detailed description follows.

(1) To better process the dataset's missing values, this study applies the series mean, regression, mean of nearby points, linear, and median of nearby points imputation methods to estimate missing values, as well as directly deleting missing values, to determine which method has the best performance.


Table 3: The results of the listing models with the five imputation methods under percentage split (dataset partitioned into 66% training data and 34% testing data) before variable selection. Columns: RBF Network, Kstar, Random Forest, IBK, Random Tree.

Delete the rows with missing data
  CC    0.085  0.546  0.728   0.288  0.534
  MAE   0.183  0.133  0.111   0.186  0.145
  RMSE  0.227  0.200  0.157   0.271  0.226
  RAE   0.998  0.727  0.607   1.015  0.789
  RRSE  0.997  0.879  0.690   1.193  0.995

Serial mean
  CC    0.052  0.557  0.739   0.198  0.563
  MAE   0.174  0.123  0.102   0.191  0.126
  RMSE  0.222  0.189  0.151*  0.283  0.202
  RAE   1.001  0.705  0.587   1.098  0.722
  RRSE  0.999  0.850  0.679   1.276  0.908

Linear
  CC    0.054  0.565  0.734   0.200  0.512
  MAE   0.175  0.121  0.101   0.189  0.138
  RMSE  0.222  0.188  0.152   0.281  0.218
  RAE   1.000  0.690  0.575*  1.082  0.787
  RRSE  0.999  0.844  0.684   1.264  0.980

Near median
  CC    0.054  0.571  0.737   0.227  0.559
  MAE   0.175  0.120  0.101   0.188  0.126
  RMSE  0.222  0.186  0.152   0.277  0.202
  RAE   1.000  0.689  0.577   1.074  0.719
  RRSE  0.999  0.838  0.681   1.244  0.907

Near mean
  CC    0.053  0.572  0.740*  0.232  0.512
  MAE   0.175  0.121  0.101*  0.186  0.132
  RMSE  0.222  0.186  0.151*  0.275  0.217
  RAE   1.000  0.690  0.575*  1.062  0.756
  RRSE  0.999  0.837  0.678*  1.235  0.975

Regression
  CC    0.052  0.564  0.739   0.200  0.509
  MAE   0.174  0.121  0.102   0.191  0.133
  RMSE  0.222  0.188  0.151   0.283  0.216
  RAE   1.001  0.695  0.586   1.096  0.762
  RRSE  0.999  0.845  0.680   1.275  0.974

* denotes the best performance among the 5 imputation methods.

After normalizing the six processed missing-values datasets, we used two approaches to estimate the datasets: percentage split (dataset partitioned into 66% training data and 34% testing data) and 10-fold cross-validation. The two approaches employ Random Forest, RBF Network, Kstar, KNN, and Random Tree to forecast water levels, evaluating the six missing-value treatments under five evaluation indices: CC, RMSE, MAE, RAE, and RRSE. The results are shown in Tables 3 and 4, where we can see that the mean of nearby points method wins versus the other methods in CC, MAE, RAE, RRSE, and RMSE. Therefore, the mean of nearby points imputation method better estimates the Shimen Reservoir water level.

(2) Variable selection and model building uses the results from above. The best imputation method is the mean of nearby points; therefore, this study uses the dataset imputed by the mean of nearby points to select the variables and build the model. For variable selection, this study utilizes factor analysis to rank the importance of the independent variables in the improved imputed dataset. Table 5 indicates that the ordering of important variables is Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, Relative Humidity, Wind Speed, and Direction.

Next, we determine the key variables and build a forecast model. This study utilizes the ordering of important variables and iteratively deletes the least important variable, re-running the proposed forecasting model until the minimal RMSE is reached. First, all independent variables are used to build the water level forecasting model. Second, the least important variables are removed one-by-one, with the remaining variables serving as the forecast model, until the minimal RMSE is reached.


Table 4: The results of the listing models with the five imputation methods under 10-fold cross-validation before variable selection. Columns: RBF Network, Kstar, Random Forest, IBK, Random Tree.

Delete the rows with missing data
  CC    0.041  0.590  0.737   0.246  0.505
  MAE   0.184  0.126  0.109   0.195  0.143
  RMSE  0.227  0.188  0.154   0.281  0.225
  RAE   1.000  0.682  0.592   1.059  0.775
  RRSE  0.999  0.825  0.678   1.235  0.986

Serial mean
  CC    0.038  0.612  0.755   0.237  0.574
  MAE   0.171  0.113  0.098   0.181  0.125
  RMSE  0.217  0.175  0.143*  0.270  0.202
  RAE   1.001  0.660  0.575   1.058  0.731
  RRSE  0.999  0.802  0.658   1.241  0.929

Linear
  CC    0.042  0.615  0.753   0.243  0.551
  MAE   0.173  0.112  0.098   0.181  0.127
  RMSE  0.218  0.175  0.144   0.269  0.207
  RAE   1.000  0.649  0.568   1.057  0.736
  RRSE  0.999  0.800  0.660   1.233  0.948

Near median
  CC    0.043  0.614  0.752   0.251  0.535
  MAE   0.173  0.113  0.098   0.180  0.131
  RMSE  0.218  0.175  0.144   0.268  0.211
  RAE   1.000  0.653  0.568   1.041  0.757
  RRSE  0.999  0.801  0.661   1.227  0.967

Near mean
  CC    0.043  0.613  0.756*  0.250  0.558
  MAE   0.173  0.113  0.098*  0.179  0.125
  RMSE  0.218  0.175  0.144   0.268  0.205
  RAE   1.000  0.654  0.565*  1.039  0.725
  RRSE  0.999  0.802  0.657*  1.226  0.937

Regression
  CC    0.038  0.618  0.754   0.240  0.522
  MAE   0.171  0.112  0.098   0.181  0.133
  RMSE  0.217  0.174  0.143*  0.270  0.214
  RAE   1.001  0.653  0.574   1.055  0.778
  RRSE  0.999  0.798  0.659   1.239  0.983

* denotes the best performance among the 5 imputation methods.

Table 5: The results of variable selection (factor loadings).

                    Factor 1   Factor 2   Factor 3
Input                 .983       .072       .164
Output                .918       .037       .071
Rainfall              .717       .064       .503
Temperature           .092       .965      -.149
Pressure             -.254      -.844      -.182
Wind Speed            .096      -.377       .052
Direction             .041       .351      -.067
Rainfall Dasi         .290       .068       .693
Relative Humidity     .025      -.196       .413

Note: each factor's highest loading is Input (.983) for factor 1, Temperature (.965) for factor 2, and Rainfall Dasi (.693) for factor 3.

After these iterative experiments, the optimal forecast model is achieved when Wind Speed and Direction are deleted. The key remaining variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity.

As a rule of thumb, we recommend interpreting only factor loadings with an absolute value greater than 0.5, which explains around 25% of the variance (0.5^2 = 0.25) [37]. We can see in Table 5 that the loadings of the two deleted variables are smaller than 0.5. Finally, this study employs the Random Forest forecast method, based on variable selection and on full variables, respectively, to build forecast models for water level forecasting in Shimen Reservoir.

(3) Model comparison: this study compares the proposed forecast model (using Random Forest) with the Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree forecast models (Tables 6 and 7). Tables 6 and 7 show that before variable selection (full variables), the Random Forest forecast model has the best forecast performance in the CC, RMSE, MAE, RAE, and RRSE indices. After variable selection, all five forecast models have improved forecast performance under CC, RMSE, MAE, RAE, and RRSE. Therefore, the results show that variable selection can improve the forecasting performance.


Table 6: The results of comparing forecasting models under percentage split (dataset partitioned into 66% training data and 34% testing data) after variable selection. Columns: RBF Network, Kstar, Random Forest, IBK, Random Tree.

Delete the rows with missing data
  CC    0.033     0.638^a  0.729^a   0.251    0.545^a
  MAE   0.182^a   0.121^a  0.111^a   0.199    0.135^a
  RMSE  0.229     0.176^a  0.156^a   0.287    0.212^a
  RAE   0.992^a   0.657^a  0.602^a   1.083    0.736^a
  RRSE  1.007     0.775^a  0.688^a   1.262    0.935^a

Series mean
  CC    0.107^a   0.661^a  0.739^a   0.242^a  0.551
  MAE   0.172^a   0.107^a  0.101^a   0.179^a  0.129
  RMSE  0.221^a   0.167^a  0.151^a   0.268^a  0.205
  RAE   0.988^a   0.615^a  0.579^a   1.027^a  0.740
  RRSE  0.995^a   0.753^a  0.678^a   1.208^a  0.923

Linear
  CC    0.105^a   0.666^a  0.735^a   0.258^a  0.596^a
  MAE   0.173^a   0.106^a  0.100^a   0.175^a  0.120^a
  RMSE  0.221^a   0.166^a  0.151^a   0.266^a  0.196^a
  RAE   0.987^a   0.606^a  0.572^a   1.002^a  0.683^a
  RRSE  0.995^a   0.748^a  0.681^a   1.198^a  0.883^a

Median of nearby points
  CC    0.106^a   0.666^a  0.740^a   0.264^a  0.553
  MAE   0.173^a   0.107^a  0.100^a   0.177^a  0.127
  RMSE  0.221^a   0.166^a  0.151^a   0.266^a  0.207
  RAE   0.987^a   0.611^a  0.571^a   1.013^a  0.723
  RRSE  0.995^a   0.747^a  0.677^a   1.195^a  0.932

Mean of nearby points
  CC    0.1059^a  0.667^a  0.745^ab  0.249^a  0.540^a
  MAE   0.173^a   0.107^a  0.099^ab  0.179^a  0.129^a
  RMSE  0.221^a   0.166^a  0.149^ab  0.268^a  0.214^a
  RAE   0.987^a   0.611^a  0.565^ab  1.025^a  0.735^a
  RRSE  0.995^a   0.747^a  0.672^ab  1.207^a  0.962^a

Regression
  CC    0.107^a   0.663^a  0.739^a   0.242^a  0.559^a
  MAE   0.172^a   0.106^a  0.101^a   0.179^a  0.126^a
  RMSE  0.221^a   0.167^a  0.151^a   0.268^a  0.200^a
  RAE   0.987^a   0.610^a  0.581^a   1.027^a  0.723^a
  RRSE  0.994^a   0.752^a  0.678^a   1.207^a  0.900^a

^a denotes enhanced performance after variable selection; ^b denotes the best performance among the 5 models after variable selection.

The Random Forest model, as applied to variable selection and with full variables, is better than the listing models.

4.3. Findings. After variable selection and model building, some key findings can be highlighted.

(1) Imputation: after the two collected datasets were concatenated into an integrated dataset, there were missing values in the integrated dataset due to human error or mechanical failure. Tables 3 and 4 evaluate the integrated dataset under the median of nearby points, series mean, mean of nearby points, linear, and regression imputation methods and the delete strategy, via five machine learning forecast models. The results show that the integrated dataset imputed by the mean of nearby points method has the best forecasting performance.

(2) Variable selection: this study uses factor analysis to rank the ordering of variables and then sequentially deletes the least important variables until the forecasting performance no longer improves. Tables 6 and 7 show that variable selection can significantly improve the performance of all five forecasting models. After iterative experiments and variable selection, the key remaining variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity.

(3) Forecasting model: this study proposed a time-series forecasting model based on estimating missing values and variable selection to forecast the water level in the reservoir. The experimental results indicate that the Random Forest forecasting model, when applied to variable selection and with full variables, has better forecasting performance than the listing models in the five evaluation indices.


Table 7: The results of comparing forecasting models under 10-fold cross-validation after variable selection. Columns: RBF Network, Kstar, Random Forest, IBK, Random Tree.

Delete the rows with missing data
  CC    0.103^a  0.665^a  0.737^a   0.233    0.529^a
  MAE   0.181^a  0.115^a  0.108^a   0.193^a  0.143^a
  RMSE  0.226^a  0.171^a  0.154^a   0.282^a  0.223^a
  RAE   0.984^a  0.627^a  0.589^a   1.047^a  0.774^a
  RRSE  0.994^a  0.749^a  0.677^a   1.238    0.977^a

Series mean
  CC    0.081^a  0.688^a  0.751^a   0.295^a  0.547
  MAE   0.169^a  0.103^a  0.098^a   0.170^a  0.131
  RMSE  0.217^a  0.158^a  0.144     0.260^a  0.209
  RAE   0.988^a  0.600^a  0.571^a   0.990^a  0.767
  RRSE  0.996^a  0.727^a  0.661     1.193^a  0.960

Linear
  CC    0.081^a  0.692^a  0.750     0.286^a  0.551^a
  MAE   0.171^a  0.102^a  0.098^a   0.169^a  0.128
  RMSE  0.218^a  0.158^a  0.145     0.261^a  0.207^a
  RAE   0.988^a  0.590^a  0.566^a   0.981^a  0.740
  RRSE  0.996^a  0.723^a  0.662     1.196^a  0.948^a

Median of nearby points
  CC    0.083^a  0.692^a  0.752^a   0.305^a  0.555^a
  MAE   0.171^a  0.102^a  0.097^ab  0.169^a  0.126^a
  RMSE  0.218^a  0.158^a  0.144^a   0.259^a  0.208^a
  RAE   0.987^a  0.593^a  0.563^ab  0.980^a  0.732^a
  RRSE  0.996^a  0.722^a  0.660^a   1.186^a  0.951^a

Mean of nearby points
  CC    0.082^a  0.694^a  0.753^b   0.276^a  0.537
  MAE   0.171^a  0.102^a  0.097^ab  0.171^a  0.129
  RMSE  0.218^a  0.157^a  0.144^ab  0.263^a  0.210
  RAE   0.988^a  0.593^a  0.564^a   0.993^a  0.747
  RRSE  0.996^a  0.721^a  0.659^b   1.204^a  0.960

Regression
  CC    0.081^a  0.690^a  0.753^b   0.295^a  0.572
  MAE   0.169^a  0.102^a  0.098^a   0.169^a  0.126
  RMSE  0.217^a  0.158^a  0.144^ab  0.259^a  0.204
  RAE   0.988^a  0.595^a  0.571^a   0.989^a  0.735
  RRSE  0.996^a  0.725^a  0.659^ab  1.193^a  0.938

^a denotes enhanced performance after variable selection; ^b denotes the best performance among the 5 models after variable selection.

The proposed time-series forecasting model is feasible for forecasting water levels in Shimen Reservoir.

5. Conclusion

This study proposed a time-series forecasting model for water level forecasting in Taiwan's Shimen Reservoir. The experiments showed that the mean of nearby points imputation method has the best performance, and the results indicate that the Random Forest forecasting model, when applied to variable selection and with full variables, has better forecasting performance than the listing models. The key variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity. The proposed time-series forecasting model with/without variable selection has better forecasting performance than the listing models under the five evaluation indices.

This shows that the proposed time-series forecasting model is feasible for forecasting water levels in Shimen Reservoir. Future work will address the following issues:

(1) The reservoir's utility includes irrigation, domestic water supply, and electricity generation. The key variables identified here could improve forecasting in these fields.

(2) We might apply the proposed time-series forecasting model based on imputation and variable selection to forecast the water level of lakes, salt water bodies, reservoirs, and so on.

In addition, the experiments show that the proposed variable selection can help the five forecast methods used here improve their forecasting capability.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


References

[1] F.-J. Chang and Y.-T. Chang, "Adaptive neuro-fuzzy inference system for prediction of water level in reservoir," Advances in Water Resources, vol. 29, no. 1, pp. 1-10, 2006.

[2] M. K. Tiwari and C. Chatterjee, "Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach," Journal of Hydrology, vol. 394, no. 3-4, pp. 458-470, 2010.

[3] C.-S. Chen, Y.-D. Jhong, T.-Y. Wu, and S.-T. Chen, "Typhoon event-based evolutionary fuzzy inference model for flood stage forecasting," Journal of Hydrology, vol. 490, pp. 134-143, 2013.

[4] F.-J. Chang, P.-A. Chen, Y.-R. Lu, E. Huang, and K.-Y. Chang, "Real-time multi-step-ahead water level forecasting by recurrent neural networks for urban flood control," Journal of Hydrology, vol. 517, pp. 836-846, 2014.

[5] A. Hipni, A. El-shafie, A. Najah, O. A. Karim, A. Hussain, and M. Mukhlisin, "Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS)," Water Resources Management, vol. 27, no. 10, pp. 3803-3823, 2013.

[6] Y. Seo, S. Kim, O. Kisi, and V. P. Singh, "Daily water level forecasting using wavelet decomposition and artificial intelligence techniques," Journal of Hydrology, vol. 520, pp. 224-243, 2015.

[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, no. 3, pp. 321-355, 1988.

[8] K. Ganapathy, V. Vaidehi, and J. B. Chandrasekar, "Optimum steepest descent higher level learning radial basis function network," Expert Systems with Applications, vol. 42, no. 21, pp. 8064-8077, 2015.

[9] P. Brandstetter and M. Kuchar, "Sensorless control of variable speed induction motor drive using RBF neural network," Journal of Applied Logic, vol. 24, part A, pp. 97-108, 2017.

[10] C. M. Anish and B. Majhi, "Hybrid nonlinear adaptive scheme for stock market prediction using feedback FLANN and factor analysis," Journal of the Korean Statistical Society, vol. 45, no. 1, pp. 64-76, 2016.

[11] M. Pottmann and H. P. Jorgl, "Radial basis function networks for internal model control," Applied Mathematics and Computation, vol. 70, no. 2-3, pp. 283-298, 1995.

[12] I. Morlini, "Radial basis function networks with partially classified data," Ecological Modelling, vol. 120, no. 2-3, pp. 109-118, 1999.

[13] A. Nawahda, "An assessment of adding value of traffic information and other attributes as part of its classifiers in a data mining tool set for predicting surface ozone levels," Process Safety and Environmental Protection, vol. 99, pp. 149-158, 2016.

[14] N. R. Kasat and S. D. Thepade, "Novel content based image classification method using LBG vector quantization method with Bayes and Lazy family data mining classifiers," in Proceedings of the 7th International Conference on Communication, Computing and Virtualization (ICCCV 2016), pp. 483-489, February 2016.

[15] E. Męzyk and O. Unold, "Machine learning approach to model sport training," Computers in Human Behavior, vol. 27, no. 5, pp. 1499-1506, 2011.

[16] J. Abawajy, A. Kelarev, M. Chowdhury, A. Stranieri, and H. F. Jelinek, "Predicting cardiac autonomic neuropathy category for diabetic data with missing values," Computers in Biology and Medicine, vol. 43, no. 10, pp. 1328-1333, 2013.

[17] M. A. Ayu, S. A. Ismail, A. F. A. Matin, and T. Mantoro, "A comparison study of classifier algorithms for mobile-phone's accelerometer based activity recognition," Procedia Engineering, vol. 41, pp. 224-229, 2012.

[18] A. Tamvakis, V. Trygonis, J. Miritzis, G. Tsirtsis, and S. Spatharis, "Optimizing biodiversity prediction from abiotic parameters," Environmental Modeling and Software, vol. 53, pp. 112-120, 2014.

[19] C. Brokamp, R. Jandarov, M. B. Rao, G. LeMasters, and P. Ryan, "Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches," Atmospheric Environment, vol. 151, pp. 1-11, 2017.

[20] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

[21] P. Yıldırım, "Pattern classification with imbalanced and multiclass data for the prediction of albendazole adverse event outcomes," in Proceedings of the 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016) and the 6th International Conference on Sustainable Energy Information Technology (SEIT 2016), pp. 1013-1018, May 2016.

[22] Y. Zhao and Y. Zhang, "Comparison of decision tree methods for finding active objects," Advances in Space Research, vol. 41, no. 12, pp. 1955-1959, 2008.

[23] K. M. A. Kumar, N. Rajasimha, M. Reddy, A. Rajanarayana, and K. Nadgir, "Analysis of users' sentiments from Kannada web documents," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 247-256, August 2015.

[24] C. Zhong, W. Pedrycz, D. Wang, L. Li, and Z. Li, "Granular data imputation: a framework of granular computing," Applied Soft Computing, vol. 46, pp. 307-316, 2016.

[25] C. Genolini, A. Lacombe, R. Ecochard, and F. Subtil, "Copy-Mean: a new method to predict monotone missing values in longitudinal studies," Computer Methods and Programs in Biomedicine, vol. 132, pp. 29-44, 2016.

[26] M. Majidpour, C. Qiu, P. Chu, H. R. Pota, and R. Gadh, "Forecasting the EV charging load based on customer profile or station measurement," Applied Energy, vol. 163, pp. 134-141, 2016.

[27] O. Cokluk and M. Kayri, "The effects of methods of imputation for missing values on the validity and reliability of scales," Educational Sciences: Theory and Practice, vol. 11, pp. 303-309, 2011.

[28] S. A. Bashir and S. W. Duffy, "The correction of risk estimates for measurement error," Annals of Epidemiology, vol. 7, no. 2, pp. 154-164, 1997.

[29] A. Ozturk, S. Kayalıgil, and N. E. Ozdemirel, "Manufacturing lead time estimation using data mining," European Journal of Operational Research, vol. 173, no. 2, pp. 683-700, 2006.

[30] H. Vathsala and S. G. Koolagudi, "Closed item-set mining for prediction of Indian summer monsoon rainfall: a data mining model with land and ocean variables as predictors," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 271-280, August 2015.

[31] S. Banerjee, A. Anura, J. Chakrabarty, S. Sengupta, and J. Chatterjee, "Identification and functional assessment of novel gene sets towards better understanding of dysplasia associated oral carcinogenesis," Gene Reports, vol. 4, pp. 131-138, 2016.

[32] H. Yahyaoui and A. Al-Mutairi, "A feature-based trust sequence classification algorithm," Information Sciences, vol. 328, pp. 455-484, 2016.

[33] H. B. Borges and J. C. Nievola, "Comparing the dimensionality reduction methods in gene expression databases," Expert Systems with Applications, vol. 39, no. 12, pp. 10780-10795, 2012.

[34] A. Wang, N. An, G. Chen, L. Li, and G. Alterovitz, "Accelerating wrapper-based feature selection with K-nearest-neighbor," Knowledge-Based Systems, vol. 83, no. 1, pp. 81-91, 2015.

[35] S.-W. Lin, Z.-J. Lee, S.-C. Chen, and T.-Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, no. 4, pp. 1505-1512, 2008.

[36] O. Uncu and I. B. Turksen, "A novel feature selection approach: combining feature wrappers and filters," Information Sciences, vol. 177, no. 2, pp. 449-466, 2007.

[37] R. E. Anderson, R. L. Tatham, and W. C. Black, Multivariate Data Analysis, 1998.

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: A Time-Series Water Level Forecasting Model Based on ...downloads.hindawi.com/journals/cin/2017/8734214.pdf · Random Forest. A Random Forest can be applied for classification,regression,andunsupervisedlearning[19].Itis

2 Computational Intelligence and Neuroscience

This paper includes five sections Section 2 is relatedworkSection 3 proposes research methodology and introducesthe concepts imputation methods variable selection andforecasting model Section 4 verifies the proposed model andcompares with the listing models Section 5 concludes

2 Related Work

This section introduces a forecast method of machine learn-ing imputation techniques and variable selection

21 Machine Learning Forecast (Regression)

211 RBF Network Radial Basis Function Networks wereproposed by Broomhead and Lowe in 1988 [7] RBF is asimple supervised learning feed forward network that avoidsiterative training processes and trains the data at one stage[8] The RBF Network is a type of ANN for applications tosolve problems of supervised learning for example regres-sion classification and time-series prediction [9] The RBFNetwork consists of three layers input layer hidden layer andoutput layer The input layer is the set of source nodes thesecond layer is a hidden layer high dimension and the outputlayer gives the response of the network to the activationpatterns applied to the input layer [10] The advantages ofthe RBF approach are the (partial) linearity in the parametersand the availability of fast and efficient training methods [11]The use of radial basis functions results from a number ofdifferent concepts including function approximation noisyinterpolation density estimation and optimal classificationtheory [12]

212 Kstar The Kstar is an instance-based classifier thatdiffers from other instance-based learners in that it uses anentropy-based distance function [13] The Lazy Family DataMining Classifiers supports incremental learning It containssome classifiers such as Kstar and it takes less time fortraining and more time for predicting [14] It provides a con-sistent approach to handling symbolic attributes real valuedattributes and missing values [15] Kstar uses an entropy-based distance function for instance-based regression Thepredicted class value of a test instance comes from values oftraining instances that are similar to the Kstar [16]

213 KNN The 119896-Nearest-Neighbor classifier offers a goodclassification accuracy rate for activity classification [17] ThekNN algorithm is based on the notion that similar instanceshave similar behavior and thus the new input instances arepredicted according to the stored most similar neighboringinstances [18]

22 Random Forest A Random Forest can be applied forclassification regression and unsupervised learning [19] It issimilar to the baggingmethod RandomForest is an ensemblelearning method A decision tree represents the classifierRandom Forest gets 119873 outputs through 119873 decision treesThese are forecast by voting for all of the predicted results Itcan solve the classification and regression problems RandomForest is simple and easily parallelized [20]

23 Random Tree Random Tree is an ensemble learn-ing algorithm that generates many individual learners andemploys a bagging idea to produce a random set of data in theconstruction of a decision tree [21] Random Tree classifierscan deal with regression and classification problems Randomtrees can be generated efficiently and can be combined intolarge sets of random trees This generally leads to accuratemodels [22] The Random Tree classifier takes the inputfeature vector and classifies it with every tree in the forestIt then outputs the class label that received themajority of thevotes [23]

24 Imputation The daily atmospheric data may have miss-ing values due to human error or machine failure Manyprevious studies have shown that the statistical bias occurredwhen the missing values were directly deleted Thus imput-ing data can significantly improve the quality of the datasetOtherwise biased results may cause poor performance inthe ensuing constructs [24] Single imputation methods haveseveral advantages such as a wider scope than multipleimputation methods Sometimes it is more important to findthe missing values than to estimate the parameters [25] Themedian of nearby point imputation methods uses nearbyvalues for ordering and then selects the median to replacethe missing value The advantage of the median imputationmethod is that its replaced value is actually a real valuein the data [26] Series mean imputation methods replacethe average of the variables directly Regression imputationmethod uses simple linear regression to estimate missingvalues and replace them The mean of the nearby pointimputation methods is the mean of nearby values Thenumber of nearby values can be found by using a ldquospan ofnearby pointsrdquo option [27] The linear imputation is mostreadily applicable to continuous explanatory variables [28]

25 Variable Selection The variable selectionmethodmainlyidentifies the key variable that actually influences the fore-casting target from several variables It then deletes theunimportant variables to improve the modelrsquos efficiency Itcan solve high dimensional and complex problems Previousstudies in several field have shown that variable selectioncan improve the forecasting efficiency of machine learningmethods [29ndash32]

Variable selection is an important technique in datapreprocessing It removes irrelevant data and improves theaccuracy and comprehensibility of the results [33] Variableselection methods can be categorized into three classes filterwrapper and embedded Filter uses statistic methods toselect variables It has better generalization ability and lowercomputational demands Wrapper methods use classifiers toidentify the best subset components The embedded methodhas a deeper interaction between variable selection andconstruction of the classifier [34]

Filter models utilize statistical techniques such as prin-cipal component analysis (PCA) factor analysis (FA) inde-pendent component analysis and discriminate analysis in theinvestigation of other indirect performance measures Theseare mostly based on distance and information measures [35]

Computational Intelligence and Neuroscience 3

PCA transforms a set of feature columns in the datasetinto a projection of the feature space with lower dimen-sionality FA is a generalization of PCA the main differencebetween PCA and FA is that FA allows noise to have non-spherical shape while transforming the data The main goalof both PCA and FA is to transform the coordinate systemsuch that correlation between system variables is minimized[36]

There are several methods to decide how many factorshave to be extracted The most widely used method fordetermining the number of factors is using eigenvaluesgreater than one [10]

3 Proposed Model

Reservoirs are important domestically as well as in thenational defense and for economic development Thus thereservoir water levels should be forecast over a long periodof time and water resources should be planned and man-aged comprehensively to reach great cost-effectiveness Thispaper proposes a time-series forecasting model based on theimputation of missing values and variable selection First theproposed model used five imputation methods (ie medianof nearby points series mean mean of nearby points linearand regression imputation) It then compares these findingswith a delete strategy to estimate the missing values Secondby identifying the key variable that influences the daily waterlevels the proposed method ranked the importance of theatmospheric variables via factor analysis It then sequentiallyremoves the unimportant variables Finally the proposedmodel uses a Random Forest machine learning method tobuild a forecasting model of the reservoir water level tocompare it to other methods The proposed model could bepartitioned into four parts data preprocessing imputationand feature selection model building and accuracy evalua-tion The procedure is shown in Figure 1

31 Computational Step To understand the proposed modelmore easily this section partitioned the proposed model intofour steps

Step 1 (data preprocessing) The related water level and atmo-spheric data were collected from the reservoir administrationwebsite and the Taoyuan weather station The two collecteddatasets are concatenated into an integrated dataset basedon the date There are nine independent variables and onedependent variable in the integrated datasetThe variables aredefined in the integrated dataset and are listed in Table 1

Step 2 (imputation). After Step 1, we found some variables and records with missing values in the integrated dataset, caused by mechanical measurement failure or human operation error. Previous studies showed that directly deleting missing values will impact the results. To identify the best-fitting imputation method, this paper utilized five imputation methods to estimate the missing values and compared them with the no-imputation strategy of directly deleting the missing values. The five imputation methods were the median of nearby points, series mean, mean of nearby points, linear imputation, and regression imputation.

Table 1: Description of variables in the research dataset.

Variable             Description
Output               Shimen Reservoir daily discharge (release)
Input                Shimen Reservoir daily inflow (discharge)
Temperature          Daily temperature in Daxi, Taoyuan
Rainfall             The previous day's accumulated rainfall at Shimen Reservoir
Pressure             Daily barometric pressure in Daxi, Taoyuan
Relative Humidity    Daily relative humidity in Daxi, Taoyuan
Wind Speed           Daily wind speed in Daxi, Taoyuan
Direction            Daily wind direction in Daxi, Taoyuan
Rainfall Dasi        Daily rainfall in Daxi, Taoyuan

This study thus had six processed datasets after handling the missing values problem. The question is then which imputation method best fits the integrated dataset. To determine this, the following steps were followed (a sketch of the imputation and normalization steps appears after this list):

(i) In order to rescale all numeric values into the range [0, 1], this step normalized each variable value by dividing it by the maximal value, for the five imputed datasets and for the dataset with deleted missing values. All independent and dependent variables are positive values.

(ii) The six normalized datasets are partitioned into 66% training data and 34% testing data. We also employed a 10-fold cross-validation approach to identify the imputed dataset that has the best prediction performance.

(iii) We utilized five forecast methods, including Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree, with five evaluation indices: correlation coefficient (CC), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and root relative squared error (RRSE). We identified the normalized dataset with the smaller index values over the five evaluation indices as the better-fitting imputation. Section 4 shows that the better imputation method is the mean of nearby points.
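The following is the sketch referenced above, assuming the integrated dataset is a pandas DataFrame ordered by date; the `impute` helper, its `span` window, and the omission of regression imputation (which would fit the incomplete variable on the complete ones) are our illustrative choices.

```python
# A minimal sketch of the imputation treatments and the max-normalization
# step; the window size `span` is illustrative, not taken from the paper.
import pandas as pd

def impute(series: pd.Series, method: str, span: int = 2) -> pd.Series:
    window = 2 * span + 1  # the missing point plus `span` neighbors on each side
    if method == "series_mean":
        return series.fillna(series.mean())
    if method == "mean_of_nearby_points":
        return series.fillna(series.rolling(window, center=True, min_periods=1).mean())
    if method == "median_of_nearby_points":
        return series.fillna(series.rolling(window, center=True, min_periods=1).median())
    if method == "linear":
        return series.interpolate(method="linear")
    raise ValueError(f"unknown method: {method}")

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    # All variables are positive, so dividing by the column maximum
    # rescales every value into [0, 1], as in item (i).
    return df / df.max()
```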

Step 3 (variable selection and model building). Based on Step 2, this study determined the better imputation method (i.e., mean of nearby points). The important problem is then to determine the key variables that influence the reservoir water level. Therefore, this step utilized factor analysis to rank the importance of the variables for building the best forecast model. Based on the ordering of the variables, we first deleted the least important variable. We then built the forecast model via Random Forest, RBF Network, Kstar, KNN, and Random Tree. Next, we repeatedly deleted the least important remaining variable and rebuilt the forecast model until the RMSE could no longer improve. After many iterative experiments, this study used five evaluation indices to determine the best forecast method.


[Figure 1: The procedure of the proposed model. The flowchart runs through four stages: (a) data preprocessing, in which Dataset 1 and Dataset 2 are concatenated into the research dataset; (b) imputation, in which the imputation methods are used to estimate the missing values: (1) dataset imputation and normalization, (2) partition into 66% training data and 34% testing data, and (3) selection of the better imputation method based on CC, RMSE, RRSE, MAE, and RAE; (c) variable selection and model building: (1) the imputed dataset is partitioned into 66% training data and 34% testing data, (2) the importance of the variables is ranked, (3) the least important variable is deleted, and (4) the key variables are determined; (d) model evaluation and comparison via (1) correlation coefficient, (2) root mean squared error, (3) root relative squared error, (4) mean absolute error, and (5) relative absolute error.]

This step can be introduced step by step as follows:

(i) The imputed integrated datasets are partitioned into 66% training data and 34% testing data.

(ii) Factor analysis ranks the importance of the variables.

(iii) The variable ranking from factor analysis is used to iteratively delete the least important variable. The remaining variables are studied with Random Forest, RBF Network, Kstar, KNN, and Random Tree until the RMSE can no longer improve (a sketch of this loop follows the list).

(iv) Following (iii), the key variables are found when the lowest RMSE is achieved.

(v) Concurrently, we used five evaluation indices (CC, RMSE, RRSE, MAE, and RAE) to determine which forecast method is a good forecasting model.
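The sketch referenced in item (iii) follows, assuming `ranking` lists the variables from most to least important (from the factor analysis) and the X/y objects come from the 66%/34% split; scikit-learn's RandomForestRegressor stands in for the Random Forest learner, so this is illustrative, not the paper's exact tooling.

```python
# A minimal sketch of the iterative deletion loop driven by the RMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

def backward_eliminate(ranking, X_train, y_train, X_test, y_test):
    variables = list(ranking)                    # most important first
    best_rmse, best_vars = np.inf, list(variables)
    while variables:
        model = RandomForestRegressor(random_state=0)
        model.fit(X_train[variables], y_train)
        rmse = mean_squared_error(y_test, model.predict(X_test[variables])) ** 0.5
        if rmse < best_rmse:
            best_rmse, best_vars = rmse, list(variables)
        elif rmse > best_rmse:
            break                                # RMSE no longer improves; stop deleting
        variables.pop()                          # drop the least important remaining variable
    return best_vars, best_rmse
```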

Step 4 (evaluation and comparison). To verify the performance of the reservoir water level forecasting model, this step uses the superior imputed datasets with different selected variables to evaluate the proposed method. It then compares the results with the listing methods. This study used CC, RMSE, MAE, RAE, and RRSE to evaluate the forecast models. The five criteria are listed as equations (1)-(5).

Root Mean Squared Error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (1)$$


Table 2: The partial collected data.

Date       Rainfall  Input   Output  Rainfall Dasi  Temperature  Wind Speed  Direction  Pressure  Relative Humidity  Water level
2008/1/1   01        836     0       —              102          47          65         10015     56                 24409
2008/1/2   01        9608    28624   0              104          63          38         10007     59                 24393
2008/1/3   0         8272    8217    0              145          45          50         9977      67                 24381
2008/1/4   0         13332   26222   0              153          34          40         9965      82                 24378
2008/1/5   0         1256    30594   0              16           29          46         9963      77                 24355
2008/1/6   03        9874    19233   0              167          12          170        9954      83                 24332
2008/1/7   0         1166    19276   0              184          19          46         9949      77                 24334
2008/1/8   0         9312    10973   0              199          13          311        9929      78                 24333
2008/1/9   0         10757   12398   0              199          22          11         9922      77                 24323
2008/1/10  0         6515    27674   0              196          16          357        9916      80                 243
2008/1/11  0         5564    24909   0              215          13          185        9902      71                 24278
2008/1/12  0         9167    19181   0              198          42          37         9921      75                 24274
2008/1/13  09        10734   18222   15             142          71          39         9962      85                 24253
2008/1/14  52        8009    14662   1              127          71          36         9977      85                 24249
2008/1/15  4         8577    24382   0              135          71          35         9984      86                 24238

where $y_i$ is the actual observed value, $\hat{y}_i$ is the forecast value of the model, and $n$ is the sample number.

Correlation Coefficient (CC):

$$\text{CC} = \frac{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}} \quad (2)$$

where $x_i$ and $y_i$ are the observed and predicted values, respectively, and $\bar{x}$ and $\bar{y}$ are the means of the observed and predicted values.

Root Relative Squared Error (RRSE):

$$\text{RRSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}} \quad (3)$$

where $x_i$ is the predicted value, $y_i$ is the actual value, and $\bar{y}$ is the mean of the actual values.

Mean Absolute Error (MAE):

$$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|\hat{y}_t - y_t\right| \quad (4)$$

Here $n$ is the number of observations, $\hat{y}_t$ is the forecast value at time $t$, and $y_t$ is the actual value at time $t$.

Relative Absolute Error (RAE):

$$\text{RAE} = \frac{\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|}{\sum_{i=1}^{n}\left|y_i - \bar{y}\right|} \quad (5)$$

where $y_i$ is the actual observed value, $\bar{y}$ is the mean of the $y_i$, $\hat{y}_i$ is the forecast value of the model, and $n$ is the sample number.
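Equations (1)-(5) translate directly into code; the following is a minimal NumPy sketch, assuming `actual` and `forecast` are one-dimensional arrays of observed and predicted water levels.

```python
# The five evaluation indices as plain NumPy functions.
import numpy as np

def rmse(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.sqrt(np.mean((actual - forecast) ** 2))                      # equation (1)

def cc(actual, forecast):
    a = np.asarray(actual) - np.mean(actual)
    f = np.asarray(forecast) - np.mean(forecast)
    return (a * f).sum() / np.sqrt((a ** 2).sum() * (f ** 2).sum())        # equation (2)

def rrse(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.sqrt(((forecast - actual) ** 2).sum()
                   / ((actual - actual.mean()) ** 2).sum())                # equation (3)

def mae(actual, forecast):
    return np.mean(np.abs(np.asarray(forecast) - np.asarray(actual)))      # equation (4)

def rae(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return (np.abs(actual - forecast).sum()
            / np.abs(actual - actual.mean()).sum())                        # equation (5)
```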

4. Experimental Results

This section verifies the performance of the proposed forecast model and compares the results with the listing methods. To determine which imputation method has the best performance for the collected dataset, this study collected daily atmospheric data from the monitoring station and from the website of the Water Resources Agency in Taiwan for the Shimen Reservoir. This work also compares the proposed model with the listing models with/without variable selection.

4.1. Experimental Data. The research dataset consisted of two historical datasets: one was collected from the website of the Taiwan Water Resources Agency, and the other was from the Dasi monitoring station nearest to the Shimen Reservoir. The two datasets were collected from January 1, 2008 to October 31, 2015. The two collected datasets are concatenated into an integrated dataset based on the date. There are nine independent variables and one dependent variable in the integrated dataset, which has 2854 daily records. The study mainly forecasts the water level of the reservoir; the water level is the dependent variable. The independent variables are Reservoir OUT, Reservoir IN, Temperature, Rainfall, Pressure, Relative Humidity, Wind Speed, Direction, and Rainfall Dasi. The detailed data types are shown in Table 2.
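A minimal sketch of this concatenation, assuming both sources were exported as CSV files keyed by a shared "Date" column; the file and column names are illustrative, not the study's actual artifacts.

```python
# Concatenate the two collected datasets into the research dataset by date.
import pandas as pd

reservoir = pd.read_csv("shimen_reservoir.csv", parse_dates=["Date"])
weather = pd.read_csv("dasi_station.csv", parse_dates=["Date"])

# Inner join on the date, so only days present in both sources remain;
# the result holds the nine independent variables plus the water level.
research_dataset = reservoir.merge(weather, on="Date", how="inner")
```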

4.2. Forecast and Comparison. Based on the computational steps in Section 3, this section employs the practically collected dataset to illustrate the proposed model and compares it with the listing methods. A detailed description is introduced in the following section.

(1) To determine how best to process the missing values in the dataset, this study applies the series mean, regression, mean of nearby points, linear, and median of nearby points imputation methods to estimate missing values, and it also directly deletes the missing values, to determine which treatment has the best performance.


Table 3: The results of the listing models with the five imputation methods under percentage split (dataset partitioned into 66% training data and 34% testing data), before variable selection.

Index   RBF Network   Kstar    Random Forest   IBK      Random Tree

Delete the rows with missing data:
CC      0.085         0.546    0.728           0.288    0.534
MAE     0.183         0.133    0.111           0.186    0.145
RMSE    0.227         0.200    0.157           0.271    0.226
RAE     0.998         0.727    0.607           1.015    0.789
RRSE    0.997         0.879    0.690           1.193    0.995

Series mean:
CC      0.052         0.557    0.739           0.198    0.563
MAE     0.174         0.123    0.102           0.191    0.126
RMSE    0.222         0.189    0.151*          0.283    0.202
RAE     1.001         0.705    0.587           1.098    0.722
RRSE    0.999         0.850    0.679           1.276    0.908

Linear:
CC      0.054         0.565    0.734           0.200    0.512
MAE     0.175         0.121    0.101           0.189    0.138
RMSE    0.222         0.188    0.152           0.281    0.218
RAE     1.000         0.690    0.575*          1.082    0.787
RRSE    0.999         0.844    0.684           1.264    0.980

Near median:
CC      0.054         0.571    0.737           0.227    0.559
MAE     0.175         0.120    0.101           0.188    0.126
RMSE    0.222         0.186    0.152           0.277    0.202
RAE     1.000         0.689    0.577           1.074    0.719
RRSE    0.999         0.838    0.681           1.244    0.907

Near mean:
CC      0.053         0.572    0.740*          0.232    0.512
MAE     0.175         0.121    0.101*          0.186    0.132
RMSE    0.222         0.186    0.151*          0.275    0.217
RAE     1.000         0.690    0.575*          1.062    0.756
RRSE    0.999         0.837    0.678*          1.235    0.975

Regression:
CC      0.052         0.564    0.739           0.200    0.509
MAE     0.174         0.121    0.102           0.191    0.133
RMSE    0.222         0.188    0.151           0.283    0.216
RAE     1.001         0.695    0.586           1.096    0.762
RRSE    0.999         0.845    0.680           1.275    0.974

* denotes the best performance among the 5 imputation methods.

After normalizing the six processed missing value datasets, we used two approaches to evaluate them: percentage split (dataset partitioned into 66% training data and 34% testing data) and 10-fold cross-validation. Both approaches employ Random Forest, RBF Network, Kstar, KNN, and Random Tree to forecast water levels, evaluating the six missing value treatments under five evaluation indices: CC, RMSE, MAE, RAE, and RRSE. The results are shown in Tables 3 and 4; the mean of nearby points method wins against the other methods in CC, MAE, RAE, RRSE, and RMSE. Therefore, the mean of nearby points imputation method better estimates the Shimen Reservoir water level. (A sketch of the two evaluation protocols follows.)
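The forward-referenced sketch, assuming `X` and `y` hold the normalized independent variables and the water level; RandomForestRegressor and KNeighborsRegressor stand in for two of the five compared learners (RBF Network, Kstar, and Random Tree are Weka models without direct scikit-learn equivalents).

```python
# A minimal sketch of the two evaluation protocols.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Percentage split: 66% training data, 34% testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.34, shuffle=False)

# 10-fold cross-validation, scored by RMSE.
for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                    ("IBK (KNN)", KNeighborsRegressor())]:
    scores = cross_val_score(model, X, y, cv=KFold(n_splits=10),
                             scoring="neg_root_mean_squared_error")
    print(name, "mean RMSE:", -scores.mean())
```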

(2) Variable selection and model building uses the results from above. The best imputation method is the mean of nearby points; therefore, this study uses the dataset imputed with the mean of nearby points to select the variables and build the model. For variable selection, this study utilizes factor analysis to rank the importance of the independent variables in the improved imputed dataset. Table 5 indicates that the ordering of important variables is Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, Relative Humidity, Wind Speed, and Direction.

Next, we determine the key variables and build a forecast model. This study utilizes the ordering of important variables and iteratively deletes the least important variable, rebuilding the proposed forecasting model until the minimal RMSE is reached. First, all independent variables are used to build the water level forecasting model. Second, the least important variables are removed one by one, with the remaining variables serving as the forecast model, until the minimal RMSE is reached.


Table 4: The results of the listing models with the five imputation methods under 10-fold cross-validation, before variable selection.

Index   RBF Network   Kstar    Random Forest   IBK      Random Tree

Delete the rows with missing data:
CC      0.041         0.590    0.737           0.246    0.505
MAE     0.184         0.126    0.109           0.195    0.143
RMSE    0.227         0.188    0.154           0.281    0.225
RAE     1.000         0.682    0.592           1.059    0.775
RRSE    0.999         0.825    0.678           1.235    0.986

Series mean:
CC      0.038         0.612    0.755           0.237    0.574
MAE     0.171         0.113    0.098           0.181    0.125
RMSE    0.217         0.175    0.143*          0.270    0.202
RAE     1.001         0.660    0.575           1.058    0.731
RRSE    0.999         0.802    0.658           1.241    0.929

Linear:
CC      0.042         0.615    0.753           0.243    0.551
MAE     0.173         0.112    0.098           0.181    0.127
RMSE    0.218         0.175    0.144           0.269    0.207
RAE     1.000         0.649    0.568           1.057    0.736
RRSE    0.999         0.800    0.660           1.233    0.948

Near median:
CC      0.043         0.614    0.752           0.251    0.535
MAE     0.173         0.113    0.098           0.180    0.131
RMSE    0.218         0.175    0.144           0.268    0.211
RAE     1.000         0.653    0.568           1.041    0.757
RRSE    0.999         0.801    0.661           1.227    0.967

Near mean:
CC      0.043         0.613    0.756*          0.250    0.558
MAE     0.173         0.113    0.098*          0.179    0.125
RMSE    0.218         0.175    0.144           0.268    0.205
RAE     1.000         0.654    0.565*          1.039    0.725
RRSE    0.999         0.802    0.657*          1.226    0.937

Regression:
CC      0.038         0.618    0.754           0.240    0.522
MAE     0.171         0.112    0.098           0.181    0.133
RMSE    0.217         0.174    0.143*          0.270    0.214
RAE     1.001         0.653    0.574           1.055    0.778
RRSE    0.999         0.798    0.659           1.239    0.983

* denotes the best performance among the 5 imputation methods.

Table 5: The results of variable selection (factor loadings).

Variable             Factor 1   Factor 2   Factor 3
Input                .983       .072       .164
Output               .918       .037       .071
Rainfall             .717       .064       .503
Temperature          .092       .965       -.149
Pressure             -.254      -.844      -.182
Wind Speed           .096       -.377      .052
Direction            .041       .351       -.067
Rainfall Dasi        .290       .068       .693
Relative Humidity    .025       -.196      .413

Note: each factor's highest factor loading appears in bold.

After these iterative experiments, the optimal forecast model is achieved when Wind Speed and Direction are deleted. The key remaining variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity.

As a rule of thumb, we recommend interpreting only factor loadings with an absolute value greater than 0.5, which explains around 25% of the variance [37]. We can see in Table 5 that the loadings of the two deleted variables are smaller than 0.5. Finally, this study employs the Random Forest forecast method, based on the selected variables and on the full variables, respectively, to build forecast models for water level forecasting in Shimen Reservoir.

(3) Model comparison: this study compares the proposed forecast model (using Random Forest) with the Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree forecast models (Tables 6 and 7). Tables 6 and 7 show that before variable selection (full variables), the Random Forest forecast model has the best forecast performance in the CC, RMSE, MAE, RAE, and RRSE indices. After variable selection, all five forecast models have improved forecast performance under CC, RMSE, MAE, RAE, and RRSE. Therefore, the results show that variable selection can improve the forecasting performance.


Table 6: The results of the compared forecasting models under percentage split (dataset partitioned into 66% training data and 34% testing data), after variable selection.

Index   RBF Network   Kstar     Random Forest   IBK       Random Tree

Delete the rows with missing data:
CC      0.033         0.638a    0.729a          0.251     0.545a
MAE     0.182a        0.121a    0.111a          0.199     0.135a
RMSE    0.229         0.176a    0.156a          0.287     0.212a
RAE     0.992a        0.657a    0.602a          1.083     0.736a
RRSE    1.007         0.775a    0.688a          1.262     0.935a

Series mean:
CC      0.107a        0.661a    0.739a          0.242a    0.551
MAE     0.172a        0.107a    0.101a          0.179a    0.129
RMSE    0.221a        0.167a    0.151a          0.268a    0.205
RAE     0.988a        0.615a    0.579a          1.027a    0.740
RRSE    0.995a        0.753a    0.678a          1.208a    0.923

Linear:
CC      0.105a        0.666a    0.735a          0.258a    0.596a
MAE     0.173a        0.106a    0.100a          0.175a    0.120a
RMSE    0.221a        0.166a    0.151a          0.266a    0.196a
RAE     0.987a        0.606a    0.572a          1.002a    0.683a
RRSE    0.995a        0.748a    0.681a          1.198a    0.883a

Median of nearby points:
CC      0.106a        0.666a    0.740a          0.264a    0.553
MAE     0.173a        0.107a    0.100a          0.177a    0.127
RMSE    0.221a        0.166a    0.151a          0.266a    0.207
RAE     0.987a        0.611a    0.571a          1.013a    0.723
RRSE    0.995a        0.747a    0.677a          1.195a    0.932

Mean of nearby points:
CC      0.1059a       0.667a    0.745ab         0.249a    0.540a
MAE     0.173a        0.107a    0.099ab         0.179a    0.129a
RMSE    0.221a        0.166a    0.149ab         0.268a    0.214a
RAE     0.987a        0.611a    0.565ab         1.025a    0.735a
RRSE    0.995a        0.747a    0.672ab         1.207a    0.962a

Regression:
CC      0.107a        0.663a    0.739a          0.242a    0.559a
MAE     0.172a        0.106a    0.101a          0.179a    0.126a
RMSE    0.221a        0.167a    0.151a          0.268a    0.200a
RAE     0.987a        0.610a    0.581a          1.027a    0.723a
RRSE    0.994a        0.752a    0.678a          1.207a    0.900a

a denotes improved performance after variable selection; b denotes the best performance among the 5 models after variable selection.

The Random Forest model, whether applied with the selected variables or with the full variable set, is better than the listing models.

4.3. Findings. After variable selection and model building, some key findings can be highlighted.

(1) Imputation: after the two collected datasets were concatenated into an integrated dataset, missing values remained in the integrated dataset due to human error or mechanical failure. Tables 3 and 4 show the integrated dataset processed with the median of nearby points, series mean, mean of nearby points, linear, and regression imputation methods and with the delete strategy, with their accuracy evaluated via five machine learning forecast models. The results show that the integrated dataset using the mean of nearby points imputation method has the best forecasting performance.

(2) Variable selection: this study uses factor analysis to rank the ordering of the variables and then sequentially deletes the least important variables until the forecasting performance no longer improves. Tables 6 and 7 show that variable selection can significantly improve the performance of all five forecasting models. After iterative experiments and variable selection, the key remaining variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity.

(3) Forecasting model: this study proposed a time-series forecasting model based on estimating missing values and on variable selection to forecast the water level in the reservoir. These experimental results indicate that the Random Forest forecasting model, applied with the selected variables or with the full variables, has better forecasting performance than the listing models on the five evaluation indices.


Table 7: The results of the compared forecasting models under 10-fold cross-validation, after variable selection.

Index   RBF Network   Kstar     Random Forest   IBK       Random Tree

Delete the rows with missing data:
CC      0.103a        0.665a    0.737a          0.233     0.529a
MAE     0.181a        0.115a    0.108a          0.193a    0.143a
RMSE    0.226a        0.171a    0.154a          0.282a    0.223a
RAE     0.984a        0.627a    0.589a          1.047a    0.774a
RRSE    0.994a        0.749a    0.677a          1.238     0.977a

Series mean:
CC      0.081a        0.688a    0.751a          0.295a    0.547
MAE     0.169a        0.103a    0.098a          0.170a    0.131
RMSE    0.217a        0.158a    0.144           0.260a    0.209
RAE     0.988a        0.600a    0.571a          0.990a    0.767
RRSE    0.996a        0.727a    0.661           1.193a    0.960

Linear:
CC      0.081a        0.692a    0.750           0.286a    0.551a
MAE     0.171a        0.102a    0.098a          0.169a    0.128
RMSE    0.218a        0.158a    0.145           0.261a    0.207a
RAE     0.988a        0.590a    0.566a          0.981a    0.740
RRSE    0.996a        0.723a    0.662           1.196a    0.948a

Median of nearby points:
CC      0.083a        0.692a    0.752a          0.305a    0.555a
MAE     0.171a        0.102a    0.097ab         0.169a    0.126a
RMSE    0.218a        0.158a    0.144a          0.259a    0.208a
RAE     0.987a        0.593a    0.563ab         0.980a    0.732a
RRSE    0.996a        0.722a    0.660a          1.186a    0.951a

Mean of nearby points:
CC      0.082a        0.694a    0.753b          0.276a    0.537
MAE     0.171a        0.102a    0.097ab         0.171a    0.129
RMSE    0.218a        0.157a    0.144ab         0.263a    0.210
RAE     0.988a        0.593a    0.564a          0.993a    0.747
RRSE    0.996a        0.721a    0.659b          1.204a    0.960

Regression:
CC      0.081a        0.690a    0.753b          0.295a    0.572
MAE     0.169a        0.102a    0.098a          0.169a    0.126
RMSE    0.217a        0.158a    0.144ab         0.259a    0.204
RAE     0.988a        0.595a    0.571a          0.989a    0.735
RRSE    0.996a        0.725a    0.659ab         1.193a    0.938

a denotes improved performance after variable selection; b denotes the best performance among the 5 models after variable selection.

The proposed time-series forecasting model is feasible for forecasting water levels in Shimen Reservoir.

5. Conclusion

This study proposed a time-series forecasting model for water level forecasting in Taiwan's Shimen Reservoir. The experiments showed that the mean of nearby points imputation method has the best performance. These experimental results indicate that the Random Forest forecasting model, applied with the selected variables or with the full variables, has better forecasting performance than the listing models. The key variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity. The proposed time-series forecasting model, with or without variable selection, has better forecasting performance than the listing models on the five evaluation indices. This shows that the proposed time-series forecasting model is feasible for

forecasting water levels in Shimen Reservoir. Future work will address the following issues:

(1) The reservoir's utility includes irrigation, domestic water supply, and electricity generation. The key variables identified here could improve forecasting in these fields.

(2) We might apply the proposed time-series forecasting model, based on imputation and variable selection, to forecast the water level of lakes, salt water bodies, reservoirs, and so on.

In addition, this experiment shows that the proposed variable selection helps all five forecast methods used here to improve their forecasting capability.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


References

[1] F.-J. Chang and Y.-T. Chang, "Adaptive neuro-fuzzy inference system for prediction of water level in reservoir," Advances in Water Resources, vol. 29, no. 1, pp. 1–10, 2006.
[2] M. K. Tiwari and C. Chatterjee, "Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach," Journal of Hydrology, vol. 394, no. 3-4, pp. 458–470, 2010.
[3] C.-S. Chen, Y.-D. Jhong, T.-Y. Wu, and S.-T. Chen, "Typhoon event-based evolutionary fuzzy inference model for flood stage forecasting," Journal of Hydrology, vol. 490, pp. 134–143, 2013.
[4] F.-J. Chang, P.-A. Chen, Y.-R. Lu, E. Huang, and K.-Y. Chang, "Real-time multi-step-ahead water level forecasting by recurrent neural networks for urban flood control," Journal of Hydrology, vol. 517, pp. 836–846, 2014.
[5] A. Hipni, A. El-shafie, A. Najah, O. A. Karim, A. Hussain, and M. Mukhlisin, "Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS)," Water Resources Management, vol. 27, no. 10, pp. 3803–3823, 2013.
[6] Y. Seo, S. Kim, O. Kisi, and V. P. Singh, "Daily water level forecasting using wavelet decomposition and artificial intelligence techniques," Journal of Hydrology, vol. 520, pp. 224–243, 2015.
[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, no. 3, pp. 321–355, 1988.
[8] K. Ganapathy, V. Vaidehi, and J. B. Chandrasekar, "Optimum steepest descent higher level learning radial basis function network," Expert Systems with Applications, vol. 42, no. 21, pp. 8064–8077, 2015.
[9] P. Brandstetter and M. Kuchar, "Sensorless control of variable speed induction motor drive using RBF neural network," Journal of Applied Logic, vol. 24, part A, pp. 97–108, 2017.
[10] C. M. Anish and B. Majhi, "Hybrid nonlinear adaptive scheme for stock market prediction using feedback FLANN and factor analysis," Journal of the Korean Statistical Society, vol. 45, no. 1, pp. 64–76, 2016.
[11] M. Pottmann and H. P. Jorgl, "Radial basis function networks for internal model control," Applied Mathematics and Computation, vol. 70, no. 2-3, pp. 283–298, 1995.
[12] I. Morlini, "Radial basis function networks with partially classified data," Ecological Modelling, vol. 120, no. 2-3, pp. 109–118, 1999.
[13] A. Nawahda, "An assessment of adding value of traffic information and other attributes as part of its classifiers in a data mining tool set for predicting surface ozone levels," Process Safety and Environmental Protection, vol. 99, pp. 149–158, 2016.
[14] N. R. Kasat and S. D. Thepade, "Novel content based image classification method using LBG vector quantization method with Bayes and lazy family data mining classifiers," in Proceedings of the 7th International Conference on Communication, Computing and Virtualization (ICCCV 2016), pp. 483–489, February 2016.
[15] E. Męzyk and O. Unold, "Machine learning approach to model sport training," Computers in Human Behavior, vol. 27, no. 5, pp. 1499–1506, 2011.
[16] J. Abawajy, A. Kelarev, M. Chowdhury, A. Stranieri, and H. F. Jelinek, "Predicting cardiac autonomic neuropathy category for diabetic data with missing values," Computers in Biology and Medicine, vol. 43, no. 10, pp. 1328–1333, 2013.
[17] M. A. Ayu, S. A. Ismail, A. F. A. Matin, and T. Mantoro, "A comparison study of classifier algorithms for mobile-phone's accelerometer based activity recognition," Procedia Engineering, vol. 41, pp. 224–229, 2012.
[18] A. Tamvakis, V. Trygonis, J. Miritzis, G. Tsirtsis, and S. Spatharis, "Optimizing biodiversity prediction from abiotic parameters," Environmental Modeling and Software, vol. 53, pp. 112–120, 2014.
[19] C. Brokamp, R. Jandarov, M. B. Rao, G. LeMasters, and P. Ryan, "Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches," Atmospheric Environment, vol. 151, pp. 1–11, 2017.
[20] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[21] P. Yıldırım, "Pattern classification with imbalanced and multiclass data for the prediction of albendazole adverse event outcomes," in Proceedings of the 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016) and the 6th International Conference on Sustainable Energy Information Technology (SEIT 2016), pp. 1013–1018, May 2016.
[22] Y. Zhao and Y. Zhang, "Comparison of decision tree methods for finding active objects," Advances in Space Research, vol. 41, no. 12, pp. 1955–1959, 2008.
[23] K. M. A. Kumar, N. Rajasimha, M. Reddy, A. Rajanarayana, and K. Nadgir, "Analysis of users' sentiments from Kannada web documents," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 247–256, August 2015.
[24] C. Zhong, W. Pedrycz, D. Wang, L. Li, and Z. Li, "Granular data imputation: a framework of granular computing," Applied Soft Computing, vol. 46, pp. 307–316, 2016.
[25] C. Genolini, A. Lacombe, R. Ecochard, and F. Subtil, "CopyMean: a new method to predict monotone missing values in longitudinal studies," Computer Methods and Programs in Biomedicine, vol. 132, pp. 29–44, 2016.
[26] M. Majidpour, C. Qiu, P. Chu, H. R. Pota, and R. Gadh, "Forecasting the EV charging load based on customer profile or station measurement," Applied Energy, vol. 163, pp. 134–141, 2016.
[27] O. Cokluk and M. Kayri, "The effects of methods of imputation for missing values on the validity and reliability of scales," Educational Sciences: Theory and Practice, vol. 11, pp. 303–309, 2011.
[28] S. A. Bashir and S. W. Duffy, "The correction of risk estimates for measurement error," Annals of Epidemiology, vol. 7, no. 2, pp. 154–164, 1997.
[29] A. Ozturk, S. Kayalıgil, and N. E. Ozdemirel, "Manufacturing lead time estimation using data mining," European Journal of Operational Research, vol. 173, no. 2, pp. 683–700, 2006.
[30] H. Vathsala and S. G. Koolagudi, "Closed item-set mining for prediction of Indian summer monsoon rainfall: a data mining model with land and ocean variables as predictors," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 271–280, August 2015.
[31] S. Banerjee, A. Anura, J. Chakrabarty, S. Sengupta, and J. Chatterjee, "Identification and functional assessment of novel gene sets towards better understanding of dysplasia associated oral carcinogenesis," Gene Reports, vol. 4, pp. 131–138, 2016.
[32] H. Yahyaoui and A. Al-Mutairi, "A feature-based trust sequence classification algorithm," Information Sciences, vol. 328, pp. 455–484, 2016.


[33] H. B. Borges and J. C. Nievola, "Comparing the dimensionality reduction methods in gene expression databases," Expert Systems with Applications, vol. 39, no. 12, pp. 10780–10795, 2012.
[34] A. Wang, N. An, G. Chen, L. Li, and G. Alterovitz, "Accelerating wrapper-based feature selection with K-nearest-neighbor," Knowledge-Based Systems, vol. 83, no. 1, pp. 81–91, 2015.
[35] S.-W. Lin, Z.-J. Lee, S.-C. Chen, and T.-Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, no. 4, pp. 1505–1512, 2008.
[36] O. Uncu and I. B. Turksen, "A novel feature selection approach: combining feature wrappers and filters," Information Sciences, vol. 177, no. 2, pp. 449–466, 2007.
[37] R. E. Anderson, R. L. Tatham, and W. C. Black, Multivariate Data Analysis, 1998.

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: A Time-Series Water Level Forecasting Model Based on ...downloads.hindawi.com/journals/cin/2017/8734214.pdf · Random Forest. A Random Forest can be applied for classification,regression,andunsupervisedlearning[19].Itis

Computational Intelligence and Neuroscience 3

PCA transforms a set of feature columns in the datasetinto a projection of the feature space with lower dimen-sionality FA is a generalization of PCA the main differencebetween PCA and FA is that FA allows noise to have non-spherical shape while transforming the data The main goalof both PCA and FA is to transform the coordinate systemsuch that correlation between system variables is minimized[36]

There are several methods to decide how many factorshave to be extracted The most widely used method fordetermining the number of factors is using eigenvaluesgreater than one [10]

3 Proposed Model

Reservoirs are important domestically as well as in thenational defense and for economic development Thus thereservoir water levels should be forecast over a long periodof time and water resources should be planned and man-aged comprehensively to reach great cost-effectiveness Thispaper proposes a time-series forecasting model based on theimputation of missing values and variable selection First theproposed model used five imputation methods (ie medianof nearby points series mean mean of nearby points linearand regression imputation) It then compares these findingswith a delete strategy to estimate the missing values Secondby identifying the key variable that influences the daily waterlevels the proposed method ranked the importance of theatmospheric variables via factor analysis It then sequentiallyremoves the unimportant variables Finally the proposedmodel uses a Random Forest machine learning method tobuild a forecasting model of the reservoir water level tocompare it to other methods The proposed model could bepartitioned into four parts data preprocessing imputationand feature selection model building and accuracy evalua-tion The procedure is shown in Figure 1

31 Computational Step To understand the proposed modelmore easily this section partitioned the proposed model intofour steps

Step 1 (data preprocessing) The related water level and atmo-spheric data were collected from the reservoir administrationwebsite and the Taoyuan weather station The two collecteddatasets are concatenated into an integrated dataset basedon the date There are nine independent variables and onedependent variable in the integrated datasetThe variables aredefined in the integrated dataset and are listed in Table 1

Step 2 (imputation) After Step 1 we found some variablesand records with missing values in the integrated datasetdue to mechanical measurement failure or human operationerror Previous studies showed that deleting missing valuesdirectly will impact the results To identify that one thatbetter fits with the imputationmethod this paper utilized fiveimputation methods to estimate the missing values and thencompared it with no imputation method to directly deletethe missing value The five imputation methods were themedian of the nearby points series mean mean of nearby

Table 1 Description of variables in the research dataset

Output Shimen Reservoir daily discharge releaseInput Shimen Reservoir daily inflow dischargeTemperature Daily temperature in Daxi Taoyuan

Rainfall The previous day Shimen Reservoiraccumulated rainfall

Pressure Daily barometric pressure in DaxiTaoyuan

Relative Humidity Daily relative humidity in Daxi TaoyuanWind Speed Daily wind speed in Daxi TaoyuanDirection Daily wind direction in Daxi TaoyuanRainfall Dasi Daily rainfall in Daxi Taoyuan

points linear imputation and regression imputation Thisstudy had six processed datasets after processing the missingvalues problem The problem is then how to identify theimputationmethod that is a better fit to the integrated datasetTo determine this the following steps were followed

(i) In order to rescale all numeric values in the range[0 1] this step normalized each variable value bydividing the maximal value for the five imputeddatasets and then deleted the missing value datasetAll independent and dependent variables are positivevalues

(ii) The six normalized datasets are partitioned into 66training datasets and 34 testing datasets We alsoemployed a 10-fold cross-validation approach to iden-tify the imputation dataset that has best predictionperformance

(iii) We utilized five forecast methods including RandomForest RBF Network Kstar IBK (KNN) and Ran-dom Tree via five evaluation indices These includecorrelation coefficient (CC) root mean squared error(RMSE) mean absolute error (MAE) relative abso-lute error (RAE) and root relative squared error(RRSE) We identified the normalized dataset withthe smaller index value over five evaluation indices aswell as the more fit imputation Section 4 shows thatthe better imputation method is the mean of nearbypoints

Step 3 (variable selection and model building) Based onStep 2 this study will now determine the better imputationmethod (ie mean of nearby points) Then the importantproblem is to determine the key variables that influencethe reservoir water level Therefore this step utilized factoranalysis to rank the importance of the variables for buildingthe best forecast model Based on the ordering of variable wefirst deleted the least variable of importance We then builtthe forecast model via a Random Forest RBFNetwork KstarKNN and RandomTree Next we repeatedly delete the lowervariable of importance to build the forecast model until theRMSE can no longer improve After many iterative experi-ments this study used five evaluation indices to determine

4 Computational Intelligence and Neuroscience

ree imputation methods used to estimate the missing values

(1) Dataset imputation and normalization

(2) Partition into 66 training data and 34 testing data

(3) Select better imputation method based on CC RMSE RRSE MAE and RAE

(1) e imputed dataset partition into 66training data and 34 testing data

(2) Rank the importance of variables

(3) Delete the least important variable

(4) Determine key variables

Imputation

Variable selection and modelbuilding

Dataset 1

Concatenate datasets

Dataset 2

e research dataset

Data preprocessing

Evaluation andcomparison

Model evaluation and comparison(1) Correlation coecient(2) Root mean squared error(3) Root relative squared error(4) Mean absolute error(5) Relative absolute error

Figure 1 The procedure of proposed model

the best forecast methodThis step could be introduced step-by-step as follows

(i) The imputed integrated datasets are partitioned into66 training datasets and 34 testing datasets

(ii) Factor analysis ranked the importance of the vari-ables

(iii) The variable ranking of factor analysis was used toiteratively delete the least important variable Theremaining variables were studied with Random For-est RBF Network Kstar KNN and Random Treeuntil the RMSE can no longer improve

(iv) Based on the previous Step 3 the key variables arefound when the lowest RMSE is achieved

(v) Concurrently we used five evaluation indices (CCRMSE RRSE MAE and RAE) to determine whichforecast method is a good forecasting model

Step 4 (evaluation and comparison) To verify the perfor-mance of the reservoir water level forecastingmodel this stepuses the superior imputed datasets with different variablesselected to evaluate the proposed method It then comparesthe results with the listing methods This study used CCRMSE MAE RAE and RRSE to evaluate the forecast modeThe five criteria indices are listed as equations (1)ndash(5)

RMSE = radic 1119899119899sum119894=1

(119910119894minus 119910119894)2 (1)

Computational Intelligence and Neuroscience 5

Table 2 The partial collected data

Date Rainfall Input Output Rainfall Dasi Temperature Wind Speed Direction Pressure Relative Humidity Water level200811 01 836 0 102 47 65 10015 56 24409200812 01 9608 28624 0 104 63 38 10007 59 24393200813 0 8272 8217 0 145 45 50 9977 67 24381200814 0 13332 26222 0 153 34 40 9965 82 24378200815 0 1256 30594 0 16 29 46 9963 77 24355200816 03 9874 19233 0 167 12 170 9954 83 24332200817 0 1166 19276 0 184 19 46 9949 77 24334200818 0 9312 10973 0 199 13 311 9929 78 24333200819 0 10757 12398 0 199 22 11 9922 77 243232008110 0 6515 27674 0 196 16 357 9916 80 2432008111 0 5564 24909 0 215 13 185 9902 71 242782008112 0 9167 19181 0 198 42 37 9921 75 242742008113 09 10734 18222 15 142 71 39 9962 85 242532008114 52 8009 14662 1 127 71 36 9977 85 242492008115 4 8577 24382 0 135 71 35 9984 86 24238

where 119910119894is the actual observation value of the data 119910

119894is the

forecast value of the model and 119899 is the sample number

Correlation Coefficient (CC)

CC = sum119873119894=1(119909119894minus 119909) sdot (119910

119894minus 119910)

radicsum119873119894=1(119909119894minus 119909)2 sdot radicsum119873

119894=1(119910119894minus 119910)2 (2)

where 119909119894and 119910

119894are the observed and predicted values

respectively 119909 and 119910 are the mean of the observed andpredicted values

Root Relative Squared Error (RRSE)

RRSE = radicsum119899119894=1 (119909119894 minus 119910119894)2sum119899119894=1(119910119894minus 119910)2 (3)

where 119909119894is the predicted value 119910

119894is the actual value and 119910 is

the mean of the actual value

Mean Absolute Error (MAE)

MAE = 1119899 times119873sum119894=1

1003816100381610038161003816119910119905 minus 1199101199051003816100381610038161003816 (4)

Here 119899 is the number of observation datasets 119910119905is the

forecast value at time 119905 and 119910119905is the actual value at time 119905

Relative Absolute Error (RAE)

RAE = sum119899119894=1 1003816100381610038161003816119910119894 minus 1199101198941003816100381610038161003816sum119899119894=1

1003816100381610038161003816119910119894 minus 1199101198941003816100381610038161003816 (5)

where 119910119894is the actual observation value of the data 119910

119894is the

mean value of the 119910119894 119910119894is the forecast value of the model and119899 is the sample number

4 Experimental Results

This section verifies the performance of the proposed forecastmodel and compares the results with the listing meth-ods To determine which imputation method has the bestperformance for the collected dataset this study collecteddaily atmospheric data from the monitoring station andthe website of Water Resources Agency in Taiwan ShimenReservoirThis work also compares the proposed model withthe listing models withwithout variable selection

41 Experimental Data The research dataset consisted oftwo historical datasets one was collected form the websiteof Taiwan Water Resources Agency and the other was fromDasi monitoring station nearest to the Shimen ReservoirThe two datasets were collected from January 1 2008 toOctober 31 2015 The two collected data are concatenatedinto an integrated dataset based on the date There are nineindependent variables and one dependent variable in theintegrated dataset that has 2854 daily records The studymainly forecasts the water level of the reservoir The waterlevel is the dependent variable The independent variablesare Reservoir OUT Reservoir IN Temperature RainfallPressure Relative Humidity Wind Speed Direction andRainfall Dasi respectivelyThe detailed data types are shownas Table 2

42 Forecast and Comparison Based on the computationalstep in Section 3 this section will employ the practically col-lected dataset to illustrate the proposedmodel and compare itwith the listing method A detailed description is introducedin the following section

(1) To achieve better processing of the missing valuesdataset this study applies series mean regressionmean of nearby points linear or themedian of nearbypointsrsquo imputation methods to estimate missing val-ues It then directly deletes missing values to deter-mine which method has the best performance After

6 Computational Intelligence and Neuroscience

Table 3 The results of listing models with the five imputation methods under percentage spilt (dataset partition into 66 training data and34 testing data) before variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

Before variable selection

Delete the rows with missing data

CC 0085 0546 0728 0288 0534MAE 0183 0133 0111 0186 0145RMSE 0227 0200 0157 0271 0226RAE 0998 0727 0607 1015 0789RRSE 0997 0879 0690 1193 0995

Serial mean

CC 0052 0557 0739 0198 0563MAE 0174 0123 0102 0191 0126RMSE 0222 0189 0151lowast 0283 0202RAE 1001 0705 0587 1098 0722RRSE 0999 0850 0679 1276 0908

Linear

CC 0054 0565 0734 0200 0512MAE 0175 0121 0101 0189 0138RMSE 0222 0188 0152 0281 0218RAE 1000 0690 0575lowast 1082 0787RRSE 0999 0844 0684 1264 0980

Near median

CC 0054 0571 0737 0227 0559MAE 0175 0120 0101 0188 0126RMSE 0222 0186 0152 0277 0202RAE 1000 0689 0577 1074 0719RRSE 0999 0838 0681 1244 0907

Near mean

CC 0053 0572 0740lowast 0232 0512MAE 0175 0121 0101lowast 0186 0132RMSE 0222 0186 0151lowast 0275 0217RAE 1000 0690 0575lowast 1062 0756RRSE 0999 0837 0678lowast 1235 0975

Regression

CC 0052 0564 0739 0200 0509MAE 0174 0121 0102 0191 0133RMSE 0222 0188 0151 0283 0216RAE 1001 0695 0586 1096 0762RRSE 0999 0845 0680 1275 0974

lowast denotes the best performance among 5 imputation methods

normalizing the six processedmissing values datasetswe used two approaches to estimate the datasets per-centage spilt (dataset partition into 66 training dataand 34 testing data) and 10-fold cross-validationThe two approaches employ Random Forest RBFNetwork Kstar KNN and Random Tree to forecastwater levels for evaluating the six processed missingvalues methods under five evaluation indices CCRMSE MAE RAE and RRSE The results are shownin Tables 3 and 4 and we can see that the mean ofthe nearby pointsrsquo method wins versus other methodsin CC MAE RAE RRSE and RMSE Therefore themean of the nearby pointsrsquo imputation method betterestimates the Shimen Reservoir water level

(2) Variable selection andmodel building uses the resultsfrom above The best imputation method is themean of nearby points Therefore this study will usethe imputed dataset of the mean of nearby points

to select the variable and build the model For vari-able selection this study utilizes factor analysis torank the importance of independent variables for animproved imputed dataset Table 5 indicates that theordering of important variables is Reservoir IN Tem-perature Reservoir OUT Pressure Rainfall RainfallDasi Relative Humidity Wind Speed and Direction

Next we determine the key variables and build aforecast model This study utilizes the ordering ofimportant variables and iteratively deletes the leastimportant variable to iteratively implement the pro-posed forecasting model when the minimal RMSEis reached First all independent variables are usedto build the water level forecasting model Secondthe least important variables are removed one-by-one The remaining variables serve as a forecastmodel until the minimal RMSE is reached Afterthese iterative experiments the optimal forecast

Computational Intelligence and Neuroscience 7

Table 4 The results of listing models with the five imputation methods under 10-folds cross-validation before variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

Before variable selection

Delete the rows with missing data

CC 0041 0590 0737 0246 0505MAE 0184 0126 0109 0195 0143RMSE 0227 0188 0154 0281 0225RAE 1000 0682 0592 1059 0775RRSE 0999 0825 0678 1235 0986

Serial mean

CC 0038 0612 0755 0237 0574MAE 0171 0113 0098 0181 0125RMSE 0217 0175 0143lowast 0270 0202RAE 1001 0660 0575 1058 0731RRSE 0999 0802 0658 1241 0929

Linear

CC 0042 0615 0753 0243 0551MAE 0173 0112 0098 0181 0127RMSE 0218 0175 0144 0269 0207RAE 1000 0649 0568 1057 0736RRSE 0999 0800 0660 1233 0948

Near median

CC 0043 0614 0752 0251 0535MAE 0173 0113 0098 0180 0131RMSE 0218 0175 0144 0268 0211RAE 1000 0653 0568 1041 0757RRSE 0999 0801 0661 1227 0967

Near mean

CC 0043 0613 0756lowast 0250 0558MAE 0173 0113 0098lowast 0179 0125RMSE 0218 0175 0144 0268 0205RAE 1000 0654 0565lowast 1039 0725RRSE 0999 0802 0657lowast 1226 0937

Regression

CC 0038 0618 0754 0240 0522MAE 0171 0112 0098 0181 0133RMSE 0217 0174 0143lowast 0270 0214RAE 1001 0653 0574 1055 0778RRSE 0999 0798 0659 1239 0983

lowast denotes the best performance among 5 imputation methods

Table 5 The results of variable selection

Factor1 2 3

Input 983 072 164Output 918 037 071Rainfall 717 064 503Temperature 092 965 minus149Pressure minus254 minus844 minus182Wind Speed 096 minus377 052Direction 041 351 minus067Rainfall Dasi 290 068 693Relative Humidity 025 minus196 413Note Each factorrsquos highest factor loading appears in bold

model is achieved when the Wind Speed and Direc-tion are deleted The key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

As a rule of thumb we recommend interpreting onlyfactor loadingswith an absolute value greater than 05which explains around 25 of the variance [37] Wecan see that the loadings of the two deleted variablesare smaller than 05 as seen in Table 5 Finally thisstudy employs the Random Forest forecast methodbased on variable selection and full variables to builda forecast model for water level forecasting in ShimenReservoir respectively

(3) Model comparison this study compares the proposedforecast model (using Random Forest) with the Ran-dom Forest RBF Network Kstar IBK (KNN) andRandomTree forecast models (Tables 6 and 7) Tables6 and 7 show that before variable selection (full vari-ables) the Random Forest forecast model has the bestforecast performance in CC RMSE MAE RAE andRRSE indices After variable selection all five forecastmodels have improved forecast performance underCC RMSE MAE RAE and RRSE Therefore theresults show that variable selection could improve the

8 Computational Intelligence and Neuroscience

Table 6 The results of compare forecasting models under percentage spilt (dataset partition into 66 training data and 34 testing data)after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0033 0638a 0729a 0251 0545a

MAE 0182a 0121a 0111a 0199 0135a

RMSE 0229 0176a 0156a 0287 0212a

RAE 0992a 0657a 0602a 1083 0736a

RRSE 1007 0775a 0688a 1262 0935a

Series mean

CC 0107a 0661a 0739a 0242a 0551MAE 0172a 0107a 0101a 0179a 0129RMSE 0221a 0167a 0151a 0268a 0205RAE 0988a 0615a 0579a 1027a 0740RRSE 0995a 0753a 0678a 1208a 0923

Linear

CC 0105a 0666a 0735a 0258a 0596a

MAE 0173a 0106a 0100a 0175a 0120a

RMSE 0221a 0166a 0151a 0266a 0196a

RAE 0987a 0606a 0572a 1002a 0683a

RRSE 0995a 0748a 0681a 1198a 0883a

Median of nearby points

CC 0106a 0666a 0740a 0264a 0553MAE 0173a 0107a 0100a 0177a 0127RMSE 0221a 0166a 0151a 0266a 0207RAE 0987a 0611a 0571a 1013a 0723RRSE 0995a 0747a 0677a 1195a 0932

Mean of nearby points

CC 01059a 0667a 0745ab 0249a 0540a

MAE 0173a 0107a 0099ab 0179a 0129a

RMSE 0221a 0166a 0149ab 0268a 0214a

RAE 0987a 0611a 0565ab 1025a 0735a

RRSE 0995a 0747a 0672ab 1207a 0962a

Regression

CC 0107a 0663a 0739a 0242a 0559a

MAE 0172a 0106a 0101a 0179a 0126a

RMSE 0221a 0167a 0151a 0268a 0200a

RAE 0987a 0610a 0581a 1027a 0723a

RRSE 0994a 0752a 0678a 1207a 0900a

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

forecasting performance The Random Forest modelas applied to variable selection with full variables isbetter than the listing model

43 Findings After variable selection and model buildingsome key findings can be highlighted

(1) Imputation after the two collected datasets wereconcatenated into an integrated dataset there aremissing values in the integrated dataset due to humanerror or mechanical failure Tables 3 and 4 show theintegrated dataset that uses the median of nearbypoints series mean mean of nearby points linearregression imputation and the delete strategy to eval-uate their accuracy via five machine learning forecastmodels The results show that the integrated datasetthat uses the mean of the nearby pointsrsquo imputationmethod has better forecasting performance

(2) Variable selection this study uses factor analysis torank the ordering of variables and then sequen-tially deletes the least important variables until theforecasting performance no longer improves Tables6 and 7 show that the variable selection couldsignificantly improve the performance of all fiveforecasting models After iterative experiments andvariable selection the key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

(3) Forecasting model this study proposed a time-seriesforecastingmodel based on estimatingmissing valuesand variable selection to forecast the water level inthe reservoirThese experimental results indicate thatthe Random Forest forecasting model when appliedto variable selection with full variables has betterforecasting performance than the listing models in

Computational Intelligence and Neuroscience 9

Table 7 The results of compare forecasting models under 10-folds cross-validation after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0103a 0665a 0737a 0233 0529a

MAE 0181a 0115a 0108a 0193a 0143a

RMSE 0226a 0171a 0154a 0282a 0223a

RAE 0984a 0627a 0589a 1047a 0774a

RRSE 0994a 0749a 0677a 1238 0977a

Series mean

CC 0081a 0688a 0751a 0295a 0547MAE 0169a 0103a 0098a 0170a 0131RMSE 0217a 0158a 0144 0260a 0209RAE 0988a 0600a 0571a 0990a 0767RRSE 0996a 0727a 0661 1193a 0960

Linear

CC 0081a 0692a 0750 0286a 0551a

MAE 0171a 0102a 0098a 0169a 0128RMSE 0218a 0158a 0145 0261a 0207a

RAE 0988a 0590a 0566a 0981a 0740RRSE 0996a 0723a 0662 1196a 0948a

Median of nearby points

CC 0083a 0692a 0752a 0305a 0555a

MAE 0171a 0102a 0097ab 0169a 0126a

RMSE 0218a 0158a 0144a 0259a 0208a

RAE 0987a 0593a 0563ab 0980a 0732a

RRSE 0996a 0722a 0660a 1186a 0951a

Mean of nearby points

CC 0082a 0694a 0753b 0276a 0537MAE 0171a 0102a 0097ab 0171a 0129RMSE 0218a 0157a 0144ab 0263a 0210RAE 0988a 0593a 0564a 0993a 0747RRSE 0996a 0721a 0659b 1204a 0960

Regression

CC 0081a 0690a 0753b 0295a 0572MAE 0169a 0102a 0098a 0169a 0126RMSE 0217a 0158a 0144ab 0259a 0204RAE 0988a 0595a 0571a 0989a 0735RRSE 0996a 0725a 0659ab 1193a 0938

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

the five evaluation indices The proposed time-seriesforecasting model is feasible for forecasting waterlevels in Shimen Reservoir

5 Conclusion

This study proposed a time-series forecastingmodel for waterlevel forecasting in Taiwanrsquos Shimen Reservoir The experi-ments showed that the mean of nearby pointsrsquo imputationmethod has the best performanceThese experimental resultsindicate that the Random Forest forecasting model whenapplied to variable selection with full variables has betterforecasting performance than the listing model The keyvariables are Reservoir IN Temperature Reservoir OUTPressure Rainfall Rainfall Dasi and Relative Humidity Theproposed time-series forecasting model withwithout vari-able selection has better forecasting performance than thelisting models using the five evaluation indices This showsthat the proposed time-series forecastingmodel is feasible for

forecastingwater levels in ShimenReservoir Futureworkwilladdress the following issues

(1) The reservoirrsquos utility includes irrigation domesticwater supply and electricity generation The keyvariables identified here could improve forecasting inthese fields

(2) We might apply the proposed time-series forecastingmodel based on imputation and variable selection toforecast the water level of lakes salt water bodiesreservoirs and so on

In addition this experiment shows that the proposed variableselection can help determine five forecast methods used hereto improve the forecasting capability

Conflicts of Interest

The authors declare that they have no conflicts of interest

10 Computational Intelligence and Neuroscience

References

[1] F.-J. Chang and Y.-T. Chang, "Adaptive neuro-fuzzy inference system for prediction of water level in reservoir," Advances in Water Resources, vol. 29, no. 1, pp. 1–10, 2006.

[2] M. K. Tiwari and C. Chatterjee, "Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach," Journal of Hydrology, vol. 394, no. 3-4, pp. 458–470, 2010.

[3] C.-S. Chen, Y.-D. Jhong, T.-Y. Wu, and S.-T. Chen, "Typhoon event-based evolutionary fuzzy inference model for flood stage forecasting," Journal of Hydrology, vol. 490, pp. 134–143, 2013.

[4] F.-J. Chang, P.-A. Chen, Y.-R. Lu, E. Huang, and K.-Y. Chang, "Real-time multi-step-ahead water level forecasting by recurrent neural networks for urban flood control," Journal of Hydrology, vol. 517, pp. 836–846, 2014.

[5] A. Hipni, A. El-shafie, A. Najah, O. A. Karim, A. Hussain, and M. Mukhlisin, "Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS)," Water Resources Management, vol. 27, no. 10, pp. 3803–3823, 2013.

[6] Y. Seo, S. Kim, O. Kisi, and V. P. Singh, "Daily water level forecasting using wavelet decomposition and artificial intelligence techniques," Journal of Hydrology, vol. 520, pp. 224–243, 2015.

[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, no. 3, pp. 321–355, 1988.

[8] K. Ganapathy, V. Vaidehi, and J. B. Chandrasekar, "Optimum steepest descent higher level learning radial basis function network," Expert Systems with Applications, vol. 42, no. 21, pp. 8064–8077, 2015.

[9] P. Brandstetter and M. Kuchar, "Sensorless control of variable speed induction motor drive using RBF neural network," Journal of Applied Logic, vol. 24, part A, pp. 97–108, 2017.

[10] C. M. Anish and B. Majhi, "Hybrid nonlinear adaptive scheme for stock market prediction using feedback FLANN and factor analysis," Journal of the Korean Statistical Society, vol. 45, no. 1, pp. 64–76, 2016.

[11] M. Pottmann and H. P. Jorgl, "Radial basis function networks for internal model control," Applied Mathematics and Computation, vol. 70, no. 2-3, pp. 283–298, 1995.

[12] I. Morlini, "Radial basis function networks with partially classified data," Ecological Modelling, vol. 120, no. 2-3, pp. 109–118, 1999.

[13] A. Nawahda, "An assessment of adding value of traffic information and other attributes as part of its classifiers in a data mining tool set for predicting surface ozone levels," Process Safety and Environmental Protection, vol. 99, pp. 149–158, 2016.

[14] N. R. Kasat and S. D. Thepade, "Novel content based image classification method using LBG vector quantization method with Bayes and lazy family data mining classifiers," in Proceedings of the 7th International Conference on Communication, Computing and Virtualization (ICCCV 2016), pp. 483–489, February 2016.

[15] E. Męzyk and O. Unold, "Machine learning approach to model sport training," Computers in Human Behavior, vol. 27, no. 5, pp. 1499–1506, 2011.

[16] J. Abawajy, A. Kelarev, M. Chowdhury, A. Stranieri, and H. F. Jelinek, "Predicting cardiac autonomic neuropathy category for diabetic data with missing values," Computers in Biology and Medicine, vol. 43, no. 10, pp. 1328–1333, 2013.

[17] M. A. Ayu, S. A. Ismail, A. F. A. Matin, and T. Mantoro, "A comparison study of classifier algorithms for mobile-phone's accelerometer based activity recognition," Procedia Engineering, vol. 41, pp. 224–229, 2012.

[18] A. Tamvakis, V. Trygonis, J. Miritzis, G. Tsirtsis, and S. Spatharis, "Optimizing biodiversity prediction from abiotic parameters," Environmental Modelling and Software, vol. 53, pp. 112–120, 2014.

[19] C. Brokamp, R. Jandarov, M. B. Rao, G. LeMasters, and P. Ryan, "Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches," Atmospheric Environment, vol. 151, pp. 1–11, 2017.

[20] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[21] P. Yıldırım, "Pattern classification with imbalanced and multiclass data for the prediction of albendazole adverse event outcomes," in Proceedings of the 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016) and the 6th International Conference on Sustainable Energy Information Technology (SEIT 2016), pp. 1013–1018, May 2016.

[22] Y. Zhao and Y. Zhang, "Comparison of decision tree methods for finding active objects," Advances in Space Research, vol. 41, no. 12, pp. 1955–1959, 2008.

[23] K. M. A. Kumar, N. Rajasimha, M. Reddy, A. Rajanarayana, and K. Nadgir, "Analysis of users' sentiments from Kannada web documents," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 247–256, August 2015.

[24] C. Zhong, W. Pedrycz, D. Wang, L. Li, and Z. Li, "Granular data imputation: a framework of granular computing," Applied Soft Computing, vol. 46, pp. 307–316, 2016.

[25] C. Genolini, A. Lacombe, R. Ecochard, and F. Subtil, "CopyMean: a new method to predict monotone missing values in longitudinal studies," Computer Methods and Programs in Biomedicine, vol. 132, pp. 29–44, 2016.

[26] M. Majidpour, C. Qiu, P. Chu, H. R. Pota, and R. Gadh, "Forecasting the EV charging load based on customer profile or station measurement," Applied Energy, vol. 163, pp. 134–141, 2016.

[27] O. Cokluk and M. Kayri, "The effects of methods of imputation for missing values on the validity and reliability of scales," Educational Sciences: Theory and Practice, vol. 11, pp. 303–309, 2011.

[28] S. A. Bashir and S. W. Duffy, "The correction of risk estimates for measurement error," Annals of Epidemiology, vol. 7, no. 2, pp. 154–164, 1997.

[29] A. Ozturk, S. Kayalıgil, and N. E. Ozdemirel, "Manufacturing lead time estimation using data mining," European Journal of Operational Research, vol. 173, no. 2, pp. 683–700, 2006.

[30] H. Vathsala and S. G. Koolagudi, "Closed item-set mining for prediction of Indian summer monsoon rainfall: a data mining model with land and ocean variables as predictors," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 271–280, August 2015.

[31] S. Banerjee, A. Anura, J. Chakrabarty, S. Sengupta, and J. Chatterjee, "Identification and functional assessment of novel gene sets towards better understanding of dysplasia associated oral carcinogenesis," Gene Reports, vol. 4, pp. 131–138, 2016.

[32] H. Yahyaoui and A. Al-Mutairi, "A feature-based trust sequence classification algorithm," Information Sciences, vol. 328, pp. 455–484, 2016.


[33] H. B. Borges and J. C. Nievola, "Comparing the dimensionality reduction methods in gene expression databases," Expert Systems with Applications, vol. 39, no. 12, pp. 10780–10795, 2012.

[34] A. Wang, N. An, G. Chen, L. Li, and G. Alterovitz, "Accelerating wrapper-based feature selection with K-nearest-neighbor," Knowledge-Based Systems, vol. 83, no. 1, pp. 81–91, 2015.

[35] S.-W. Lin, Z.-J. Lee, S.-C. Chen, and T.-Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, no. 4, pp. 1505–1512, 2008.

[36] O. Uncu and I. B. Turksen, "A novel feature selection approach combining feature wrappers and filters," Information Sciences, vol. 177, no. 2, pp. 449–466, 2007.

[37] R. E. Anderson, R. L. Tatham, and W. C. Black, Multivariate Data Analysis, 1998.



Page 5: A Time-Series Water Level Forecasting Model Based on ...downloads.hindawi.com/journals/cin/2017/8734214.pdf · Random Forest. A Random Forest can be applied for classification,regression,andunsupervisedlearning[19].Itis

Computational Intelligence and Neuroscience 5

Table 2 The partial collected data

Date Rainfall Input Output Rainfall Dasi Temperature Wind Speed Direction Pressure Relative Humidity Water level200811 01 836 0 102 47 65 10015 56 24409200812 01 9608 28624 0 104 63 38 10007 59 24393200813 0 8272 8217 0 145 45 50 9977 67 24381200814 0 13332 26222 0 153 34 40 9965 82 24378200815 0 1256 30594 0 16 29 46 9963 77 24355200816 03 9874 19233 0 167 12 170 9954 83 24332200817 0 1166 19276 0 184 19 46 9949 77 24334200818 0 9312 10973 0 199 13 311 9929 78 24333200819 0 10757 12398 0 199 22 11 9922 77 243232008110 0 6515 27674 0 196 16 357 9916 80 2432008111 0 5564 24909 0 215 13 185 9902 71 242782008112 0 9167 19181 0 198 42 37 9921 75 242742008113 09 10734 18222 15 142 71 39 9962 85 242532008114 52 8009 14662 1 127 71 36 9977 85 242492008115 4 8577 24382 0 135 71 35 9984 86 24238

where 119910119894is the actual observation value of the data 119910

119894is the

forecast value of the model and 119899 is the sample number

Correlation Coefficient (CC)

CC = sum119873119894=1(119909119894minus 119909) sdot (119910

119894minus 119910)

radicsum119873119894=1(119909119894minus 119909)2 sdot radicsum119873

119894=1(119910119894minus 119910)2 (2)

where 119909119894and 119910

119894are the observed and predicted values

respectively 119909 and 119910 are the mean of the observed andpredicted values

Root Relative Squared Error (RRSE)

RRSE = radicsum119899119894=1 (119909119894 minus 119910119894)2sum119899119894=1(119910119894minus 119910)2 (3)

where 119909119894is the predicted value 119910

119894is the actual value and 119910 is

the mean of the actual value

Mean Absolute Error (MAE)

MAE = 1119899 times119873sum119894=1

1003816100381610038161003816119910119905 minus 1199101199051003816100381610038161003816 (4)

Here 119899 is the number of observation datasets 119910119905is the

forecast value at time 119905 and 119910119905is the actual value at time 119905

Relative Absolute Error (RAE)

RAE = sum119899119894=1 1003816100381610038161003816119910119894 minus 1199101198941003816100381610038161003816sum119899119894=1

1003816100381610038161003816119910119894 minus 1199101198941003816100381610038161003816 (5)

where 119910119894is the actual observation value of the data 119910

119894is the

mean value of the 119910119894 119910119894is the forecast value of the model and119899 is the sample number

4 Experimental Results

This section verifies the performance of the proposed forecastmodel and compares the results with the listing meth-ods To determine which imputation method has the bestperformance for the collected dataset this study collecteddaily atmospheric data from the monitoring station andthe website of Water Resources Agency in Taiwan ShimenReservoirThis work also compares the proposed model withthe listing models withwithout variable selection

41 Experimental Data The research dataset consisted oftwo historical datasets one was collected form the websiteof Taiwan Water Resources Agency and the other was fromDasi monitoring station nearest to the Shimen ReservoirThe two datasets were collected from January 1 2008 toOctober 31 2015 The two collected data are concatenatedinto an integrated dataset based on the date There are nineindependent variables and one dependent variable in theintegrated dataset that has 2854 daily records The studymainly forecasts the water level of the reservoir The waterlevel is the dependent variable The independent variablesare Reservoir OUT Reservoir IN Temperature RainfallPressure Relative Humidity Wind Speed Direction andRainfall Dasi respectivelyThe detailed data types are shownas Table 2

42 Forecast and Comparison Based on the computationalstep in Section 3 this section will employ the practically col-lected dataset to illustrate the proposedmodel and compare itwith the listing method A detailed description is introducedin the following section

(1) To achieve better processing of the missing valuesdataset this study applies series mean regressionmean of nearby points linear or themedian of nearbypointsrsquo imputation methods to estimate missing val-ues It then directly deletes missing values to deter-mine which method has the best performance After

6 Computational Intelligence and Neuroscience

Table 3 The results of listing models with the five imputation methods under percentage spilt (dataset partition into 66 training data and34 testing data) before variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

Before variable selection

Delete the rows with missing data

CC 0085 0546 0728 0288 0534MAE 0183 0133 0111 0186 0145RMSE 0227 0200 0157 0271 0226RAE 0998 0727 0607 1015 0789RRSE 0997 0879 0690 1193 0995

Serial mean

CC 0052 0557 0739 0198 0563MAE 0174 0123 0102 0191 0126RMSE 0222 0189 0151lowast 0283 0202RAE 1001 0705 0587 1098 0722RRSE 0999 0850 0679 1276 0908

Linear

CC 0054 0565 0734 0200 0512MAE 0175 0121 0101 0189 0138RMSE 0222 0188 0152 0281 0218RAE 1000 0690 0575lowast 1082 0787RRSE 0999 0844 0684 1264 0980

Near median

CC 0054 0571 0737 0227 0559MAE 0175 0120 0101 0188 0126RMSE 0222 0186 0152 0277 0202RAE 1000 0689 0577 1074 0719RRSE 0999 0838 0681 1244 0907

Near mean

CC 0053 0572 0740lowast 0232 0512MAE 0175 0121 0101lowast 0186 0132RMSE 0222 0186 0151lowast 0275 0217RAE 1000 0690 0575lowast 1062 0756RRSE 0999 0837 0678lowast 1235 0975

Regression

CC 0052 0564 0739 0200 0509MAE 0174 0121 0102 0191 0133RMSE 0222 0188 0151 0283 0216RAE 1001 0695 0586 1096 0762RRSE 0999 0845 0680 1275 0974

lowast denotes the best performance among 5 imputation methods

normalizing the six processedmissing values datasetswe used two approaches to estimate the datasets per-centage spilt (dataset partition into 66 training dataand 34 testing data) and 10-fold cross-validationThe two approaches employ Random Forest RBFNetwork Kstar KNN and Random Tree to forecastwater levels for evaluating the six processed missingvalues methods under five evaluation indices CCRMSE MAE RAE and RRSE The results are shownin Tables 3 and 4 and we can see that the mean ofthe nearby pointsrsquo method wins versus other methodsin CC MAE RAE RRSE and RMSE Therefore themean of the nearby pointsrsquo imputation method betterestimates the Shimen Reservoir water level

(2) Variable selection andmodel building uses the resultsfrom above The best imputation method is themean of nearby points Therefore this study will usethe imputed dataset of the mean of nearby points

to select the variable and build the model For vari-able selection this study utilizes factor analysis torank the importance of independent variables for animproved imputed dataset Table 5 indicates that theordering of important variables is Reservoir IN Tem-perature Reservoir OUT Pressure Rainfall RainfallDasi Relative Humidity Wind Speed and Direction

Next we determine the key variables and build aforecast model This study utilizes the ordering ofimportant variables and iteratively deletes the leastimportant variable to iteratively implement the pro-posed forecasting model when the minimal RMSEis reached First all independent variables are usedto build the water level forecasting model Secondthe least important variables are removed one-by-one The remaining variables serve as a forecastmodel until the minimal RMSE is reached Afterthese iterative experiments the optimal forecast

Computational Intelligence and Neuroscience 7

Table 4 The results of listing models with the five imputation methods under 10-folds cross-validation before variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

Before variable selection

Delete the rows with missing data

CC 0041 0590 0737 0246 0505MAE 0184 0126 0109 0195 0143RMSE 0227 0188 0154 0281 0225RAE 1000 0682 0592 1059 0775RRSE 0999 0825 0678 1235 0986

Serial mean

CC 0038 0612 0755 0237 0574MAE 0171 0113 0098 0181 0125RMSE 0217 0175 0143lowast 0270 0202RAE 1001 0660 0575 1058 0731RRSE 0999 0802 0658 1241 0929

Linear

CC 0042 0615 0753 0243 0551MAE 0173 0112 0098 0181 0127RMSE 0218 0175 0144 0269 0207RAE 1000 0649 0568 1057 0736RRSE 0999 0800 0660 1233 0948

Near median

CC 0043 0614 0752 0251 0535MAE 0173 0113 0098 0180 0131RMSE 0218 0175 0144 0268 0211RAE 1000 0653 0568 1041 0757RRSE 0999 0801 0661 1227 0967

Near mean

CC 0043 0613 0756lowast 0250 0558MAE 0173 0113 0098lowast 0179 0125RMSE 0218 0175 0144 0268 0205RAE 1000 0654 0565lowast 1039 0725RRSE 0999 0802 0657lowast 1226 0937

Regression

CC 0038 0618 0754 0240 0522MAE 0171 0112 0098 0181 0133RMSE 0217 0174 0143lowast 0270 0214RAE 1001 0653 0574 1055 0778RRSE 0999 0798 0659 1239 0983

lowast denotes the best performance among 5 imputation methods

Table 5 The results of variable selection

Factor1 2 3

Input 983 072 164Output 918 037 071Rainfall 717 064 503Temperature 092 965 minus149Pressure minus254 minus844 minus182Wind Speed 096 minus377 052Direction 041 351 minus067Rainfall Dasi 290 068 693Relative Humidity 025 minus196 413Note Each factorrsquos highest factor loading appears in bold

model is achieved when the Wind Speed and Direc-tion are deleted The key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

As a rule of thumb we recommend interpreting onlyfactor loadingswith an absolute value greater than 05which explains around 25 of the variance [37] Wecan see that the loadings of the two deleted variablesare smaller than 05 as seen in Table 5 Finally thisstudy employs the Random Forest forecast methodbased on variable selection and full variables to builda forecast model for water level forecasting in ShimenReservoir respectively

(3) Model comparison this study compares the proposedforecast model (using Random Forest) with the Ran-dom Forest RBF Network Kstar IBK (KNN) andRandomTree forecast models (Tables 6 and 7) Tables6 and 7 show that before variable selection (full vari-ables) the Random Forest forecast model has the bestforecast performance in CC RMSE MAE RAE andRRSE indices After variable selection all five forecastmodels have improved forecast performance underCC RMSE MAE RAE and RRSE Therefore theresults show that variable selection could improve the

8 Computational Intelligence and Neuroscience

Table 6 The results of compare forecasting models under percentage spilt (dataset partition into 66 training data and 34 testing data)after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0033 0638a 0729a 0251 0545a

MAE 0182a 0121a 0111a 0199 0135a

RMSE 0229 0176a 0156a 0287 0212a

RAE 0992a 0657a 0602a 1083 0736a

RRSE 1007 0775a 0688a 1262 0935a

Series mean

CC 0107a 0661a 0739a 0242a 0551MAE 0172a 0107a 0101a 0179a 0129RMSE 0221a 0167a 0151a 0268a 0205RAE 0988a 0615a 0579a 1027a 0740RRSE 0995a 0753a 0678a 1208a 0923

Linear

CC 0105a 0666a 0735a 0258a 0596a

MAE 0173a 0106a 0100a 0175a 0120a

RMSE 0221a 0166a 0151a 0266a 0196a

RAE 0987a 0606a 0572a 1002a 0683a

RRSE 0995a 0748a 0681a 1198a 0883a

Median of nearby points

CC 0106a 0666a 0740a 0264a 0553MAE 0173a 0107a 0100a 0177a 0127RMSE 0221a 0166a 0151a 0266a 0207RAE 0987a 0611a 0571a 1013a 0723RRSE 0995a 0747a 0677a 1195a 0932

Mean of nearby points

CC 01059a 0667a 0745ab 0249a 0540a

MAE 0173a 0107a 0099ab 0179a 0129a

RMSE 0221a 0166a 0149ab 0268a 0214a

RAE 0987a 0611a 0565ab 1025a 0735a

RRSE 0995a 0747a 0672ab 1207a 0962a

Regression

CC 0107a 0663a 0739a 0242a 0559a

MAE 0172a 0106a 0101a 0179a 0126a

RMSE 0221a 0167a 0151a 0268a 0200a

RAE 0987a 0610a 0581a 1027a 0723a

RRSE 0994a 0752a 0678a 1207a 0900a

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

forecasting performance The Random Forest modelas applied to variable selection with full variables isbetter than the listing model

43 Findings After variable selection and model buildingsome key findings can be highlighted

(1) Imputation after the two collected datasets wereconcatenated into an integrated dataset there aremissing values in the integrated dataset due to humanerror or mechanical failure Tables 3 and 4 show theintegrated dataset that uses the median of nearbypoints series mean mean of nearby points linearregression imputation and the delete strategy to eval-uate their accuracy via five machine learning forecastmodels The results show that the integrated datasetthat uses the mean of the nearby pointsrsquo imputationmethod has better forecasting performance

(2) Variable selection this study uses factor analysis torank the ordering of variables and then sequen-tially deletes the least important variables until theforecasting performance no longer improves Tables6 and 7 show that the variable selection couldsignificantly improve the performance of all fiveforecasting models After iterative experiments andvariable selection the key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

(3) Forecasting model this study proposed a time-seriesforecastingmodel based on estimatingmissing valuesand variable selection to forecast the water level inthe reservoirThese experimental results indicate thatthe Random Forest forecasting model when appliedto variable selection with full variables has betterforecasting performance than the listing models in

Computational Intelligence and Neuroscience 9

Table 7 The results of compare forecasting models under 10-folds cross-validation after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0103a 0665a 0737a 0233 0529a

MAE 0181a 0115a 0108a 0193a 0143a

RMSE 0226a 0171a 0154a 0282a 0223a

RAE 0984a 0627a 0589a 1047a 0774a

RRSE 0994a 0749a 0677a 1238 0977a

Series mean

CC 0081a 0688a 0751a 0295a 0547MAE 0169a 0103a 0098a 0170a 0131RMSE 0217a 0158a 0144 0260a 0209RAE 0988a 0600a 0571a 0990a 0767RRSE 0996a 0727a 0661 1193a 0960

Linear

CC 0081a 0692a 0750 0286a 0551a

MAE 0171a 0102a 0098a 0169a 0128RMSE 0218a 0158a 0145 0261a 0207a

RAE 0988a 0590a 0566a 0981a 0740RRSE 0996a 0723a 0662 1196a 0948a

Median of nearby points

CC 0083a 0692a 0752a 0305a 0555a

MAE 0171a 0102a 0097ab 0169a 0126a

RMSE 0218a 0158a 0144a 0259a 0208a

RAE 0987a 0593a 0563ab 0980a 0732a

RRSE 0996a 0722a 0660a 1186a 0951a

Mean of nearby points

CC 0082a 0694a 0753b 0276a 0537MAE 0171a 0102a 0097ab 0171a 0129RMSE 0218a 0157a 0144ab 0263a 0210RAE 0988a 0593a 0564a 0993a 0747RRSE 0996a 0721a 0659b 1204a 0960

Regression

CC 0081a 0690a 0753b 0295a 0572MAE 0169a 0102a 0098a 0169a 0126RMSE 0217a 0158a 0144ab 0259a 0204RAE 0988a 0595a 0571a 0989a 0735RRSE 0996a 0725a 0659ab 1193a 0938

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

the five evaluation indices The proposed time-seriesforecasting model is feasible for forecasting waterlevels in Shimen Reservoir

5 Conclusion

This study proposed a time-series forecastingmodel for waterlevel forecasting in Taiwanrsquos Shimen Reservoir The experi-ments showed that the mean of nearby pointsrsquo imputationmethod has the best performanceThese experimental resultsindicate that the Random Forest forecasting model whenapplied to variable selection with full variables has betterforecasting performance than the listing model The keyvariables are Reservoir IN Temperature Reservoir OUTPressure Rainfall Rainfall Dasi and Relative Humidity Theproposed time-series forecasting model withwithout vari-able selection has better forecasting performance than thelisting models using the five evaluation indices This showsthat the proposed time-series forecastingmodel is feasible for

forecastingwater levels in ShimenReservoir Futureworkwilladdress the following issues

(1) The reservoirrsquos utility includes irrigation domesticwater supply and electricity generation The keyvariables identified here could improve forecasting inthese fields

(2) We might apply the proposed time-series forecastingmodel based on imputation and variable selection toforecast the water level of lakes salt water bodiesreservoirs and so on

In addition this experiment shows that the proposed variableselection can help determine five forecast methods used hereto improve the forecasting capability

Conflicts of Interest

The authors declare that they have no conflicts of interest

10 Computational Intelligence and Neuroscience

References

[1] F-J Chang and Y-T Chang ldquoAdaptive neuro-fuzzy inferencesystem for prediction of water level in reservoirrdquo Advances inWater Resources vol 29 no 1 pp 1ndash10 2006

[2] M K Tiwari and C Chatterjee ldquoDevelopment of an accurateand reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approachrdquo Journal ofHydrology vol 394 no 3-4 pp 458ndash470 2010

[3] C-S Chen Y-D Jhong T-Y Wu and S-T Chen ldquoTyphoonevent-based evolutionary fuzzy inference model for flood stageforecastingrdquo Journal of Hydrology vol 490 pp 134ndash143 2013

[4] F-J Chang P-A Chen Y-R Lu E Huang and K-Y ChangldquoReal-time multi-step-ahead water level forecasting by recur-rent neural networks for urban flood controlrdquo Journal ofHydrology vol 517 pp 836ndash846 2014



Table 3: The results of the listing models with the five imputation methods under percentage split (dataset partitioned into 66% training data and 34% testing data) before variable selection.

Index  RBF Network  Kstar    Random Forest  IBK      Random Tree

Delete the rows with missing data
CC     0.085        0.546    0.728          0.288    0.534
MAE    0.183        0.133    0.111          0.186    0.145
RMSE   0.227        0.200    0.157          0.271    0.226
RAE    0.998        0.727    0.607          1.015    0.789
RRSE   0.997        0.879    0.690          1.193    0.995

Series mean
CC     0.052        0.557    0.739          0.198    0.563
MAE    0.174        0.123    0.102          0.191    0.126
RMSE   0.222        0.189    0.151*         0.283    0.202
RAE    1.001        0.705    0.587          1.098    0.722
RRSE   0.999        0.850    0.679          1.276    0.908

Linear
CC     0.054        0.565    0.734          0.200    0.512
MAE    0.175        0.121    0.101          0.189    0.138
RMSE   0.222        0.188    0.152          0.281    0.218
RAE    1.000        0.690    0.575*         1.082    0.787
RRSE   0.999        0.844    0.684          1.264    0.980

Median of nearby points
CC     0.054        0.571    0.737          0.227    0.559
MAE    0.175        0.120    0.101          0.188    0.126
RMSE   0.222        0.186    0.152          0.277    0.202
RAE    1.000        0.689    0.577          1.074    0.719
RRSE   0.999        0.838    0.681          1.244    0.907

Mean of nearby points
CC     0.053        0.572    0.740*         0.232    0.512
MAE    0.175        0.121    0.101*         0.186    0.132
RMSE   0.222        0.186    0.151*         0.275    0.217
RAE    1.000        0.690    0.575*         1.062    0.756
RRSE   0.999        0.837    0.678*         1.235    0.975

Regression
CC     0.052        0.564    0.739          0.200    0.509
MAE    0.174        0.121    0.102          0.191    0.133
RMSE   0.222        0.188    0.151          0.283    0.216
RAE    1.001        0.695    0.586          1.096    0.762
RRSE   0.999        0.845    0.680          1.275    0.974

* denotes the best performance among the five imputation methods.

After processing the missing values and normalizing the six resulting datasets, we used two approaches to evaluate them: a percentage split (dataset partitioned into 66% training data and 34% testing data) and 10-fold cross-validation. Both approaches employ Random Forest, RBF Network, Kstar, KNN, and Random Tree to forecast water levels, evaluating the six missing-value treatments under five evaluation indices: CC, RMSE, MAE, RAE, and RRSE. The results are shown in Tables 3 and 4: the mean of nearby points method wins against the other methods in CC, MAE, RAE, RRSE, and RMSE. Therefore, the mean of nearby points imputation method best estimates the Shimen Reservoir water level.
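To make the winning treatment concrete, the following is a minimal Python sketch of the mean of nearby points rule, assuming a span of two valid observations on each side (the default span for this method in SPSS-style tools); the function name and the toy series are our own illustration, not the study's code.

```python
import numpy as np
import pandas as pd

def impute_mean_of_nearby_points(series: pd.Series, span: int = 2) -> pd.Series:
    """Fill each missing value with the mean of up to `span` valid
    observations on each side, computed from the original series only."""
    result = series.copy()
    for i in np.flatnonzero(series.isna().to_numpy()):
        before = series.iloc[max(0, i - span):i].dropna()
        after = series.iloc[i + 1:i + 1 + span].dropna()
        neighbors = before.tolist() + after.tolist()
        if neighbors:  # leave the value missing if no neighbor is valid
            result.iloc[i] = float(np.mean(neighbors))
    return result

# Toy example: a daily water level series (meters) with two gaps.
level = pd.Series([244.9, 245.1, np.nan, 245.6, np.nan, 246.0])
print(impute_mean_of_nearby_points(level))
```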

(2) Variable selection and model building use the results from above. The best imputation method is the mean of nearby points; therefore, this study uses the dataset imputed with the mean of nearby points to select the variables and build the model. For variable selection, this study utilizes factor analysis to rank the importance of the independent variables in the imputed dataset. Table 5 indicates that the ordering of important variables is Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, Relative Humidity, Wind Speed, and Direction.

Next, we determine the key variables and build a forecast model. This study follows the ordering of important variables and iteratively deletes the least important variable, rebuilding the proposed forecasting model until the minimal RMSE is reached. First, all independent variables are used to build the water level forecasting model. Second, the least important variables are removed one by one, and the remaining variables serve as the forecast model, until the minimal RMSE is reached. After these iterative experiments, the optimal forecast model is achieved when Wind Speed and Direction are deleted.


Table 4: The results of the listing models with the five imputation methods under 10-fold cross-validation before variable selection.

Index  RBF Network  Kstar    Random Forest  IBK      Random Tree

Delete the rows with missing data
CC     0.041        0.590    0.737          0.246    0.505
MAE    0.184        0.126    0.109          0.195    0.143
RMSE   0.227        0.188    0.154          0.281    0.225
RAE    1.000        0.682    0.592          1.059    0.775
RRSE   0.999        0.825    0.678          1.235    0.986

Series mean
CC     0.038        0.612    0.755          0.237    0.574
MAE    0.171        0.113    0.098          0.181    0.125
RMSE   0.217        0.175    0.143*         0.270    0.202
RAE    1.001        0.660    0.575          1.058    0.731
RRSE   0.999        0.802    0.658          1.241    0.929

Linear
CC     0.042        0.615    0.753          0.243    0.551
MAE    0.173        0.112    0.098          0.181    0.127
RMSE   0.218        0.175    0.144          0.269    0.207
RAE    1.000        0.649    0.568          1.057    0.736
RRSE   0.999        0.800    0.660          1.233    0.948

Median of nearby points
CC     0.043        0.614    0.752          0.251    0.535
MAE    0.173        0.113    0.098          0.180    0.131
RMSE   0.218        0.175    0.144          0.268    0.211
RAE    1.000        0.653    0.568          1.041    0.757
RRSE   0.999        0.801    0.661          1.227    0.967

Mean of nearby points
CC     0.043        0.613    0.756*         0.250    0.558
MAE    0.173        0.113    0.098*         0.179    0.125
RMSE   0.218        0.175    0.144          0.268    0.205
RAE    1.000        0.654    0.565*         1.039    0.725
RRSE   0.999        0.802    0.657*         1.226    0.937

Regression
CC     0.038        0.618    0.754          0.240    0.522
MAE    0.171        0.112    0.098          0.181    0.133
RMSE   0.217        0.174    0.143*         0.270    0.214
RAE    1.001        0.653    0.574          1.055    0.778
RRSE   0.999        0.798    0.659          1.239    0.983

* denotes the best performance among the five imputation methods.

Table 5: The results of variable selection (factor loadings).

Variable           Factor 1   Factor 2   Factor 3
Input               0.983      0.072      0.164
Output              0.918      0.037      0.071
Rainfall            0.717      0.064      0.503
Temperature         0.092      0.965     -0.149
Pressure           -0.254     -0.844     -0.182
Wind Speed          0.096     -0.377      0.052
Direction           0.041      0.351     -0.067
Rainfall Dasi       0.290      0.068      0.693
Relative Humidity   0.025     -0.196      0.413

Note: each variable's highest factor loading appears in bold in the original typesetting.

The key remaining variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity. As a rule of thumb, we recommend interpreting only factor loadings with an absolute value greater than 0.5, which explains around 25% of the variance [37]. As seen in Table 5, the loadings of the two deleted variables are smaller than 0.5. Finally, this study employs the Random Forest forecast method, based on the selected variables and on the full variables, respectively, to build forecast models for water level forecasting in Shimen Reservoir.

(3) Model comparison: this study compares the proposed forecast model (using Random Forest) with the Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree forecast models (Tables 6 and 7). Tables 6 and 7 show that, before variable selection (full variables), the Random Forest forecast model has the best forecast performance in the CC, RMSE, MAE, RAE, and RRSE indices. After variable selection, all five forecast models have improved forecast performance under CC, RMSE, MAE, RAE, and RRSE. Therefore, the results show that variable selection could improve the forecasting performance.


Table 6: The results of comparing the forecasting models under percentage split (dataset partitioned into 66% training data and 34% testing data) after variable selection.

Index  RBF Network  Kstar    Random Forest  IBK      Random Tree

Delete the rows with missing data
CC     0.033        0.638^a  0.729^a        0.251    0.545^a
MAE    0.182^a      0.121^a  0.111^a        0.199    0.135^a
RMSE   0.229        0.176^a  0.156^a        0.287    0.212^a
RAE    0.992^a      0.657^a  0.602^a        1.083    0.736^a
RRSE   1.007        0.775^a  0.688^a        1.262    0.935^a

Series mean
CC     0.107^a      0.661^a  0.739^a        0.242^a  0.551
MAE    0.172^a      0.107^a  0.101^a        0.179^a  0.129
RMSE   0.221^a      0.167^a  0.151^a        0.268^a  0.205
RAE    0.988^a      0.615^a  0.579^a        1.027^a  0.740
RRSE   0.995^a      0.753^a  0.678^a        1.208^a  0.923

Linear
CC     0.105^a      0.666^a  0.735^a        0.258^a  0.596^a
MAE    0.173^a      0.106^a  0.100^a        0.175^a  0.120^a
RMSE   0.221^a      0.166^a  0.151^a        0.266^a  0.196^a
RAE    0.987^a      0.606^a  0.572^a        1.002^a  0.683^a
RRSE   0.995^a      0.748^a  0.681^a        1.198^a  0.883^a

Median of nearby points
CC     0.106^a      0.666^a  0.740^a        0.264^a  0.553
MAE    0.173^a      0.107^a  0.100^a        0.177^a  0.127
RMSE   0.221^a      0.166^a  0.151^a        0.266^a  0.207
RAE    0.987^a      0.611^a  0.571^a        1.013^a  0.723
RRSE   0.995^a      0.747^a  0.677^a        1.195^a  0.932

Mean of nearby points
CC     0.1059^a     0.667^a  0.745^a,b      0.249^a  0.540^a
MAE    0.173^a      0.107^a  0.099^a,b      0.179^a  0.129^a
RMSE   0.221^a      0.166^a  0.149^a,b      0.268^a  0.214^a
RAE    0.987^a      0.611^a  0.565^a,b      1.025^a  0.735^a
RRSE   0.995^a      0.747^a  0.672^a,b      1.207^a  0.962^a

Regression
CC     0.107^a      0.663^a  0.739^a        0.242^a  0.559^a
MAE    0.172^a      0.106^a  0.101^a        0.179^a  0.126^a
RMSE   0.221^a      0.167^a  0.151^a        0.268^a  0.200^a
RAE    0.987^a      0.610^a  0.581^a        1.027^a  0.723^a
RRSE   0.994^a      0.752^a  0.678^a        1.207^a  0.900^a

^a denotes improved performance after variable selection; ^b denotes the best performance among the five models after variable selection.

The Random Forest model, whether using the selected variables or the full variables, outperforms the listing models.
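For reproducibility, the five indices reported in Tables 3-7 can be computed with the standard Weka-style definitions, in which RAE and RRSE normalize the forecast errors by those of a naive predictor that always outputs the mean of the actual values. A minimal sketch:

```python
import numpy as np

def evaluation_indices(y_true, y_pred) -> dict:
    """CC, MAE, RMSE, RAE, and RRSE for one set of forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true                      # forecast errors
    dev = y_true - y_true.mean()               # errors of the mean predictor
    return {
        "CC":   float(np.corrcoef(y_true, y_pred)[0, 1]),
        "MAE":  float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "RAE":  float(np.sum(np.abs(err)) / np.sum(np.abs(dev))),
        "RRSE": float(np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2))),
    }
```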

4.3. Findings. After variable selection and model building, some key findings can be highlighted:

(1) Imputation: after the two collected datasets were concatenated into an integrated dataset, missing values remained in the integrated dataset due to human error or mechanical failure. Tables 3 and 4 evaluate the integrated dataset processed with the median of nearby points, series mean, mean of nearby points, linear, and regression imputation methods and with the delete strategy, measuring their accuracy via the five machine learning forecast models. The results show that the dataset imputed with the mean of nearby points has the best forecasting performance.

(2) Variable selection: this study uses factor analysis to rank the variables and then sequentially deletes the least important ones until the forecasting performance no longer improves. Tables 6 and 7 show that variable selection could significantly improve the performance of all five forecasting models. After the iterative experiments, the key remaining variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity.

(3) Forecasting model: this study proposed a time-series forecasting model based on estimating missing values and variable selection to forecast the water level in the reservoir. The experimental results indicate that the Random Forest forecasting model, applied both with the selected variables and with the full variables, has better forecasting performance than the listing models in the five evaluation indices. The proposed time-series forecasting model is therefore feasible for forecasting water levels in Shimen Reservoir; a minimal end-to-end sketch is given after Table 7.


Table 7: The results of comparing the forecasting models under 10-fold cross-validation after variable selection.

Index  RBF Network  Kstar    Random Forest  IBK      Random Tree

Delete the rows with missing data
CC     0.103^a      0.665^a  0.737^a        0.233    0.529^a
MAE    0.181^a      0.115^a  0.108^a        0.193^a  0.143^a
RMSE   0.226^a      0.171^a  0.154^a        0.282^a  0.223^a
RAE    0.984^a      0.627^a  0.589^a        1.047^a  0.774^a
RRSE   0.994^a      0.749^a  0.677^a        1.238    0.977^a

Series mean
CC     0.081^a      0.688^a  0.751^a        0.295^a  0.547
MAE    0.169^a      0.103^a  0.098^a        0.170^a  0.131
RMSE   0.217^a      0.158^a  0.144          0.260^a  0.209
RAE    0.988^a      0.600^a  0.571^a        0.990^a  0.767
RRSE   0.996^a      0.727^a  0.661          1.193^a  0.960

Linear
CC     0.081^a      0.692^a  0.750          0.286^a  0.551^a
MAE    0.171^a      0.102^a  0.098^a        0.169^a  0.128
RMSE   0.218^a      0.158^a  0.145          0.261^a  0.207^a
RAE    0.988^a      0.590^a  0.566^a        0.981^a  0.740
RRSE   0.996^a      0.723^a  0.662          1.196^a  0.948^a

Median of nearby points
CC     0.083^a      0.692^a  0.752^a        0.305^a  0.555^a
MAE    0.171^a      0.102^a  0.097^a,b      0.169^a  0.126^a
RMSE   0.218^a      0.158^a  0.144^a        0.259^a  0.208^a
RAE    0.987^a      0.593^a  0.563^a,b      0.980^a  0.732^a
RRSE   0.996^a      0.722^a  0.660^a        1.186^a  0.951^a

Mean of nearby points
CC     0.082^a      0.694^a  0.753^b        0.276^a  0.537
MAE    0.171^a      0.102^a  0.097^a,b      0.171^a  0.129
RMSE   0.218^a      0.157^a  0.144^a,b      0.263^a  0.210
RAE    0.988^a      0.593^a  0.564^a        0.993^a  0.747
RRSE   0.996^a      0.721^a  0.659^b        1.204^a  0.960

Regression
CC     0.081^a      0.690^a  0.753^b        0.295^a  0.572
MAE    0.169^a      0.102^a  0.098^a        0.169^a  0.126
RMSE   0.217^a      0.158^a  0.144^a,b      0.259^a  0.204
RAE    0.988^a      0.595^a  0.571^a        0.989^a  0.735
RRSE   0.996^a      0.725^a  0.659^a,b      1.193^a  0.938

^a denotes improved performance after variable selection; ^b denotes the best performance among the five models after variable selection.

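As a closing illustration, the final stage of the proposed pipeline might be sketched as below, reusing evaluation_indices from the earlier sketch. The column names, including the "Water Level" target, are our assumed labels for the integrated dataset, and the unset Random Forest hyperparameters are library defaults rather than the study's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, train_test_split

KEY_VARIABLES = ["Reservoir IN", "Temperature", "Reservoir OUT", "Pressure",
                 "Rainfall", "Rainfall Dasi", "Relative Humidity"]

def evaluate_random_forest(df: pd.DataFrame, target: str = "Water Level"):
    """Score the Random Forest forecaster the two ways used in the paper:
    a 66%/34% percentage split and 10-fold cross-validation."""
    X = df[KEY_VARIABLES].to_numpy()
    y = df[target].to_numpy()

    # (a) percentage split: first 66% for training, the rest for testing
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.66, shuffle=False)
    rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    split_scores = evaluation_indices(y_te, rf.predict(X_te))

    # (b) 10-fold cross-validation, averaging each index over the folds
    folds = []
    for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        model = RandomForestRegressor(random_state=0).fit(X[tr], y[tr])
        folds.append(evaluation_indices(y[te], model.predict(X[te])))
    cv_scores = {k: float(np.mean([f[k] for f in folds])) for k in folds[0]}
    return split_scores, cv_scores
```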

5. Conclusion

This study proposed a time-series forecasting model for water level forecasting in Taiwan's Shimen Reservoir. The experiments showed that the mean of nearby points imputation method has the best performance, and the results indicate that the Random Forest forecasting model, with the selected variables as well as with the full variables, forecasts better than the listing models. The key variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity. The proposed time-series forecasting model, with or without variable selection, outperforms the listing models on the five evaluation indices, which shows that it is feasible for forecasting water levels in Shimen Reservoir. Future work will address the following issues:

(1) The reservoir's utility includes irrigation, domestic water supply, and electricity generation; the key variables identified here could improve forecasting in these fields.

(2) The proposed time-series forecasting model, based on imputation and variable selection, could be applied to forecast the water levels of lakes, salt water bodies, other reservoirs, and so on.

In addition, the experiments show that the proposed variable selection helps all five forecast methods used here improve their forecasting capability.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


References

[1] F.-J. Chang and Y.-T. Chang, "Adaptive neuro-fuzzy inference system for prediction of water level in reservoir," Advances in Water Resources, vol. 29, no. 1, pp. 1–10, 2006.

[2] M. K. Tiwari and C. Chatterjee, "Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach," Journal of Hydrology, vol. 394, no. 3-4, pp. 458–470, 2010.

[3] C.-S. Chen, Y.-D. Jhong, T.-Y. Wu, and S.-T. Chen, "Typhoon event-based evolutionary fuzzy inference model for flood stage forecasting," Journal of Hydrology, vol. 490, pp. 134–143, 2013.

[4] F.-J. Chang, P.-A. Chen, Y.-R. Lu, E. Huang, and K.-Y. Chang, "Real-time multi-step-ahead water level forecasting by recurrent neural networks for urban flood control," Journal of Hydrology, vol. 517, pp. 836–846, 2014.

[5] A. Hipni, A. El-shafie, A. Najah, O. A. Karim, A. Hussain, and M. Mukhlisin, "Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS)," Water Resources Management, vol. 27, no. 10, pp. 3803–3823, 2013.

[6] Y. Seo, S. Kim, O. Kisi, and V. P. Singh, "Daily water level forecasting using wavelet decomposition and artificial intelligence techniques," Journal of Hydrology, vol. 520, pp. 224–243, 2015.

[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, no. 3, pp. 321–355, 1988.

[8] K. Ganapathy, V. Vaidehi, and J. B. Chandrasekar, "Optimum steepest descent higher level learning radial basis function network," Expert Systems with Applications, vol. 42, no. 21, pp. 8064–8077, 2015.

[9] P. Brandstetter and M. Kuchar, "Sensorless control of variable speed induction motor drive using RBF neural network," Journal of Applied Logic, vol. 24, part A, pp. 97–108, 2017.

[10] C. M. Anish and B. Majhi, "Hybrid nonlinear adaptive scheme for stock market prediction using feedback FLANN and factor analysis," Journal of the Korean Statistical Society, vol. 45, no. 1, pp. 64–76, 2016.

[11] M. Pottmann and H. P. Jorgl, "Radial basis function networks for internal model control," Applied Mathematics and Computation, vol. 70, no. 2-3, pp. 283–298, 1995.

[12] I. Morlini, "Radial basis function networks with partially classified data," Ecological Modelling, vol. 120, no. 2-3, pp. 109–118, 1999.

[13] A. Nawahda, "An assessment of adding value of traffic information and other attributes as part of its classifiers in a data mining tool set for predicting surface ozone levels," Process Safety and Environmental Protection, vol. 99, pp. 149–158, 2016.

[14] N. R. Kasat and S. D. Thepade, "Novel content based image classification method using LBG vector quantization method with Bayes and lazy family data mining classifiers," in Proceedings of the 7th International Conference on Communication, Computing and Virtualization (ICCCV 2016), pp. 483–489, February 2016.

[15] E. Męzyk and O. Unold, "Machine learning approach to model sport training," Computers in Human Behavior, vol. 27, no. 5, pp. 1499–1506, 2011.

[16] J. Abawajy, A. Kelarev, M. Chowdhury, A. Stranieri, and H. F. Jelinek, "Predicting cardiac autonomic neuropathy category for diabetic data with missing values," Computers in Biology and Medicine, vol. 43, no. 10, pp. 1328–1333, 2013.

[17] M. A. Ayu, S. A. Ismail, A. F. A. Matin, and T. Mantoro, "A comparison study of classifier algorithms for mobile-phone's accelerometer based activity recognition," Procedia Engineering, vol. 41, pp. 224–229, 2012.

[18] A. Tamvakis, V. Trygonis, J. Miritzis, G. Tsirtsis, and S. Spatharis, "Optimizing biodiversity prediction from abiotic parameters," Environmental Modelling & Software, vol. 53, pp. 112–120, 2014.

[19] C. Brokamp, R. Jandarov, M. B. Rao, G. LeMasters, and P. Ryan, "Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches," Atmospheric Environment, vol. 151, pp. 1–11, 2017.

[20] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[21] P. Yıldırım, "Pattern classification with imbalanced and multiclass data for the prediction of albendazole adverse event outcomes," in Proceedings of the 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016) and the 6th International Conference on Sustainable Energy Information Technology (SEIT 2016), pp. 1013–1018, May 2016.

[22] Y. Zhao and Y. Zhang, "Comparison of decision tree methods for finding active objects," Advances in Space Research, vol. 41, no. 12, pp. 1955–1959, 2008.

[23] K. M. A. Kumar, N. Rajasimha, M. Reddy, A. Rajanarayana, and K. Nadgir, "Analysis of users' sentiments from Kannada web documents," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 247–256, August 2015.

[24] C. Zhong, W. Pedrycz, D. Wang, L. Li, and Z. Li, "Granular data imputation: a framework of granular computing," Applied Soft Computing, vol. 46, pp. 307–316, 2016.

[25] C. Genolini, A. Lacombe, R. Ecochard, and F. Subtil, "CopyMean: a new method to predict monotone missing values in longitudinal studies," Computer Methods and Programs in Biomedicine, vol. 132, pp. 29–44, 2016.

[26] M. Majidpour, C. Qiu, P. Chu, H. R. Pota, and R. Gadh, "Forecasting the EV charging load based on customer profile or station measurement," Applied Energy, vol. 163, pp. 134–141, 2016.

[27] O. Cokluk and M. Kayri, "The effects of methods of imputation for missing values on the validity and reliability of scales," Educational Sciences: Theory and Practice, vol. 11, pp. 303–309, 2011.

[28] S. A. Bashir and S. W. Duffy, "The correction of risk estimates for measurement error," Annals of Epidemiology, vol. 7, no. 2, pp. 154–164, 1997.

[29] A. Ozturk, S. Kayalıgil, and N. E. Ozdemirel, "Manufacturing lead time estimation using data mining," European Journal of Operational Research, vol. 173, no. 2, pp. 683–700, 2006.

[30] H. Vathsala and S. G. Koolagudi, "Closed item-set mining for prediction of Indian summer monsoon rainfall: a data mining model with land and ocean variables as predictors," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 271–280, August 2015.

[31] S. Banerjee, A. Anura, J. Chakrabarty, S. Sengupta, and J. Chatterjee, "Identification and functional assessment of novel gene sets towards better understanding of dysplasia associated oral carcinogenesis," Gene Reports, vol. 4, pp. 131–138, 2016.

[32] H. Yahyaoui and A. Al-Mutairi, "A feature-based trust sequence classification algorithm," Information Sciences, vol. 328, pp. 455–484, 2016.

[33] H. B. Borges and J. C. Nievola, "Comparing the dimensionality reduction methods in gene expression databases," Expert Systems with Applications, vol. 39, no. 12, pp. 10780–10795, 2012.

[34] A. Wang, N. An, G. Chen, L. Li, and G. Alterovitz, "Accelerating wrapper-based feature selection with K-nearest-neighbor," Knowledge-Based Systems, vol. 83, no. 1, pp. 81–91, 2015.

[35] S.-W. Lin, Z.-J. Lee, S.-C. Chen, and T.-Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, no. 4, pp. 1505–1512, 2008.

[36] O. Uncu and I. B. Turksen, "A novel feature selection approach: combining feature wrappers and filters," Information Sciences, vol. 177, no. 2, pp. 449–466, 2007.

[37] R. E. Anderson, R. L. Tatham, and W. C. Black, Multivariate Data Analysis, 1998.

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: A Time-Series Water Level Forecasting Model Based on ...downloads.hindawi.com/journals/cin/2017/8734214.pdf · Random Forest. A Random Forest can be applied for classification,regression,andunsupervisedlearning[19].Itis

Computational Intelligence and Neuroscience 7

Table 4 The results of listing models with the five imputation methods under 10-folds cross-validation before variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

Before variable selection

Delete the rows with missing data

CC 0041 0590 0737 0246 0505MAE 0184 0126 0109 0195 0143RMSE 0227 0188 0154 0281 0225RAE 1000 0682 0592 1059 0775RRSE 0999 0825 0678 1235 0986

Serial mean

CC 0038 0612 0755 0237 0574MAE 0171 0113 0098 0181 0125RMSE 0217 0175 0143lowast 0270 0202RAE 1001 0660 0575 1058 0731RRSE 0999 0802 0658 1241 0929

Linear

CC 0042 0615 0753 0243 0551MAE 0173 0112 0098 0181 0127RMSE 0218 0175 0144 0269 0207RAE 1000 0649 0568 1057 0736RRSE 0999 0800 0660 1233 0948

Near median

CC 0043 0614 0752 0251 0535MAE 0173 0113 0098 0180 0131RMSE 0218 0175 0144 0268 0211RAE 1000 0653 0568 1041 0757RRSE 0999 0801 0661 1227 0967

Near mean

CC 0043 0613 0756lowast 0250 0558MAE 0173 0113 0098lowast 0179 0125RMSE 0218 0175 0144 0268 0205RAE 1000 0654 0565lowast 1039 0725RRSE 0999 0802 0657lowast 1226 0937

Regression

CC 0038 0618 0754 0240 0522MAE 0171 0112 0098 0181 0133RMSE 0217 0174 0143lowast 0270 0214RAE 1001 0653 0574 1055 0778RRSE 0999 0798 0659 1239 0983

lowast denotes the best performance among 5 imputation methods

Table 5 The results of variable selection

Factor1 2 3

Input 983 072 164Output 918 037 071Rainfall 717 064 503Temperature 092 965 minus149Pressure minus254 minus844 minus182Wind Speed 096 minus377 052Direction 041 351 minus067Rainfall Dasi 290 068 693Relative Humidity 025 minus196 413Note Each factorrsquos highest factor loading appears in bold

model is achieved when the Wind Speed and Direc-tion are deleted The key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

As a rule of thumb we recommend interpreting onlyfactor loadingswith an absolute value greater than 05which explains around 25 of the variance [37] Wecan see that the loadings of the two deleted variablesare smaller than 05 as seen in Table 5 Finally thisstudy employs the Random Forest forecast methodbased on variable selection and full variables to builda forecast model for water level forecasting in ShimenReservoir respectively

(3) Model comparison this study compares the proposedforecast model (using Random Forest) with the Ran-dom Forest RBF Network Kstar IBK (KNN) andRandomTree forecast models (Tables 6 and 7) Tables6 and 7 show that before variable selection (full vari-ables) the Random Forest forecast model has the bestforecast performance in CC RMSE MAE RAE andRRSE indices After variable selection all five forecastmodels have improved forecast performance underCC RMSE MAE RAE and RRSE Therefore theresults show that variable selection could improve the

8 Computational Intelligence and Neuroscience

Table 6 The results of compare forecasting models under percentage spilt (dataset partition into 66 training data and 34 testing data)after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0033 0638a 0729a 0251 0545a

MAE 0182a 0121a 0111a 0199 0135a

RMSE 0229 0176a 0156a 0287 0212a

RAE 0992a 0657a 0602a 1083 0736a

RRSE 1007 0775a 0688a 1262 0935a

Series mean

CC 0107a 0661a 0739a 0242a 0551MAE 0172a 0107a 0101a 0179a 0129RMSE 0221a 0167a 0151a 0268a 0205RAE 0988a 0615a 0579a 1027a 0740RRSE 0995a 0753a 0678a 1208a 0923

Linear

CC 0105a 0666a 0735a 0258a 0596a

MAE 0173a 0106a 0100a 0175a 0120a

RMSE 0221a 0166a 0151a 0266a 0196a

RAE 0987a 0606a 0572a 1002a 0683a

RRSE 0995a 0748a 0681a 1198a 0883a

Median of nearby points

CC 0106a 0666a 0740a 0264a 0553MAE 0173a 0107a 0100a 0177a 0127RMSE 0221a 0166a 0151a 0266a 0207RAE 0987a 0611a 0571a 1013a 0723RRSE 0995a 0747a 0677a 1195a 0932

Mean of nearby points

CC 01059a 0667a 0745ab 0249a 0540a

MAE 0173a 0107a 0099ab 0179a 0129a

RMSE 0221a 0166a 0149ab 0268a 0214a

RAE 0987a 0611a 0565ab 1025a 0735a

RRSE 0995a 0747a 0672ab 1207a 0962a

Regression

CC 0107a 0663a 0739a 0242a 0559a

MAE 0172a 0106a 0101a 0179a 0126a

RMSE 0221a 0167a 0151a 0268a 0200a

RAE 0987a 0610a 0581a 1027a 0723a

RRSE 0994a 0752a 0678a 1207a 0900a

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

forecasting performance The Random Forest modelas applied to variable selection with full variables isbetter than the listing model

43 Findings After variable selection and model buildingsome key findings can be highlighted

(1) Imputation after the two collected datasets wereconcatenated into an integrated dataset there aremissing values in the integrated dataset due to humanerror or mechanical failure Tables 3 and 4 show theintegrated dataset that uses the median of nearbypoints series mean mean of nearby points linearregression imputation and the delete strategy to eval-uate their accuracy via five machine learning forecastmodels The results show that the integrated datasetthat uses the mean of the nearby pointsrsquo imputationmethod has better forecasting performance

(2) Variable selection this study uses factor analysis torank the ordering of variables and then sequen-tially deletes the least important variables until theforecasting performance no longer improves Tables6 and 7 show that the variable selection couldsignificantly improve the performance of all fiveforecasting models After iterative experiments andvariable selection the key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

(3) Forecasting model this study proposed a time-seriesforecastingmodel based on estimatingmissing valuesand variable selection to forecast the water level inthe reservoirThese experimental results indicate thatthe Random Forest forecasting model when appliedto variable selection with full variables has betterforecasting performance than the listing models in

Computational Intelligence and Neuroscience 9

Table 7 The results of compare forecasting models under 10-folds cross-validation after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0103a 0665a 0737a 0233 0529a

MAE 0181a 0115a 0108a 0193a 0143a

RMSE 0226a 0171a 0154a 0282a 0223a

RAE 0984a 0627a 0589a 1047a 0774a

RRSE 0994a 0749a 0677a 1238 0977a

Series mean

CC 0081a 0688a 0751a 0295a 0547MAE 0169a 0103a 0098a 0170a 0131RMSE 0217a 0158a 0144 0260a 0209RAE 0988a 0600a 0571a 0990a 0767RRSE 0996a 0727a 0661 1193a 0960

Linear

CC 0081a 0692a 0750 0286a 0551a

MAE 0171a 0102a 0098a 0169a 0128RMSE 0218a 0158a 0145 0261a 0207a

RAE 0988a 0590a 0566a 0981a 0740RRSE 0996a 0723a 0662 1196a 0948a

Median of nearby points

CC 0083a 0692a 0752a 0305a 0555a

MAE 0171a 0102a 0097ab 0169a 0126a

RMSE 0218a 0158a 0144a 0259a 0208a

RAE 0987a 0593a 0563ab 0980a 0732a

RRSE 0996a 0722a 0660a 1186a 0951a

Mean of nearby points

CC 0082a 0694a 0753b 0276a 0537MAE 0171a 0102a 0097ab 0171a 0129RMSE 0218a 0157a 0144ab 0263a 0210RAE 0988a 0593a 0564a 0993a 0747RRSE 0996a 0721a 0659b 1204a 0960

Regression

CC 0081a 0690a 0753b 0295a 0572MAE 0169a 0102a 0098a 0169a 0126RMSE 0217a 0158a 0144ab 0259a 0204RAE 0988a 0595a 0571a 0989a 0735RRSE 0996a 0725a 0659ab 1193a 0938

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

the five evaluation indices The proposed time-seriesforecasting model is feasible for forecasting waterlevels in Shimen Reservoir

5 Conclusion

This study proposed a time-series forecastingmodel for waterlevel forecasting in Taiwanrsquos Shimen Reservoir The experi-ments showed that the mean of nearby pointsrsquo imputationmethod has the best performanceThese experimental resultsindicate that the Random Forest forecasting model whenapplied to variable selection with full variables has betterforecasting performance than the listing model The keyvariables are Reservoir IN Temperature Reservoir OUTPressure Rainfall Rainfall Dasi and Relative Humidity Theproposed time-series forecasting model withwithout vari-able selection has better forecasting performance than thelisting models using the five evaluation indices This showsthat the proposed time-series forecastingmodel is feasible for

forecastingwater levels in ShimenReservoir Futureworkwilladdress the following issues

(1) The reservoirrsquos utility includes irrigation domesticwater supply and electricity generation The keyvariables identified here could improve forecasting inthese fields

(2) We might apply the proposed time-series forecastingmodel based on imputation and variable selection toforecast the water level of lakes salt water bodiesreservoirs and so on

In addition this experiment shows that the proposed variableselection can help determine five forecast methods used hereto improve the forecasting capability

Conflicts of Interest

The authors declare that they have no conflicts of interest

10 Computational Intelligence and Neuroscience

References

[1] F-J Chang and Y-T Chang ldquoAdaptive neuro-fuzzy inferencesystem for prediction of water level in reservoirrdquo Advances inWater Resources vol 29 no 1 pp 1ndash10 2006

[2] M K Tiwari and C Chatterjee ldquoDevelopment of an accurateand reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approachrdquo Journal ofHydrology vol 394 no 3-4 pp 458ndash470 2010

[3] C-S Chen Y-D Jhong T-Y Wu and S-T Chen ldquoTyphoonevent-based evolutionary fuzzy inference model for flood stageforecastingrdquo Journal of Hydrology vol 490 pp 134ndash143 2013

[4] F-J Chang P-A Chen Y-R Lu E Huang and K-Y ChangldquoReal-time multi-step-ahead water level forecasting by recur-rent neural networks for urban flood controlrdquo Journal ofHydrology vol 517 pp 836ndash846 2014

[5] A Hipni A El-shafie A Najah O A Karim A Hussain andM Mukhlisin ldquoDaily Forecasting of Dam Water Levels Com-paring a Support Vector Machine (SVM)Model With AdaptiveNeuro Fuzzy Inference System (ANFIS)rdquo Water ResourcesManagement vol 27 no 10 pp 3803ndash3823 2013

[6] Y Seo S Kim O Kisi and V P Singh ldquoDaily water level fore-casting using wavelet decomposition and artificial intelligencetechniquesrdquo Journal of Hydrology vol 520 pp 224ndash243 2015

[7] D S Broomhead and D Lowe ldquoMultivariable functional inter-polation and adaptive networksrdquo Complex Systems vol 2 no 3pp 321ndash355 1988

[8] K Ganapathy V Vaidehi and J B Chandrasekar ldquoOptimumsteepest descent higher level learning radial basis functionnetworkrdquo Expert Systems with Applications vol 42 no 21 pp8064ndash8077 2015

[9] P Brandstetter and M Kuchar ldquoSensorless control of variablespeed induction motor drive using RBF neural networkrdquoJournal of Applied Logic vol 24 no part A pp 97ndash108 2017

[10] C M Anish and B Majhi ldquoHybrid nonlinear adaptive schemefor stock market prediction using feedback FLANN and factoranalysisrdquo Journal of the Korean Statistical Society vol 45 no 1pp 64ndash76 2016

[11] M Pottmann and H P Jorgl ldquoRadial basis function networksfor internal model controlrdquo Applied Mathematics and Compu-tation vol 70 no 2-3 pp 283ndash298 1995

[12] I Morlini ldquoRadial basis function networks with partiallyclassified datardquo Ecological Modelling vol 120 no 2-3 pp 109ndash118 1999

[13] A Nawahda ldquoAn assessment of adding value of traffic informa-tion and other attributes as part of its classifiers in a dataminingtool set for predicting surface ozone levelsrdquo Process Safety andEnvironmental Protection vol 99 pp 149ndash158 2016

[14] N R Kasat and S D Thepade ldquoNovel Content Based ImageClassificationMethod Using LBGVector QuantizationMethodwith Bayes and Lazy Family Data Mining Classifiersrdquo in Pro-ceedings of the 7th International Conference on CommunicationComputing and Virtualization ICCCV 2016 pp 483ndash489February 2016

[15] E Męzyk and O Unold ldquoMachine learning approach to modelsport trainingrdquoComputers in Human Behavior vol 27 no 5 pp1499ndash1506 2011

[16] J Abawajy A Kelarev M Chowdhury A Stranieri and H FJelinek ldquoPredicting cardiac autonomic neuropathy category fordiabetic data with missing valuesrdquo Computers in Biology andMedicine vol 43 no 10 pp 1328ndash1333 2013

[17] M A Ayu S A Ismail A F A Matin and T Mantoro ldquoAcomparison study of classifier algorithms for mobile-phonersquosaccelerometer based activity recognitionrdquoProcedia Engineeringvol 41 pp 224ndash229 2012

[18] A Tamvakis V Trygonis J Miritzis G Tsirtsis and S SpatharisldquoOptimizing biodiversity prediction from abiotic parametersrdquoEnvironmentalModeling and Software vol 53 pp 112ndash120 2014

[19] C Brokamp R Jandarov M B Rao G LeMasters and PRyan ldquoExposure assessment models for elemental componentsof particulate matter in an urban environment a comparisonof regression and random forest approachesrdquo AtmosphericEnvironment vol 151 pp 1ndash11 2017

[20] L Breiman ldquoRandom forestsrdquoMachine Learning vol 45 no 1pp 5ndash32 2001

[21] P Yıldırım ldquoPattern Classification with Imbalanced and Mul-ticlass Data for the Prediction of Albendazole Adverse EventOutcomesrdquo in Proceedings of the 7th International Conference onAmbient Systems Networks and Technologies ANT 2016 and the6th International Conference on Sustainable Energy InformationTechnology SEIT 2016 pp 1013ndash1018 May 2016

[22] Y Zhao and Y Zhang ldquoComparison of decision tree methodsfor finding active objectsrdquo Advances in Space Research vol 41no 12 pp 1955ndash1959 2008

[23] KM A Kumar N Rajasimha M Reddy A Rajanarayana andK Nadgir ldquoAnalysis of usersrsquo Sentiments from Kannada WebDocumentsrdquo in Proceedings of the 11th International Conferenceon Communication Networks ICCN 2015 pp 247ndash256 August2015

[24] C ZhongW Pedrycz DWang L Li and Z Li ldquoGranular dataimputation A framework of Granular ComputingrdquoApplied SoftComputing vol 46 pp 307ndash316 2016

[25] C Genolini A Lacombe R Ecochard and F Subtil ldquoCopy-Mean A new method to predict monotone missing values inlongitudinal studiesrdquo Computer Methods and Programs in Bio-medicine vol 132 pp 29ndash44 2016

[26] M Majidpour C Qiu P Chu H R Pota and R GadhldquoForecasting the EV charging load based on customer profileor station measurementrdquo Applied Energy vol 163 pp 134ndash1412016

[27] O Cokluk and M Kayri ldquoThe effects of methods of imputationfor missing values on the validity and reliability of scalesrdquo inEducational Sciences Theory and Practice vol 11 pp 303ndash3092011

[28] S A Bashir and S W Duffy ldquoThe correction of risk estimatesformeasurement errorrdquoAnnals of Epidemiology vol 7 no 2 pp154ndash164 1997

[29] A Ozturk S Kayalıgil and N E Ozdemirel ldquoManufacturinglead time estimation using data miningrdquo European Journal ofOperational Research vol 173 no 2 pp 683ndash700 2006

[30] H Vathsala and S G Koolagudi ldquoClosed Item-Set Mining forPrediction of Indian SummerMonsoon Rainfall ADataMiningModel with Land and Ocean Variables as Predictorsrdquo in Pro-ceedings of the 11th International Conference on CommunicationNetworks ICCN 2015 pp 271ndash280 August 2015

[31] S Banerjee A Anura J Chakrabarty S Sengupta and J Chat-terjee ldquoIdentification and functional assessment of novel genesets towards better understanding of dysplasia associated oralcarcinogenesisrdquo Gene Reports vol 4 pp 131ndash138 2016

[32] H Yahyaoui and A Al-Mutairi ldquoA feature-based trust sequenceclassification algorithmrdquo Information Sciences vol 328 pp 455ndash484 2016

Computational Intelligence and Neuroscience 11

[33] H B Borges and J C Nievola ldquoComparing the dimensionalityreduction methods in gene expression databasesrdquo Expert Sys-tems with Applications vol 39 no 12 pp 10780ndash10795 2012

[34] A Wang N An G Chen L Li and G Alterovitz ldquoAccelerat-ing wrapper-based feature selection with K-nearest-neighborrdquoKnowledge-Based Systems vol 83 no 1 pp 81ndash91 2015

[35] S-W Lin Z-J Lee S-C Chen and T-Y Tseng ldquoParameterdetermination of support vector machine and feature selectionusing simulated annealing approachrdquo Applied Soft Computingvol 8 no 4 pp 1505ndash1512 2008

[36] O Uncu and I B Turksen ldquoA novel feature selection approachcombining feature wrappers and filtersrdquo Information Sciencesvol 177 no 2 pp 449ndash466 2007

[37] R E Anderson R L Tatham and W C Black MultivariateData Analysis 1998

Submit your manuscripts athttpswwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: A Time-Series Water Level Forecasting Model Based on ...downloads.hindawi.com/journals/cin/2017/8734214.pdf · Random Forest. A Random Forest can be applied for classification,regression,andunsupervisedlearning[19].Itis

8 Computational Intelligence and Neuroscience

Table 6 The results of compare forecasting models under percentage spilt (dataset partition into 66 training data and 34 testing data)after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0033 0638a 0729a 0251 0545a

MAE 0182a 0121a 0111a 0199 0135a

RMSE 0229 0176a 0156a 0287 0212a

RAE 0992a 0657a 0602a 1083 0736a

RRSE 1007 0775a 0688a 1262 0935a

Series mean

CC 0107a 0661a 0739a 0242a 0551MAE 0172a 0107a 0101a 0179a 0129RMSE 0221a 0167a 0151a 0268a 0205RAE 0988a 0615a 0579a 1027a 0740RRSE 0995a 0753a 0678a 1208a 0923

Linear

CC 0105a 0666a 0735a 0258a 0596a

MAE 0173a 0106a 0100a 0175a 0120a

RMSE 0221a 0166a 0151a 0266a 0196a

RAE 0987a 0606a 0572a 1002a 0683a

RRSE 0995a 0748a 0681a 1198a 0883a

Median of nearby points

CC 0106a 0666a 0740a 0264a 0553MAE 0173a 0107a 0100a 0177a 0127RMSE 0221a 0166a 0151a 0266a 0207RAE 0987a 0611a 0571a 1013a 0723RRSE 0995a 0747a 0677a 1195a 0932

Mean of nearby points

CC 01059a 0667a 0745ab 0249a 0540a

MAE 0173a 0107a 0099ab 0179a 0129a

RMSE 0221a 0166a 0149ab 0268a 0214a

RAE 0987a 0611a 0565ab 1025a 0735a

RRSE 0995a 0747a 0672ab 1207a 0962a

Regression

CC 0107a 0663a 0739a 0242a 0559a

MAE 0172a 0106a 0101a 0179a 0126a

RMSE 0221a 0167a 0151a 0268a 0200a

RAE 0987a 0610a 0581a 1027a 0723a

RRSE 0994a 0752a 0678a 1207a 0900a

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

forecasting performance The Random Forest modelas applied to variable selection with full variables isbetter than the listing model

43 Findings After variable selection and model buildingsome key findings can be highlighted

(1) Imputation after the two collected datasets wereconcatenated into an integrated dataset there aremissing values in the integrated dataset due to humanerror or mechanical failure Tables 3 and 4 show theintegrated dataset that uses the median of nearbypoints series mean mean of nearby points linearregression imputation and the delete strategy to eval-uate their accuracy via five machine learning forecastmodels The results show that the integrated datasetthat uses the mean of the nearby pointsrsquo imputationmethod has better forecasting performance

(2) Variable selection this study uses factor analysis torank the ordering of variables and then sequen-tially deletes the least important variables until theforecasting performance no longer improves Tables6 and 7 show that the variable selection couldsignificantly improve the performance of all fiveforecasting models After iterative experiments andvariable selection the key remaining variables areReservoir IN Temperature Reservoir OUT Pres-sure Rainfall Rainfall Dasi and Relative Humidity

(3) Forecasting model this study proposed a time-seriesforecastingmodel based on estimatingmissing valuesand variable selection to forecast the water level inthe reservoirThese experimental results indicate thatthe Random Forest forecasting model when appliedto variable selection with full variables has betterforecasting performance than the listing models in

Computational Intelligence and Neuroscience 9

Table 7 The results of compare forecasting models under 10-folds cross-validation after variable selection

Methods Index RBF Network Kstar Random Forest IBK Random Tree

After variable selection

Delete the rows with missing data

CC 0103a 0665a 0737a 0233 0529a

MAE 0181a 0115a 0108a 0193a 0143a

RMSE 0226a 0171a 0154a 0282a 0223a

RAE 0984a 0627a 0589a 1047a 0774a

RRSE 0994a 0749a 0677a 1238 0977a

Series mean

CC 0081a 0688a 0751a 0295a 0547MAE 0169a 0103a 0098a 0170a 0131RMSE 0217a 0158a 0144 0260a 0209RAE 0988a 0600a 0571a 0990a 0767RRSE 0996a 0727a 0661 1193a 0960

Linear

CC 0081a 0692a 0750 0286a 0551a

MAE 0171a 0102a 0098a 0169a 0128RMSE 0218a 0158a 0145 0261a 0207a

RAE 0988a 0590a 0566a 0981a 0740RRSE 0996a 0723a 0662 1196a 0948a

Median of nearby points

CC 0083a 0692a 0752a 0305a 0555a

MAE 0171a 0102a 0097ab 0169a 0126a

RMSE 0218a 0158a 0144a 0259a 0208a

RAE 0987a 0593a 0563ab 0980a 0732a

RRSE 0996a 0722a 0660a 1186a 0951a

Mean of nearby points

CC 0082a 0694a 0753b 0276a 0537MAE 0171a 0102a 0097ab 0171a 0129RMSE 0218a 0157a 0144ab 0263a 0210RAE 0988a 0593a 0564a 0993a 0747RRSE 0996a 0721a 0659b 1204a 0960

Regression

CC 0081a 0690a 0753b 0295a 0572MAE 0169a 0102a 0098a 0169a 0126RMSE 0217a 0158a 0144ab 0259a 0204RAE 0988a 0595a 0571a 0989a 0735RRSE 0996a 0725a 0659ab 1193a 0938

a denotes after variable selection with enhancing performance b denotes the best performance among 5 models after variable selection

the five evaluation indices The proposed time-seriesforecasting model is feasible for forecasting waterlevels in Shimen Reservoir

5 Conclusion

This study proposed a time-series forecastingmodel for waterlevel forecasting in Taiwanrsquos Shimen Reservoir The experi-ments showed that the mean of nearby pointsrsquo imputationmethod has the best performanceThese experimental resultsindicate that the Random Forest forecasting model whenapplied to variable selection with full variables has betterforecasting performance than the listing model The keyvariables are Reservoir IN Temperature Reservoir OUTPressure Rainfall Rainfall Dasi and Relative Humidity Theproposed time-series forecasting model withwithout vari-able selection has better forecasting performance than thelisting models using the five evaluation indices This showsthat the proposed time-series forecastingmodel is feasible for

forecastingwater levels in ShimenReservoir Futureworkwilladdress the following issues

(1) The reservoirrsquos utility includes irrigation domesticwater supply and electricity generation The keyvariables identified here could improve forecasting inthese fields

(2) We might apply the proposed time-series forecastingmodel based on imputation and variable selection toforecast the water level of lakes salt water bodiesreservoirs and so on

In addition this experiment shows that the proposed variableselection can help determine five forecast methods used hereto improve the forecasting capability

Conflicts of Interest

The authors declare that they have no conflicts of interest

10 Computational Intelligence and Neuroscience

References

[1] F-J Chang and Y-T Chang ldquoAdaptive neuro-fuzzy inferencesystem for prediction of water level in reservoirrdquo Advances inWater Resources vol 29 no 1 pp 1ndash10 2006

[2] M K Tiwari and C Chatterjee ldquoDevelopment of an accurateand reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approachrdquo Journal ofHydrology vol 394 no 3-4 pp 458ndash470 2010

[3] C-S Chen Y-D Jhong T-Y Wu and S-T Chen ldquoTyphoonevent-based evolutionary fuzzy inference model for flood stageforecastingrdquo Journal of Hydrology vol 490 pp 134ndash143 2013

[4] F-J Chang P-A Chen Y-R Lu E Huang and K-Y ChangldquoReal-time multi-step-ahead water level forecasting by recur-rent neural networks for urban flood controlrdquo Journal ofHydrology vol 517 pp 836ndash846 2014

[5] A Hipni A El-shafie A Najah O A Karim A Hussain andM Mukhlisin ldquoDaily Forecasting of Dam Water Levels Com-paring a Support Vector Machine (SVM)Model With AdaptiveNeuro Fuzzy Inference System (ANFIS)rdquo Water ResourcesManagement vol 27 no 10 pp 3803ndash3823 2013

[6] Y Seo S Kim O Kisi and V P Singh ldquoDaily water level fore-casting using wavelet decomposition and artificial intelligencetechniquesrdquo Journal of Hydrology vol 520 pp 224ndash243 2015



Table 7: Comparison of the forecasting models under 10-fold cross-validation after variable selection.

Index    RBF Network   KStar     Random Forest   IBK       Random Tree

Delete the rows with missing data
CC       0.103a        0.665a    0.737a          0.233     0.529a
MAE      0.181a        0.115a    0.108a          0.193a    0.143a
RMSE     0.226a        0.171a    0.154a          0.282a    0.223a
RAE      0.984a        0.627a    0.589a          1.047a    0.774a
RRSE     0.994a        0.749a    0.677a          1.238     0.977a

Series mean
CC       0.081a        0.688a    0.751a          0.295a    0.547
MAE      0.169a        0.103a    0.098a          0.170a    0.131
RMSE     0.217a        0.158a    0.144           0.260a    0.209
RAE      0.988a        0.600a    0.571a          0.990a    0.767
RRSE     0.996a        0.727a    0.661           1.193a    0.960

Linear
CC       0.081a        0.692a    0.750           0.286a    0.551a
MAE      0.171a        0.102a    0.098a          0.169a    0.128
RMSE     0.218a        0.158a    0.145           0.261a    0.207a
RAE      0.988a        0.590a    0.566a          0.981a    0.740
RRSE     0.996a        0.723a    0.662           1.196a    0.948a

Median of nearby points
CC       0.083a        0.692a    0.752a          0.305a    0.555a
MAE      0.171a        0.102a    0.097ab         0.169a    0.126a
RMSE     0.218a        0.158a    0.144a          0.259a    0.208a
RAE      0.987a        0.593a    0.563ab         0.980a    0.732a
RRSE     0.996a        0.722a    0.660a          1.186a    0.951a

Mean of nearby points
CC       0.082a        0.694a    0.753b          0.276a    0.537
MAE      0.171a        0.102a    0.097ab         0.171a    0.129
RMSE     0.218a        0.157a    0.144ab         0.263a    0.210
RAE      0.988a        0.593a    0.564a          0.993a    0.747
RRSE     0.996a        0.721a    0.659b          1.204a    0.960

Regression
CC       0.081a        0.690a    0.753b          0.295a    0.572
MAE      0.169a        0.102a    0.098a          0.169a    0.126
RMSE     0.217a        0.158a    0.144ab         0.259a    0.204
RAE      0.988a        0.595a    0.571a          0.989a    0.735
RRSE     0.996a        0.725a    0.659ab         1.193a    0.938

a denotes improved performance after variable selection; b denotes the best performance among the five models after variable selection.

The proposed model, with or without variable selection, has better forecasting performance than the listing models on the five evaluation indices. The proposed time-series forecasting model is therefore feasible for forecasting water levels in Shimen Reservoir.
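For readers reimplementing the comparison, the sketch below computes the five evaluation indices from paired actual and predicted water levels. It is a minimal Python/NumPy illustration; the function name is ours, not anything defined in the paper.

import numpy as np

def evaluation_indices(y_true, y_pred):
    """The five indices reported in Table 7 for each model/imputation pair."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    base = y_true - y_true.mean()   # errors of a naive always-predict-the-mean model
    return {
        "CC":   np.corrcoef(y_true, y_pred)[0, 1],             # correlation coefficient
        "MAE":  np.mean(np.abs(err)),                          # mean absolute error
        "RMSE": np.sqrt(np.mean(err ** 2)),                    # root mean squared error
        "RAE":  np.sum(np.abs(err)) / np.sum(np.abs(base)),    # relative absolute error
        "RRSE": np.sqrt(np.sum(err ** 2) / np.sum(base ** 2)), # root relative squared error
    }

RAE and RRSE normalize the errors against a naive predictor that always outputs the mean of the actual series, so values above 1 (e.g., IBK's RRSE of 1.238 in Table 7) indicate performance worse than that baseline.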

5. Conclusion

This study proposed a time-series forecasting model for water level forecasting in Taiwan's Shimen Reservoir. The experiments showed that the mean of nearby points' imputation method has the best performance. These experimental results indicate that the Random Forest forecasting model, when applied to variable selection with full variables, has better forecasting performance than the listing models. The key variables are Reservoir IN, Temperature, Reservoir OUT, Pressure, Rainfall, Rainfall Dasi, and Relative Humidity. The proposed time-series forecasting model, with or without variable selection, has better forecasting performance than the listing models on the five evaluation indices. This shows that the proposed time-series forecasting model is feasible for forecasting water levels in Shimen Reservoir. Future work will address the following issues:

(1) The reservoir's utility includes irrigation, domestic water supply, and electricity generation. The key variables identified here could improve forecasting in these fields.

(2) We might apply the proposed time-series forecasting model based on imputation and variable selection to forecast the water level of lakes, salt water bodies, reservoirs, and so on.

In addition, this experiment shows that the proposed variable selection helps each of the five forecasting methods used here improve its forecasting capability.
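To make the workflow summarized above concrete, the following minimal sketch chains the best-performing imputation (mean of nearby points) with a Random Forest fit on the key variables. The file name, column names, window size, and forest size are illustrative assumptions, not values taken from the paper.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Key variables named in the conclusion; the column identifiers below are
# our own hypothetical names, not taken from the paper's dataset.
KEY_VARS = ["reservoir_in", "temperature", "reservoir_out", "pressure",
            "rainfall", "rainfall_dasi", "relative_humidity"]

def impute_mean_of_nearby(df, half_window=2):
    # Approximate the "mean of nearby points" method: replace each missing
    # value with the mean of a centered window of surrounding observations.
    nearby = df.rolling(2 * half_window + 1, center=True, min_periods=1).mean()
    return df.fillna(nearby)

df = pd.read_csv("shimen_daily.csv")   # assumed daily integrated dataset
X = impute_mean_of_nearby(df[KEY_VARS])
y = df["water_level"]                  # assumed target column with no gaps
model = RandomForestRegressor(n_estimators=100, random_state=1)
model.fit(X, y)

A centered rolling window is one simple reading of "mean of nearby points"; implementations differ in how many neighboring observations they include on each side of a gap.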

Conflicts of Interest

The authors declare that they have no conflicts of interest.


References

[1] F.-J. Chang and Y.-T. Chang, "Adaptive neuro-fuzzy inference system for prediction of water level in reservoir," Advances in Water Resources, vol. 29, no. 1, pp. 1–10, 2006.

[2] M. K. Tiwari and C. Chatterjee, "Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN (WBANN) hybrid approach," Journal of Hydrology, vol. 394, no. 3-4, pp. 458–470, 2010.

[3] C.-S. Chen, Y.-D. Jhong, T.-Y. Wu, and S.-T. Chen, "Typhoon event-based evolutionary fuzzy inference model for flood stage forecasting," Journal of Hydrology, vol. 490, pp. 134–143, 2013.

[4] F.-J. Chang, P.-A. Chen, Y.-R. Lu, E. Huang, and K.-Y. Chang, "Real-time multi-step-ahead water level forecasting by recurrent neural networks for urban flood control," Journal of Hydrology, vol. 517, pp. 836–846, 2014.

[5] A. Hipni, A. El-shafie, A. Najah, O. A. Karim, A. Hussain, and M. Mukhlisin, "Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS)," Water Resources Management, vol. 27, no. 10, pp. 3803–3823, 2013.

[6] Y. Seo, S. Kim, O. Kisi, and V. P. Singh, "Daily water level forecasting using wavelet decomposition and artificial intelligence techniques," Journal of Hydrology, vol. 520, pp. 224–243, 2015.

[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, vol. 2, no. 3, pp. 321–355, 1988.

[8] K. Ganapathy, V. Vaidehi, and J. B. Chandrasekar, "Optimum steepest descent higher level learning radial basis function network," Expert Systems with Applications, vol. 42, no. 21, pp. 8064–8077, 2015.

[9] P. Brandstetter and M. Kuchar, "Sensorless control of variable speed induction motor drive using RBF neural network," Journal of Applied Logic, vol. 24, part A, pp. 97–108, 2017.

[10] C. M. Anish and B. Majhi, "Hybrid nonlinear adaptive scheme for stock market prediction using feedback FLANN and factor analysis," Journal of the Korean Statistical Society, vol. 45, no. 1, pp. 64–76, 2016.

[11] M. Pottmann and H. P. Jorgl, "Radial basis function networks for internal model control," Applied Mathematics and Computation, vol. 70, no. 2-3, pp. 283–298, 1995.

[12] I. Morlini, "Radial basis function networks with partially classified data," Ecological Modelling, vol. 120, no. 2-3, pp. 109–118, 1999.

[13] A. Nawahda, "An assessment of adding value of traffic information and other attributes as part of its classifiers in a data mining tool set for predicting surface ozone levels," Process Safety and Environmental Protection, vol. 99, pp. 149–158, 2016.

[14] N. R. Kasat and S. D. Thepade, "Novel content based image classification method using LBG vector quantization method with Bayes and Lazy family data mining classifiers," in Proceedings of the 7th International Conference on Communication, Computing and Virtualization (ICCCV 2016), pp. 483–489, February 2016.

[15] E. Męzyk and O. Unold, "Machine learning approach to model sport training," Computers in Human Behavior, vol. 27, no. 5, pp. 1499–1506, 2011.

[16] J. Abawajy, A. Kelarev, M. Chowdhury, A. Stranieri, and H. F. Jelinek, "Predicting cardiac autonomic neuropathy category for diabetic data with missing values," Computers in Biology and Medicine, vol. 43, no. 10, pp. 1328–1333, 2013.

[17] M. A. Ayu, S. A. Ismail, A. F. A. Matin, and T. Mantoro, "A comparison study of classifier algorithms for mobile-phone's accelerometer based activity recognition," Procedia Engineering, vol. 41, pp. 224–229, 2012.

[18] A. Tamvakis, V. Trygonis, J. Miritzis, G. Tsirtsis, and S. Spatharis, "Optimizing biodiversity prediction from abiotic parameters," Environmental Modelling and Software, vol. 53, pp. 112–120, 2014.

[19] C. Brokamp, R. Jandarov, M. B. Rao, G. LeMasters, and P. Ryan, "Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches," Atmospheric Environment, vol. 151, pp. 1–11, 2017.

[20] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[21] P. Yıldırım, "Pattern classification with imbalanced and multiclass data for the prediction of albendazole adverse event outcomes," in Proceedings of the 7th International Conference on Ambient Systems, Networks and Technologies (ANT 2016) and the 6th International Conference on Sustainable Energy Information Technology (SEIT 2016), pp. 1013–1018, May 2016.

[22] Y. Zhao and Y. Zhang, "Comparison of decision tree methods for finding active objects," Advances in Space Research, vol. 41, no. 12, pp. 1955–1959, 2008.

[23] K. M. A. Kumar, N. Rajasimha, M. Reddy, A. Rajanarayana, and K. Nadgir, "Analysis of users' sentiments from Kannada web documents," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 247–256, August 2015.

[24] C. Zhong, W. Pedrycz, D. Wang, L. Li, and Z. Li, "Granular data imputation: a framework of granular computing," Applied Soft Computing, vol. 46, pp. 307–316, 2016.

[25] C. Genolini, A. Lacombe, R. Ecochard, and F. Subtil, "Copy-Mean: a new method to predict monotone missing values in longitudinal studies," Computer Methods and Programs in Biomedicine, vol. 132, pp. 29–44, 2016.

[26] M. Majidpour, C. Qiu, P. Chu, H. R. Pota, and R. Gadh, "Forecasting the EV charging load based on customer profile or station measurement," Applied Energy, vol. 163, pp. 134–141, 2016.

[27] O. Cokluk and M. Kayri, "The effects of methods of imputation for missing values on the validity and reliability of scales," Educational Sciences: Theory and Practice, vol. 11, pp. 303–309, 2011.

[28] S. A. Bashir and S. W. Duffy, "The correction of risk estimates for measurement error," Annals of Epidemiology, vol. 7, no. 2, pp. 154–164, 1997.

[29] A. Ozturk, S. Kayalıgil, and N. E. Ozdemirel, "Manufacturing lead time estimation using data mining," European Journal of Operational Research, vol. 173, no. 2, pp. 683–700, 2006.

[30] H. Vathsala and S. G. Koolagudi, "Closed item-set mining for prediction of Indian summer monsoon rainfall: a data mining model with land and ocean variables as predictors," in Proceedings of the 11th International Conference on Communication Networks (ICCN 2015), pp. 271–280, August 2015.

[31] S. Banerjee, A. Anura, J. Chakrabarty, S. Sengupta, and J. Chatterjee, "Identification and functional assessment of novel gene sets towards better understanding of dysplasia associated oral carcinogenesis," Gene Reports, vol. 4, pp. 131–138, 2016.

[32] H. Yahyaoui and A. Al-Mutairi, "A feature-based trust sequence classification algorithm," Information Sciences, vol. 328, pp. 455–484, 2016.

[33] H. B. Borges and J. C. Nievola, "Comparing the dimensionality reduction methods in gene expression databases," Expert Systems with Applications, vol. 39, no. 12, pp. 10780–10795, 2012.

[34] A. Wang, N. An, G. Chen, L. Li, and G. Alterovitz, "Accelerating wrapper-based feature selection with K-nearest-neighbor," Knowledge-Based Systems, vol. 83, no. 1, pp. 81–91, 2015.

[35] S.-W. Lin, Z.-J. Lee, S.-C. Chen, and T.-Y. Tseng, "Parameter determination of support vector machine and feature selection using simulated annealing approach," Applied Soft Computing, vol. 8, no. 4, pp. 1505–1512, 2008.

[36] O. Uncu and I. B. Turksen, "A novel feature selection approach combining feature wrappers and filters," Information Sciences, vol. 177, no. 2, pp. 449–466, 2007.

[37] R. E. Anderson, R. L. Tatham, and W. C. Black, Multivariate Data Analysis, 1998.
