Methods to improve neural network performance in daily flows prediction

C.L. Wu, K.W. Chau *, Y.S. Li
Dept. of Civil and Structural Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, People's Republic of China
* Corresponding author. Tel.: +852 27666014. E-mail address: [email protected] (K.W. Chau).

Journal of Hydrology 372 (2009) 80-93. doi:10.1016/j.jhydrol.2009.03.038

Article history: Received 2 April 2008; received in revised form 25 February 2009; accepted 30 March 2009. This manuscript was handled by K. Georgakakos, Editor-in-Chief, with the assistance of Taha Ouarda, Associate Editor.

Keywords: Daily flows prediction; Artificial neural network; Lagged prediction; Moving average; Singular spectral analysis; Wavelet multi-resolution analysis

Summary

In this paper, three data-preprocessing techniques, moving average (MA), singular spectrum analysis (SSA), and wavelet multi-resolution analysis (WMRA), were coupled with artificial neural networks (ANN) to improve the estimate of daily flows. Six models, including the original ANN model without data preprocessing, were set up and evaluated. The five new models were ANN-MA, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2. The ANN-MA was derived from the raw ANN model combined with the MA. The ANN-SSA1, ANN-SSA2, ANN-WMRA1 and ANN-WMRA2 were generated by coupling the original ANN model with SSA and WMRA in two different ways. Two daily flow series from different watersheds in China (Lushui and Daning) were used in the six models for three prediction horizons (i.e., 1-, 2-, and 3-day-ahead forecasts). The poor performance of the ANN forecast models was mainly due to the existence of the lagged prediction. The ANN-MA, among the six models, performed best and eradicated the lag effect. The performances of the ANN-SSA1 and ANN-SSA2 were similar, and the performances of the ANN-WMRA1 and ANN-WMRA2 were also similar. However, the models based on the SSA presented better performance than the models based on the WMRA at all forecast horizons, which means that the SSA is more effective than the WMRA in improving the ANN performance in the current study. Based on an overall consideration including the model performance and the complexity of modeling, the ANN-MA model was optimal, then the ANN model coupled with SSA, and finally the ANN model coupled with WMRA.

© 2009 Elsevier B.V. All rights reserved.

Introduction

Artificial neural networks (ANNs) have gained significant attention in the past two decades and have been widely used for hydrological forecasting. The ASCE (2000) and Dawson and Wilby (2001) give good state-of-the-art reviews on ANN modeling in hydrology. Many studies focused on streamflow predictions have proven that ANN is superior to traditional regression techniques and time-series models, including autoregressive (AR) and autoregressive moving average (ARMA) models (Raman and Sunilkumar, 1995; Jain et al., 1999; Thirumalaiah and Deo, 2000; Abrahart and See, 2002; Castellano-Méndez et al., 2004; Kişi, 2003, 2005). Besides, ANN has also been compared with the nonlinear prediction (NLP) method, which is derived from chaotic time-series analysis (Farmer and Sidorowich, 1987). Laio et al. (2003) carried out a comparison of ANN and NLP for flood predictions and found that ANN performed slightly better at long forecast times while the situation was reversed for shorter times. Sivakumar et al. (2002) found that ANN was worse than NLP in short-term river flow prediction.

The ANN is able to capture the dynamics of the flow series by using previously observed flow values as inputs during the forecasting of daily flows from the flow data alone.
As a consequence, the high autocorrelation of the flow data often introduces lagged predictions in the ANN model. The issue of lagged predictions in the ANN model has been mentioned by some researchers (Dawson and Wilby, 1999; Jain and Srinivasulu, 2004; de Vos and Rientjes, 2005; Muttil and Chau, 2006). De Vos and Rientjes (2005) suggested that an effective solution to the forecasting lag effect is to obtain new model inputs by applying a moving average (MA) over the original discharge data.

As is known, a natural flow series can be viewed as a quasi-periodic signal which is contaminated by various noises at different flow levels. Cleaner signals used as model inputs will improve the model performance. Therefore, signal decomposition techniques for the purpose of data preprocessing may be favorable. Two such techniques are singular spectrum analysis (SSA) and wavelet multi-resolution analysis (WMRA). Briefly, the SSA decomposes a time series into a number of components with simpler structures, such as a slowly varying trend, oscillations and noise. The SSA uses basis functions characterized by a data-adaptive nature, which makes the approach suitable for the analysis of some nonlinear dynamics (Elsner and Tsonis, 1997). In the WMRA, a time series is broken down into a series of linearly independent detail signals and one approximation signal by using the discrete wavelet transform with a specific wavelet function such as the Haar wavelet.


Mallat (1989) presented a complete theory for wavelet multi-resolution signal decomposition (also referred to as the pyramid decomposition algorithm). Moreover, the continuous wavelet transform can conduct a local signal analysis for which the traditional Fourier analysis and SSA are, however, less effective (Howell and Mahrt, 1994); Torrence and Compo (1998) provide a practical guide. Nevertheless, signal analysis in the time-frequency space is not the point of concern in this study.

The techniques of SSA and WMRA have been successfully introduced to the field of hydrology (Lisi et al., 1995; Sivapragasam et al., 2001; Marques et al., 2006; Partal and Kisi, 2007). Sivapragasam et al. (2001) established a hybrid model of support vector machine (SVM) in conjunction with the SSA for the forecasting of rainfall and runoff. A considerable improvement in the model performance was obtained in comparison with the original SVM model. However, the paper did not explicitly mention how to combine the SVM with the SSA. The applications of WMRA to precipitation and discharge were presented in the work of Partal and Kisi (2007) and Partal and Cigizoglu (2008) respectively, where the WMRA was applied to each model input variable. Results from their studies indicated that the WMRA is highly promising for improvement of the model performance.

The objective of this study is to evaluate the effectiveness of the three data-preprocessing techniques of MA, SSA and WMRA in the improvement of the ANN model performance. To explore the SSA or WMRA, the ANN model is coupled with the components of SSA or WMRA in terms of two different methods. One is that the raw flow data is first decomposed, then a new flow series is obtained by a components filter, and finally the new flow series is used to generate the model inputs. This type of model is named ANN-SSA1 or ANN-WMRA1. The other method is the same as Partal and Cigizoglu (2008), based on which the model is named ANN-SSA2 or ANN-WMRA2. With the original ANN model and the ANN-MA, there are six models for the flow data forecasting in all. This paper is organized in the following manner. Streamflow data presents the two sets of streamflow data. Methods describes the modeling methods, including a brief introduction of ANN, MA, SSA, and WMRA, and how to construct the hybrid ANN models. The application of the forecast models to the flow data is presented in Application of models to the flow data, where relevant points include decomposition of flows, the identification of the ANN architecture, implementation of ANN models, and forecasting results and discussion. Conclusions sheds light on the main conclusions of this study.

Streamflow data

Daily mean flow data from two rivers, the Lushui and the Daning, are used in this study. The two rivers are direct tributaries of the Yangtze River, and both are located in Hubei province, People's Republic of China. The flow data from the Lushui River were acquired at Tongcheng hydrology station, which is in the upper stream of the Lushui watershed (hereafter, this flow data is referred to as the Lushui series). The watershed has an area of 224 km². The data period covers a 5-year duration (January 1, 1984 to December 31, 1988). The flow data from the Daning River were collected at Wuxi hydrology station, which is at the upper and middle streams of the Daning watershed. The drainage area controlled by Wuxi station is 2001 km². The flow data spanned 20 years (from January 1, 1988 to December 31, 2007).

In the process of modeling with ANNs, the raw flow data is often partitioned into three parts: training set, cross-validation set and testing set. The training set serves the model training and the testing set is used to evaluate the performances of models. The cross-validation set helps to implement an early-stopping approach in order to avoid overfitting of the training data. The same data partition was adopted for the two daily flow series: the first half of the entire flow data as training set, the first half of the remaining data as cross-validation set, and the other half as testing set.

Table 1
Related information for the two rivers and the flow data.

Watershed and dataset     μ (m³)   Sx (m³)   Cv     Cs      Xmin (m³)   Xmax (m³)
Lushui (area: 224 km²; data period: January 1984 to December 1988)
  Original data           4.63     8.49      0.55   7.38    0.02        134
  Training                4.41     8.56      0.52   7.84    0.02        128
  Cross-validation        5.78     8.35      0.69   3.84    0.05        63
  Testing                 3.90     8.39      0.47   10.22   0.30        134
Wuxi (area: 2001 km²; data period: January 1988 to December 2007)
  Original data           61.9     112.6     0.55   7.20    6.0         2230
  Training                60.6     95.6      0.63   5.90    7.6         1530
  Cross-validation        60.7     132.2     0.46   8.35    6.0         2230
  Testing                 66.0     122.1     0.54   6.30    10.1        1730

Table 1 presents related information about the two rivers and some descriptive statistics of the original data and the three data subsets, including mean (μ), standard deviation (Sx), coefficient of variation (Cv), skewness coefficient (Cs), minimum (Xmin), and maximum (Xmax). As shown in Table 1, the training set cannot fully cover the range of the cross-validation or testing set. Due to the weak extrapolation ability of ANN, it is suggested that all data should be scaled to the interval [-0.9, 0.9] rather than [-1, 1] when the ANN employs hyperbolic tangent functions as transfer functions in the hidden layer and output layer.

Fig. 1 estimates the autocorrelation functions (ACF), average mutual information (AMI), and partial autocorrelation functions (PACF) from lag 0 to lag 30 days for the two flow series. The AMI measures the general dependence of two variables (Fraser and Swinney, 1986) whereas the ACF and PACF show the dependence from the perspective of linearity. The first-order autocorrelation of each flow series is large (0.59 for Lushui, and 0.7 for Wuxi). The rapid decaying pattern of the PACF confirms the dominance of the autoregressive process, relative to the moving-average process revealed by the ACF.

Fig. 1. Plots of ACF, AMI, and PACF of the flow data (1 and 3 for Lushui; 2 and 4 for Wuxi) where the dashed lines stand for 95% confidence bound.

Methods

Artificial neural networks

An ANN is a massively parallel-distributed information processing system with a highly flexible configuration and so has an excellent nonlinearity-capturing ability. The feed-forward multilayer perceptron (MLP), among many ANN paradigms, is by far the most popular, and usually uses the technique of error back-propagation to train the network configuration. The architecture of the ANN consists of the number of hidden layers and the number of neurons in the input layer, hidden layers and output layer. ANNs with one hidden layer are commonly used in hydrologic modeling (Dawson and Wilby, 2001; de Vos and Rientjes, 2005) since these networks are considered to provide enough complexity to accurately simulate the nonlinear properties of the hydrologic process. A three-layer ANN is therefore chosen for the present study, which comprises the input layer with I nodes, the hidden layer with H nodes (neurons), and the output layer with one node. Hyperbolic tangent functions are used as transfer functions in the hidden layer and output layer. The purpose of network training is to optimize the weights w connecting neighboring layers and the bias θ of each neuron in the hidden layer and output layer. The Levenberg-Marquardt (LM) training algorithm is used here for adjusting the weights and biases.
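The baseline network described above is a three-layer MLP with tanh transfer functions, trained by Levenberg-Marquardt on inputs scaled to [-0.9, 0.9]. The sketch below shows one way to prepare the lagged input/output pairs and fit a comparable network in Python; the use of scikit-learn is an assumption (its MLPRegressor does not offer Levenberg-Marquardt, so the library's default solver stands in for LM), and the file name, lag count and hidden-layer size are illustrative values taken from the Lushui configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def scale(x, lo=-0.9, hi=0.9):
    """Linearly scale a series to [lo, hi] (the paper scales to [-0.9, 0.9])."""
    xmin, xmax = x.min(), x.max()
    return lo + (hi - lo) * (x - xmin) / (xmax - xmin), (xmin, xmax, lo, hi)

def make_io_pairs(q, n_lags, horizon=1):
    """Build (inputs, target) pairs: Q(t-1..t-n_lags) -> Q(t+horizon-1)."""
    X, y = [], []
    for t in range(n_lags, len(q) - horizon + 1):
        X.append(q[t - n_lags:t][::-1])        # most recent lag first
        y.append(q[t + horizon - 1])
    return np.array(X), np.array(y)

# hypothetical daily flow series; 6 lagged inputs and 8 hidden nodes (6-8-1)
flow = np.loadtxt("lushui_daily_flow.txt")     # assumed file, one value per line
scaled, _ = scale(flow)
X, y = make_io_pairs(scaled, n_lags=6, horizon=1)

# tanh hidden layer as in the paper; the solver is a stand-in for Levenberg-Marquardt
ann = MLPRegressor(hidden_layer_sizes=(8,), activation="tanh",
                   early_stopping=True, max_iter=2000, random_state=0)
ann.fit(X, y)
```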

    Moving average

The moving average method smoothes data by replacing each data point with the average of the K neighboring data points, where K may be called the length of the memory window. The method is based on the idea that any large irregular component at any point in time will exert a smaller effect if we average the point with its immediate neighbors (Newbold et al., 2003). The most common moving average method is the unweighted moving average, in which each value of the data carries the same weight in the smoothing process. For a time series {x_1, x_2, ..., x_N}, the K-term unweighted moving average is written as

\bar{x}_\tau = \frac{1}{K} \sum_{i=0}^{K-1} x_{\tau-i}, \qquad \tau = K, \ldots, N,

when the backward moving mode is adopted (Lee et al., 2000). The window length K is chosen by a trial-and-error procedure that minimizes the ANN prediction error.
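A minimal sketch of the backward K-term moving average defined above. The trial-and-error choice of K against the ANN validation error is indicated only as a comment, with `train_and_score` a hypothetical helper.

```python
import numpy as np

def backward_moving_average(x, K):
    """K-term unweighted backward moving average:
    y[t] = mean(x[t-K+1 .. t]); the first K-1 points are left unsmoothed."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for t in range(K - 1, len(x)):
        y[t] = x[t - K + 1:t + 1].mean()
    return y

# Trial-and-error choice of K (the paper varies K from 1 to 10 and keeps the value
# that minimises the ANN prediction error); `train_and_score` is a hypothetical
# helper that trains the ANN on the smoothed inputs and returns validation RMSE:
# best_K = min(range(1, 11),
#              key=lambda K: train_and_score(backward_moving_average(flow, K)))
```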

    Singular spectrum analysis

The implementation of SSA can be referred to Vautard et al. (1992) and Elsner and Tsonis (1997). Four steps are summarized for the implementation. The first step is to construct the trajectory matrix. The trajectory matrix results from the method of delays, in which the coordinates of the phase space approximate the dynamics of the system by using lagged copies of the time series. Therefore, the trajectory matrix can reflect the evolution of the time series with a careful choice of the (\tau, L) window. For a time series {x_1, x_2, ..., x_N}, the trajectory matrix is denoted by

X = \frac{1}{\sqrt{R}}
\begin{pmatrix}
x_1 & x_{1+\tau} & x_{1+2\tau} & \cdots & x_{1+(L-1)\tau} \\
x_2 & x_{2+\tau} & x_{2+2\tau} & \cdots & x_{2+(L-1)\tau} \\
x_3 & x_{3+\tau} & x_{3+2\tau} & \cdots & x_{3+(L-1)\tau} \\
\vdots & \vdots & \vdots & & \vdots \\
x_{N-(L-1)\tau} & x_{N-(L-2)\tau} & x_{N-(L-3)\tau} & \cdots & x_N
\end{pmatrix}
\qquad (1)

where L is the embedding dimension (also called the singular number in the context of SSA) and \tau is the lag (or delay) time. The matrix dimension is R \times L, where R = N - (L-1)\tau. The next step is the singular value decomposition (SVD) of the trajectory matrix X. Let S = X^T X (called the lagged-covariance matrix). With the SVD, X can be written as X = D \Lambda E^T, where D and E are the left and right singular vectors of X, and \Lambda is a diagonal matrix of singular values. E consists of orthonormal columns, which are also called the empirical orthonormal functions (EOFs). Substituting X into the definition of S yields S = E \Lambda^2 E^T. Further, S = E \hat{\Lambda} E^T since \Lambda^2 = \hat{\Lambda}, where \hat{\Lambda} is a diagonal matrix consisting of the ordered eigenvalues 0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_L. Therefore, the right singular vectors of X are the eigenvectors of S (Elsner and Tsonis, 1997). In other words, the singular vectors E and the singular values of X can be attained by calculating the eigenvectors and the square roots of the eigenvalues of S, respectively.

The first two steps form the decomposition stage of SSA, and the next two steps belong to the recovering stage. The third step is to calculate the principal components (a_i^k) by projecting the original time record onto the eigenvectors as follows:

a_i^k = \sum_{j=1}^{L} x_{i+(j-1)\tau} \, e_j^k, \qquad \text{for } i = 1, 2, \ldots, N - (L-1)\tau \qquad (2)

where e_j^k represents the jth component of the kth eigenvector. As known, each principal component is a filtered process of the original series with length N - (L-1)\tau, not length N as desired, which poses a problem in real-time prediction.
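A minimal sketch of the SSA decomposition outlined above (trajectory matrix with τ = 1, SVD, one rank-one matrix per eigentriple). The recovery of full-length reconstructed components (RCs) by diagonal averaging is a standard SSA step assumed here, since the paper's own wording of the recovery stage is not reproduced in this excerpt.

```python
import numpy as np

def ssa_components(x, L=5):
    """Decompose series x into L reconstructed components (RCs) with SSA (tau = 1).

    Steps: trajectory matrix -> SVD -> one rank-1 matrix per eigentriple ->
    diagonal averaging (standard SSA recovery, assumed here) back to length N.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    R = N - L + 1                                          # number of lagged rows
    X = np.column_stack([x[i:i + R] for i in range(L)])    # R x L trajectory matrix
    D, s, ET = np.linalg.svd(X, full_matrices=False)       # X = D * diag(s) * ET
    rcs = np.zeros((L, N))
    for k in range(L):
        Xk = s[k] * np.outer(D[:, k], ET[k])               # rank-1 piece for eigentriple k
        for n in range(N):                                 # diagonal averaging
            i = np.arange(max(0, n - L + 1), min(n, R - 1) + 1)
            rcs[k, n] = Xk[i, n - i].mean()
    return rcs                                             # rcs.sum(axis=0) recovers x

# filtered = ssa_components(flow, L=5)[:p].sum(axis=0)
# keeps p selected RCs (the paper ranks RCs by their correlation with the flow)
```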

Wavelet multi-resolution analysis

In the discrete wavelet transform (DWT), the dilation and translation parameters of the wavelet function are restricted to the discrete values a = a_0^m and b = n b_0 a_0^m, where m and n are integers and a_0 > 1, b_0 > 0 are fixed. The appropriate choices for a_0 and b_0 depend on the wavelet function. A common choice for them is a_0 = 2 and b_0 = 1. Now, assuming a discrete time series x_t, where x_i occurs at the discrete time i, the DWT becomes

W_{m,n} = 2^{-m/2} \sum_{i=0}^{N-1} x_i \, \psi(2^{-m} i - n) \qquad (5)

where W_{m,n} is the wavelet coefficient for the discrete wavelet function with scale a = 2^m and location b = 2^m n. In this study, the wavelet function is derived from the family of Daubechies wavelets of order 3.

Multi-resolution analysis (MRA)

Mallat's decomposition algorithm (Mallat, 1989) is employed in this study. According to Mallat's theory, the original discrete time series x_t is decomposed into a series of linearly independent approximation and detail signals.

The process consists of a number of successive filtering steps as depicted in Fig. 2. Fig. 2a displays an entire MRA scheme, and Fig. 2b shows the filtering operation between two adjacent resolutions. The original signal x_t is first decomposed into an approximation and an accompanying detail. The decomposition process is then iterated, with successive approximations being decomposed in turn, so that the finest-resolution original signal is transformed into many coarser-resolution components (Küçük and Ağıralioğlu, 2006). As shown in Fig. 2b, the approximation cA_{i+1} is obtained by letting cA_i pass through the low-pass filter H and downsampling by two (denoted as ↓2), whereas the detail cD_{i+1} is obtained by letting cA_i pass through the high-pass filter G′ and downsampling by two. The details are therefore the low-scale, high-frequency components whereas the approximations are the high-scale, low-frequency components. Finally, the original signal x_t is decomposed into many detail components and one approximation component, which denotes the coarsest resolution. Following this procedure, the raw flow data can be decomposed into m + 1 components if the decomposition level m of the DWT is set.

Fig. 2. Schematics of WMRA: (a) decomposition of x_t at level 3 and (b) filtering of the signal between two adjacent resolutions.
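A minimal sketch of the Mallat-style decomposition described above, producing m details plus one approximation with a db3 wavelet. The use of the PyWavelets package is an assumption (the paper does not name a software tool); each component is reconstructed to the original length, so the components approximately sum back to the raw series.

```python
import numpy as np
import pywt

def dwcs(x, level=8, wavelet="db3"):
    """Multi-resolution decomposition into `level` details plus one approximation.

    Each component is reconstructed to the length of x, so sum(components) ~= x.
    """
    x = np.asarray(x, dtype=float)
    coeffs = pywt.wavedec(x, wavelet, level=level)          # [cA_m, cD_m, ..., cD_1]
    comps = []
    for k in range(len(coeffs)):
        keep = [c if i == k else np.zeros_like(c) for i, c in enumerate(coeffs)]
        comps.append(pywt.waverec(keep, wavelet)[:len(x)])
    # comps[0] is the approximation (trend); comps[1:] are details, coarse to fine
    return comps

# components = dwcs(flow, level=8, wavelet="db3")   # 9 components for Lushui (m = 8)
```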

ANNs integrated with data-preprocessing techniques

To explore the capability of ANNs, five ANN models are generated with the aid of the above three data-preprocessing techniques. These techniques are aimed at improving the mapping relationship between the inputs and output of the ANN model by smoothing the raw flow data. The six forecasting models are described as follows.

ANN

The original ANN model (hereafter referred to as ANN) directly employs the original flow data to generate model input/output pairs. It is used as the baseline model for the purpose of comparison with the other five proposed models.

ANN-MA

The moving average method first smoothes the original flow data, and then the smoothed data are used to form the model inputs. The model is hereafter referred to as ANN-MA.

ANN-SSA1 and ANN-SSA2

The raw flow data is first decomposed by SSA into L RCs, and then the raw flow data is filtered by selecting p (≤L) from all L RCs. A new flow series is generated by summing the selected p RCs. Finally, the new flow series is used to generate the model inputs. This type of model is hereafter referred to as ANN-SSA1.

Different from the ANN-SSA1, in the second scheme the model inputs are first derived from the original flow data, and then each input variable series of the model is filtered by selecting p (≤L) RCs and summing them to form a new input variable series. The resulting model is hereafter referred to as ANN-SSA2.

ANN-WMRA1 and ANN-WMRA2

The ANN-WMRA1 and ANN-WMRA2 are established in combination with the WMRA instead of the SSA. The idea behind the modelling is identical to that of the ANN-SSA1 and ANN-SSA2.
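The two coupling schemes differ only in where the filtering happens: ANN-SSA1/ANN-WMRA1 filter the raw series and then build lagged inputs, whereas ANN-SSA2/ANN-WMRA2 build the lagged input series first and then filter each one. A sketch of that contrast follows, reusing the hypothetical `ssa_components` and `make_io_pairs` helpers from the earlier sketches; the component indices kept are purely illustrative.

```python
import numpy as np

# Scheme 1 (ANN-SSA1 / ANN-WMRA1): filter the raw series, then form lagged inputs.
rcs = ssa_components(flow, L=5)                     # from the SSA sketch above
filtered = rcs[[0, 1]].sum(axis=0)                  # e.g. keep RC1 and RC2 (illustrative)
X1, y1 = make_io_pairs(filtered, n_lags=6)

# Scheme 2 (ANN-SSA2 / ANN-WMRA2): form lagged inputs from the raw series,
# then filter each input variable separately; the target stays the raw flow.
X_raw, y2 = make_io_pairs(flow, n_lags=6)
X2 = np.column_stack([
    ssa_components(X_raw[:, j], L=5)[[0]].sum(axis=0)   # components kept per input (illustrative)
    for j in range(X_raw.shape[1])
])
```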

Evaluation of model performances

The Pearson correlation coefficient (r) and the coefficient of determination (R² = r²) have been identified as inappropriate measures in hydrologic model evaluation by Legates and McCabe (1999). The coefficient of efficiency (CE) (Nash and Sutcliffe, 1970) is a good alternative to r or R² as a goodness-of-fit or relative error measure, in that it is sensitive to differences in the observed and forecasted means and variances. Legates and McCabe (1999) also suggested that a complete assessment of model performance should include at least one absolute error measure (e.g., RMSE) as a necessary supplement to a relative error measure.

Besides, the persistence index (PI) (Kitanidis and Bras, 1980) was adopted here for the purpose of checking the prediction lag effect. Three measures were therefore used in this study. They are formulated as

CE = 1 - \frac{\sum_{i=1}^{n} (T_i - \hat{T}_i)^2}{\sum_{i=1}^{n} (T_i - \bar{T})^2}, \qquad
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (T_i - \hat{T}_i)^2}, \qquad
PI = 1 - \frac{\sum_{i=1}^{n} (T_i - \hat{T}_i)^2}{\sum_{i=1}^{n} (T_i - T_{i-l})^2}.

In these equations, n is the number of observations, \hat{T}_i stands for the forecasted flow, T_i represents the observed flow, \bar{T} denotes the average observed flow, and T_{i-l} is the flow estimate from a so-called persistence model (or naive model) that basically takes the last flow observation (at time i minus the lead time l) as a prediction. CE and PI values of 1 stand for perfect fits.
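A direct transcription of the three measures into Python (NumPy assumed); `lead` in the persistence index is the forecast lead time l.

```python
import numpy as np

def rmse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2))

def ce(obs, sim):
    """Nash-Sutcliffe coefficient of efficiency."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pi(obs, sim, lead=1):
    """Persistence index: the benchmark is the naive forecast obs[i - lead]."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    naive = obs[:-lead]
    return 1.0 - np.sum((obs[lead:] - sim[lead:]) ** 2) / np.sum((obs[lead:] - naive) ** 2)
```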

Fig. 3. Singular spectrum for (a) Lushui and (b) Wuxi with different L.

Fig. 4. Singular spectrum for (a) Lushui and (b) Wuxi with different τ.

Application of models to the flow data

Decomposition of daily flow data

Decomposition by SSA

Decomposition of the raw flow data by SSA requires identifying the parameter pair (τ, L). The choice of L represents a compromise between information content and statistical confidence. The value of L should be able to clearly resolve the different oscillations hidden in the original signal. In other words, some leading eigenvalues should be identified. Fig. 3 displays the sensitivity of the eigenvalue decomposition to the singular number L for Lushui and Wuxi. Results show that about five leading eigenvalues stand out for different L, which implies that the leading eigenvalues are insensitive to L.

These leading eigenvalues are associated with lower-frequency oscillations. For the convenience of the later filtering operation, L is set to a small value of five in the present study. Fig. 4 presents the sensitivity of the eigenvalue decomposition to the lag time τ when L = 5. Results suggest that the eigenvalues can be distinguished when τ = 1, which means that the original signal can be resolved distinctly. The final parameter pair (τ, L) in SSA was therefore set to (1, 5) for the two studied flow data series.

Taking the flow data of Lushui as the example, Fig. 5 presents the five RCs and the original flow series excluding the testing data. The RC1 represents an obvious low-frequency oscillation, which exhibits a similar mode to the original flow series. The other RCs reflect high-frequency oscillations, part of which can be deleted so as to improve the mapping between the inputs and output of the ANN models. Fig. 6 depicts the AMI and cross-correlation function (CCF) between the RCs and the original flow data. The last plot in Fig. 6 denotes the average of AMI and CCF, which was generated by averaging the results in the plots of the five RCs. The average indicates an overall correlation, either positive or negative. The best positive correlation occurs at lag 1. The RC1, among all five RCs, exhibits the best positive correlation with the original flow series. The correlation quickly shifts from positive to negative values for the other RCs with increasing lag time. In essence, a positive or negative value of the CCF may indicate that the RC makes a positive or negative contribution to the output of the model when the RC is used as an input of the model. Therefore, deleting RCs that are negatively correlated with the model output (when the average AMI or CCF is positive) can improve the performance of the forecasting model. This is the underlying reason that the ANN is coupled with SSA or WMRA in this study.

Fig. 5. Reconstructed components (RCs) by SSA and the original flow series of Lushui.

Fig. 6. Plots of AMI and CCF between the RCs and the raw flow data of Lushui.

Decomposition by WMRA

The WMRA decomposes an original signal into many components at different scales (or frequencies). Each of the components plays a distinct role in the original flow series. The low-frequency component generally reflects the identity (periodicity and trend) of the signal whereas the high-frequency components uncover details (Küçük and Ağıralioğlu, 2006). An important issue in the WMRA is to choose the appropriate number of scales. The largest scale should be shorter than the size of the testing data. The sizes of the testing data are 550 days (1.25 years) for Lushui and 1826 days (5 years) for Wuxi. The largest scale m is therefore chosen as 8 and 10 for Lushui and Wuxi, respectively. Thus, the flow data of Lushui were decomposed at 8 wavelet resolution levels (2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8 days), and the flow data of Wuxi were decomposed at 10 wavelet resolution levels (2^1, 2^2, 2^3, 2^4, 2^5, 2^6, 2^7, 2^8, 2^9, 2^10 days). Fig. 7 shows the original flow data of Lushui (excluding the testing data) and nine wavelet components (eight detail components and one approximation component). For the purpose of distinction from the components of SSA, a wavelet component at a given scale is expressed by DWC with the power of two. For instance, DWC1 stands for the component at the scale of 2^1 days and DWC2 represents the component at the scale of 2^2 days, whereas DWC9 denotes the approximation for the Lushui flow series. The approximation component at the right end of Fig. 7 is the residual, which reflects the trend of the flow data. As revealed in Fig. 7, the detail components at scales of 2^8 (256 days) and 2^7 (128 days) are characterized by notable periodicity, which partially exhibits the annual oscillation and semi-annual oscillation in the original flow series. Relatively weak periodic signals occur at scales of 16 days, 32 days and 64 days. Other high-frequency components tend to capture the details (or noises) of the original flow series. Hence, the inputs of the model can be filtered by deleting some high-frequency DWCs.

Fig. 7. Discrete wavelet components (DWCs) and the original flow series of Lushui.

The AMI and CCF between the DWCs and the original flow are presented in Fig. 8. The last plot in Fig. 8 describes the average plots of AMI and CCF, which was generated by averaging the plots of the nine DWCs. The DWCs with lower frequencies, including those at 2^8, 2^9 and 2^10 days, always keep positive correlations with the original flow data within a long lag time. The approximation component DWC9 also exhibits a positive correlation with the original flow data over a long lag time. It can be seen from the other plots of DWCs that the correlation coefficient shifts between positive and negative values.

Fig. 8. Plots of AMI and CCF between the DWCs and the raw flow data of Lushui.

Identification of the ANN architecture

Six model architectures need to be identified, depending on the raw or filtered flow data, before the models can be applied to the flow prediction. The ANN model is used as a paradigm to shed light on the procedure.

The architecture identification of the ANN model includes determining the model inputs and the number of nodes (or neurons) in the hidden layer when there is one model output. The selection of appropriate model inputs is crucial in model development. There is no theoretical guide for the selection of model inputs, although a large number of methods have been reported in the literature reviewed by Bowden et al. (2005). These methods appear very subjective. Sudheer et al. (2002) suggested that the statistical approach depending on cross-, auto- and partial auto-correlation of the observed data is a good alternative to the trial-and-error method in identifying model inputs. The statistical method was also successfully applied to daily suspended sediment data by Kisi (2008). The model input in this method is mainly determined by the plot of the PACF. The essence of this method is to examine the dependence between the input and output data series. According to this method, the model inputs were originally considered to take the previous 6 daily flows for Lushui and the previous 13 daily flows for Wuxi, because the PACF falls within the confidence band at lag 6 for Lushui and lag 13 for Wuxi (Fig. 1).

The false nearest neighbors (FNN) method (Kennel et al., 1992; Abarbanel et al., 1993) is another commonly used method to identify model inputs, from the perspective of dynamics reconstruction of a system (Wang et al., 2006). The following discussion outlines the basic concepts of the FNN algorithm. Suppose the point Y_i = {x_i, x_{i+\tau}, x_{i+2\tau}, ..., x_{i+(L-1)\tau}} has a neighbor Y_j = {x_j, x_{j+\tau}, x_{j+2\tau}, ..., x_{j+(L-1)\tau}}; the criterion that Y_j is viewed as a false neighbor of Y_i is

\frac{| x_{i+L\tau} - x_{j+L\tau} |}{\| Y_i - Y_j \|} > R_{tol} \qquad (6)

where ||·|| stands for the distance in a Euclidean sense and R_tol is a threshold with the common range of 10-30. For all points i in the vector state space, Eq. (6) is evaluated and then the percentage of points which have FNNs is calculated. The algorithm is repeated for increasing L until the percentage of FNNs drops to zero (or to some acceptably small number, denoted by RP, such as RP = 1%), where L is the target L. Setting R_tol = 15 and τ = 1, the percentage of FNNs (FNNP) as a function of L was calculated for the two flow series, as shown in Fig. 9. The values of L are 6 and 8, respectively, for Lushui and Wuxi when RP = 2%, and the values of L are 7 and 12 for Lushui and Wuxi when RP = 1%. The final selection of model inputs was six for Lushui (i.e., using Q_{t-1}, Q_{t-2}, Q_{t-3}, Q_{t-4}, Q_{t-5} and Q_{t-6} as inputs to predict Q_t) and eight for Wuxi (i.e., using Q_{t-1}, Q_{t-2}, Q_{t-3}, Q_{t-4}, Q_{t-5}, Q_{t-6}, Q_{t-7} and Q_{t-8} as inputs to predict Q_t) by trial and error among the three potential sets of model inputs.

Fig. 9. FNNP as a function of the embedding dimension for (a) Lushui and (b) Wuxi when τ = 1 and R_tol = 15.

The ensuing task is to optimize the size of the hidden layer with the chosen inputs and one output. The optimal size H of the hidden layer was found by systematically increasing the number of hidden neurons from 1 to 10 until the network performance on the cross-validation set was no longer improved significantly. The identified ANN architecture was 6-8-1 for Lushui and 8-9-1 for Wuxi. Note that the identified ANN model was used as the baseline model for the following hybrid operation with the data-preprocessing techniques.

Implementation of models

ANN-MA

The window length K (see Moving average) can be determined by varying K from 1 to 10 depending on the identified ANN model. The targeted value of K was associated with the optimal network performance in terms of RMSE. The final K was 3, 5, and 7 at the 1-, 2-, and 3-day-ahead forecast horizons for the two flow data sets.

ANN-SSA1 (or ANN-WMRA1)

According to the methodological procedure for the ANN-SSA1 and ANN-WMRA1, the further tasks include sorting out the contributing components from the RCs or DWCs and determining the ANN-SSA1 architecture. The RCs from Lushui are employed to describe the implementation of the ANN-SSA1.

The determination of the effective RCs depends on the correlation coefficients between the RCs and the original flow data (Fig. 6). The procedure includes the following steps:

Identify whether the average of the CCF (shown at the bottom-right corner of Fig. 6) is positive or negative. For 1-day-ahead prediction, the average of the CCF is positive (0.18) at lag 1.

Sort the values of the CCF at lag 1 for all RCs in descending order (in ascending order if the average of the CCF is negative). For the 1-day-ahead prediction, the new order is RC1, RC2, RC3, RC4, and RC5, which is the same as the original order.

Use the ANN model to conduct predictions in which the number p (≤L) of RCs generating the new model inputs systematically decreases from all five RCs at the beginning to only RC1 at the end. The target value of p is associated with the minimum RMSE amongst the five runs of the ANN model.

Fig. 10 shows the results of the RC and DWC filters at all three prediction horizons. It can be seen from Fig. 10(a) that two components (RC1 and RC2) were retained for 1-day-ahead prediction, two components (RC1 and RC5, because the new order is RC1-RC5-RC2-RC4-RC3) for 2-day-ahead prediction, and only one component (RC1, the new order being RC1-RC4-RC5-RC3-RC2) for 3-day-ahead prediction. Fig. 10(b) shows that most of the DWCs are kept. For instance, only DWC1 (the detail at the scale of 2^1 days) of all nine DWCs was deleted for 1-step-ahead prediction, two DWCs (DWC1 and DWC2) were deleted for 2-step-ahead prediction, and three DWCs (DWC1, DWC2, and DWC3) were removed for 3-step-ahead prediction. The values of the validation RMSE in Fig. 10 also show that the SSA is superior to the WMRA in the improvement of the ANN performance.

Fig. 10. Performances of (a) ANN-SSA1 and (b) ANN-WMRA1 as a function of p (≤L) at different prediction horizons (based on the Lushui flow data).

Based on the retained RCs or DWCs, the number of nodes in the hidden layer of the ANN model is optimized again. The identified architectures of the ANN-SSA1 and ANN-WMRA1 were the same as those of the original ANN model (i.e., 6-8-1 for Lushui and 8-9-1 for Wuxi).
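A minimal sketch of the component-selection loop described above for the ANN-SSA1: rank the RCs by their lag-1 CCF with the raw flow (descending when the average CCF is positive), then shrink p from L to 1 and keep the subset giving the lowest cross-validation RMSE. `train_validation_rmse` is a hypothetical helper standing in for retraining the identified ANN on the filtered series and scoring it on the cross-validation set.

```python
import numpy as np

def lag1_ccf(component, flow):
    """Cross-correlation between a component at time t-1 and the flow at time t."""
    a, b = component[:-1], flow[1:]
    return np.corrcoef(a, b)[0, 1]

def select_rcs(rcs, flow, train_validation_rmse):
    """Rank RCs by lag-1 CCF (descending, i.e. the positive-average case), then drop
    the worst-ranked RCs one at a time and keep the subset with the lowest RMSE."""
    order = sorted(range(len(rcs)), key=lambda k: lag1_ccf(rcs[k], flow), reverse=True)
    best_subset, best_rmse = order, np.inf
    for p in range(len(rcs), 0, -1):
        filtered = np.sum([rcs[k] for k in order[:p]], axis=0)
        score = train_validation_rmse(filtered)      # hypothetical helper
        if score < best_rmse:
            best_subset, best_rmse = order[:p], score
    return best_subset
```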

ANN-SSA2 (or ANN-WMRA2)

Implementation of the ANN-SSA2 or ANN-WMRA2 can be referred to Partal and Kisi (2007) and Partal and Cigizoglu (2008) for details. A three-step procedure of the implementation is: first, use the SSA or WMRA to decompose each input variable series of the original ANN model; then, select the effective components for each input variable; finally, generate a new input variable series by summing the selected effective components. Obviously, the procedure is very time-consuming because it has to be repeated for each ANN input variable. There is no definite criterion for the selection of RCs or DWCs. A basic principle is to retain those components that make a positive contribution to the model output. A trial-and-error approach was therefore employed in the present study. The values of the CCF in Figs. 6 and 8 indicate the contribution of each component in each input variable to the ANN output. For instance, the values of the CCF at lag 1 denote the correlation coefficients between the components of Q_{t-1} and the output variable Q_t. Table 2 lists the effective components of each input variable for the ANN-SSA2 and ANN-WMRA2 based on the Lushui flows. It can be seen that the SSA is more effective than the WMRA, because most of the DWCs are retained for each input variable of the ANN-WMRA2 whereas only one or two RCs are kept for each input variable of the ANN-SSA2. With the new model inputs, the identified architectures of the ANN-SSA2 and ANN-WMRA2 were also the same as those of the original ANN model (i.e., 6-8-1 for Lushui and 8-9-1 for Wuxi).

Table 2
Effective components for the ANN-SSA2 and ANN-WMRA2 inputs at various forecasting horizons (based on the flow data of Lushui).

Model        Model input   1-day lead                   2-day lead           3-day lead
ANN-SSA2     Qt-1          1, 2 (a)                     1, 5                 1, 4
             Qt-2          1, 5                         1, 4                 1
             Qt-3          1, 4                         1                    1
             Qt-4          1                            1                    1
             Qt-5          1                            1                    1
             Qt-6          1                            1                    1
ANN-WMRA2    Qt-1          4, 8, 3, 6, 7, 2, 5, 9 (b)   8, 4, 6, 7, 5, 9     8, 6, 7, 4, 5, 9
             Qt-2          8, 4, 6, 7, 5, 9, 3          8, 6, 7, 4, 5, 9     8, 6, 7, 9, 4, 5
             Qt-3          8, 6, 7, 4, 5, 9, 1          8, 6, 7, 9, 4, 5     8, 6, 7, 9, 5
             Qt-4          8, 6, 7, 9, 4, 5, 1          8, 6, 7, 9, 5        8, 6, 7, 9
             Qt-5          8, 6, 7, 9, 5, 2, 4          8, 6, 7, 9           8, 6, 7, 9
             Qt-6          8, 6, 7, 9, 2, 5             8, 6, 7, 9           8, 6, 7, 9

(a) The numbers 1, 2 denote RC1 and RC2.
(b) The numbers 4, 8, 3, 6, 7, 2, 5, 9 stand for DWC4, DWC8, DWC3, DWC6, DWC7, DWC2, DWC5, and DWC9; the sequence of these numbers is in descending order of their correlation coefficients.

    Forecasting results and discussion

Fig. 11 shows the scatter plots and hydrographs of the results of 1-day-ahead prediction by the ANN model using the flow data of Lushui and Wuxi. The ANN model seriously underestimates a number of the moderate and high magnitudes of the flows. The low values of CE and PI demonstrate that a time lag may exist between the forecasted and observed flows. A representative detail of the hydrographs is presented in Figs. 12(1) and 12(2), in which the prediction lag effect is fairly obvious. Figs. 12(3) and 12(4) illustrate the lag values at the 1-, 2-, and 3-day-ahead forecast horizons for Lushui and Wuxi on the basis of the CCF between the forecasted and observed discharges. The value of the CCF at zero lag corresponds to the actual performance (i.e., the correlation coefficient) of the models. The lag at which the value of the CCF is maximized is an expression of the mean lag in the model forecast. Therefore, there were 1, 2, and 4 days of lag for Lushui, and 1, 2, and 3 days of lag for Wuxi, respectively associated with the 1-, 2-, and 3-day-ahead forecasts.

Fig. 11. Scatter plots and hydrographs of the results of 1-day-ahead forecast by the ANN model using the Lushui data (1 and 3) and Wuxi data (2 and 4).

Fig. 12. Representative detail of observed and forecasted discharges for 1-day-ahead forecast and CCF between observed and forecasted discharges at three forecast horizons from the ANN model (1 and 3 for Lushui; 2 and 4 for Wuxi).

The scatter plots of the simulation results of 1-day-ahead prediction based on the Lushui flows using the ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2 are presented in Fig. 13. Each of the four models exhibits a noticeable improvement in performance compared with the ANN model. The most remarkable improvement is, however, from the ANN-SSA1 and ANN-SSA2 in terms of RMSE, CE and PI. Fig. 14 presents a representative detail of the hydrographs of the four prediction models. Lagged predictions can clearly be found in the detail plots derived from the ANN-WMRA1 and ANN-WMRA2, in particular the latter. Figs. 15 and 16 present the scatter plots and detail parts of the hydrographs from the same four models based on the flows of Wuxi.

Fig. 13. Scatter plots of observed and forecasted Lushui discharges for 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.

Fig. 14. Representative detail of observed and forecasted Lushui discharges for 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.

A great improvement in model performance can be seen when the four models are compared with the ANN model. In terms of RMSE, CE and PI, the ANN-SSA1 and ANN-SSA2 performed better than the ANN-WMRA1 and ANN-WMRA2. The detail plots in Fig. 16(a) and (c) also indicate that the ANN-SSA1 and ANN-SSA2 can reasonably approximate the flows of Wuxi. In contrast, the ANN-WMRA1 and ANN-WMRA2 underestimate quite a number of the peak flows. Furthermore, the lag effect is still visible in Fig. 16(b) and (d).

In terms of the ANN-SSA1 and ANN-SSA2, the scatter plots with low spread, the low RMSE, and the high CE and PI indicate excellent model performance. The well-matched detail plots in Fig. 16(a) and (c) also show that the two models closely approximate the flows of Wuxi.

Fig. 15. Scatter plots of observed and forecasted Wuxi discharges for 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.

Fig. 16. Representative detail of observed and forecasted Wuxi discharges for 1-day-ahead forecast using (a) ANN-SSA1, (b) ANN-WMRA1, (c) ANN-SSA2, and (d) ANN-WMRA2.

The ANN-MA simulation results of 1-day-ahead prediction are presented in Fig. 17. Fig. 17(a), (c), and (e) depicts the scatter plots, the hydrographs and the CCF curves at the three forecasting horizons based on the flows of Lushui. Fig. 17(b), (d), and (f) shows the scatter plots, the hydrographs and the CCF curves at the three forecasting horizons based on the flows of Wuxi. The results in Fig. 17(e) and (f) show that the issue of lagged prediction is completely eliminated by the MA because the maximum CCF occurs at zero lag. Compared with the other five models (ANN, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2), the ANN-MA model exhibits the best model performance, including scatter plots with low spread, low RMSE, and high CE and PI. The well-matched detail plots in Fig. 17(b) and (d) indicate that the model is fairly adequate in reproducing the observed flows of Lushui and Wuxi. In addition, the ANN-MA model also shows a great ability in capturing the peak flows (depicted in Fig. 17(a)-(d)).

Fig. 17. Forecast results from the ANN-MA model for Lushui ((a), (c), and (e)) and Wuxi ((b), (d), and (f)) where (a) and (b) denote scatter plots, (c) and (d) are representative details, and (e) and (f) show the CCF between forecasts and observed discharges at three prediction levels.

Tables 3 and 4 summarize the forecasting performance of all six models in terms of RMSE, CE, and PI at the three prediction horizons. The ANN model shows markedly inferior results compared with the other five models. The ANN-MA holds the best performance among all models at each prediction horizon. It can also be seen that the performances of the ANN-SSA1 and ANN-SSA2 are similar, and the same situation appears between the ANN-WMRA1 and ANN-WMRA2. However, the models based on SSA provide noticeably better performance than the models based on WMRA at each forecast horizon, which means that the SSA is more effective than the WMRA in improving the ANN performance in the current study.

Table 3
Model performances at various forecasting horizons using the testing data of Lushui. The column numbers 1, 2, and 3 denote 1-, 2-, and 3-day-ahead forecasts.

             RMSE                   CE                     PI
Model        1      2      3        1      2      3        1      2      3
ANN          6.41   7.50   8.27     0.42   0.20   0.03     0.10   0.26   0.36
ANN-MA       2.47   2.63   2.60     0.91   0.90   0.90     0.87   0.91   0.94
ANN-SSA1     3.77   3.85   3.98     0.80   0.79   0.77     0.69   0.81   0.85
ANN-SSA2     3.18   3.47   3.85     0.86   0.83   0.79     0.78   0.84   0.86
ANN-WMRA1    4.67   5.14   7.17     0.70   0.62   0.27     0.54   0.65   0.52
ANN-WMRA2    5.85   6.26   6.91     0.51   0.44   0.32     0.25   0.48   0.55

Table 4
Model performances at various forecasting horizons using the testing data of Wuxi. The column numbers 1, 2, and 3 denote 1-, 2-, and 3-day-ahead forecasts.

             RMSE                   CE                     PI
Model        1      2      3        1      2      3        1      2      3
ANN          88.9   111.0  114.7    0.47   0.17   0.12     0.03   0.24   0.32
ANN-MA       29.4   39.4   41.5     0.94   0.90   0.88     0.89   0.90   0.91
ANN-SSA1     46.0   50.4   50.5     0.86   0.83   0.83     0.74   0.84   0.87
ANN-SSA2     48.1   45.3   50.5     0.84   0.86   0.83     0.72   0.87   0.87
ANN-WMRA1    69.6   82.3   92.2     0.68   0.55   0.43     0.41   0.59   0.56
ANN-WMRA2    78.7   86.9   94.1     0.58   0.49   0.41     0.24   0.54   0.54

    Conclusions

    In this study, the conventional ANN model was coupled withthree different data-preprocessing techniques, i.e., MA, SSA, andWMRA. As a result, six ANN models, the original ANN model,ANN-MA, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2,

    ANN-SSA1 46.0 50.4 50.5 0.86 0.83 0.83 0.74 0.84 0.87ANN-SSA2 48.1 45.3 50.5 0.84 0.86 0.83 0.72 0.87 0.87ANN-WMRA1 69.6 82.3 92.2 0.68 0.55 0.43 0.41 0.59 0.56ANN-WMRA2 78.7 86.9 94.1 0.58 0.49 0.41 0.24 0.54 0.54

    a The number of 1, 2, and 3 denote 1-, 2-, and, 3-day-ahead forecasts.

  • were proposed to forecast two daily ow series of Lushui andWuxi. To apply these models to the ow data, the memory lengthK of MA, the lag time and embedding dimension (s, L) of SSA, andthe largest scale m of WMRA needed to be decided in advance. TheK for 1-, 2-, and 3-day-ahead were 3, 5, and 7 days for each owdata by trial and error. The values of (s, L) were set as the valueof (1, 5) for each ow data by sensitivity analysis. The largest scalem of WMRA was 8 and 10 for Lushui and Wuxi respectivelydepending on the length size of the testing data.
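To make the role of the memory length K concrete, the following sketch (Python/NumPy, illustrative only and not the authors' implementation, which is specified in the modeling section) builds input-output pairs in which each antecedent-flow input is replaced by its backward moving average over K days. The horizon-dependent K values (3, 5, and 7 days) are taken from the summary above; how they enter the input construction here, and the number of antecedent inputs n_lags, are assumptions made for illustration.

```python
import numpy as np

# Memory lengths K reported above for the 1-, 2-, and 3-day-ahead models
# (their use in this particular input layout is an assumption).
K_BY_HORIZON = {1: 3, 2: 5, 3: 7}

def backward_ma(q, k):
    """Backward moving average: ma[t] = mean(q[max(0, t-k+1) : t+1])."""
    q = np.asarray(q, dtype=float)
    csum = np.cumsum(np.insert(q, 0, 0.0))
    idx = np.arange(1, len(q) + 1)
    start = np.maximum(idx - k, 0)
    return (csum[idx] - csum[start]) / (idx - start)

def make_io_pairs(q, horizon, n_lags=3):
    """Smoothed antecedent flows as inputs, raw flow `horizon` days ahead
    as the target (n_lags antecedent values is an arbitrary choice here)."""
    k = K_BY_HORIZON[horizon]
    s = backward_ma(q, k)
    X, y = [], []
    for t in range(n_lags - 1, len(q) - horizon):
        X.append(s[t - n_lags + 1 : t + 1])
        y.append(q[t + horizon])
    return np.array(X), np.array(y)

# Example: 1-day-ahead pairs from a synthetic flow record.
q = 50.0 + 10.0 * np.sin(np.arange(400) / 20.0)
X, y = make_io_pairs(q, horizon=1)
print(X.shape, y.shape)   # (397, 3) (397,)
```

Only the inputs are smoothed in this sketch; the target remains the raw flow at the forecast horizon, so any improvement reflects cleaner inputs rather than a smoothed objective.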

The results from the original ANN model were disappointing owing to the presence of the prediction lag effect. The analysis of the CCF between predicted and observed flows revealed that the lags at the three prediction horizons were 1, 2, and 4 days for Lushui and 1, 2, and 3 days for Wuxi. All three data-preprocessing techniques could improve the ANN performance. The ANN-MA, among all six models, performed best and eradicated the lag effect. It could also be seen that the performances of the ANN-SSA1 and ANN-SSA2 were similar, and the same held for the ANN-WMRA1 and ANN-WMRA2. However, the models based on SSA provided noticeably better performance than the models based on WMRA at each forecast horizon, which means that the SSA was more effective than the WMRA in improving the ANN performance in the current study.

Taking into account both the model performance and the complexity of modeling, the ANN-MA model was optimal, followed by the ANN model coupled with SSA, and finally the ANN model coupled with WMRA.

    References

Abarbanel, H.D.I., Brown, R., Sidorowich, J.J., Tsimring, L.S., 1993. The analysis of observed chaotic data in physical systems. Rev. Mod. Phys. 65 (4), 1331–1392.
Abrahart, R.J., See, L., 2002. Multi-model data fusion for river flow forecasting: an evaluation of six alternative methods based on two contrasting catchments. Hydrol. Earth Syst. Sci. 6 (4), 655–670.
ASCE Task Committee on Application of the Artificial Neural Networks in Hydrology, 2000. Artificial neural networks in hydrology. II: Hydrologic applications. J. Hydrol. Eng., ASCE 5 (2), 124–137.
Bowden, G.J., Dandy, G.C., Maier, H.R., 2005. Input determination for neural network models in water resources applications: part 1 – background and methodology. J. Hydrol. 301, 75–92.
Castellano-Méndez, M., González-Manteiga, W., Febrero-Bande, M., Manuel Prada-Sánchez, J., Lozano-Calderón, R., 2004. Modeling of the monthly and daily behavior of the runoff of the Xallas river using Box–Jenkins and neural networks methods. J. Hydrol. 296, 38–58.
Daubechies, I., 1992. Ten Lectures on Wavelets. CBMS-NSF Series in Applied Mathematics, vol. 61. SIAM, Philadelphia, PA.
Dawson, C.W., Wilby, R.L., 1999. A comparison of artificial neural networks used for river flow forecasting. Hydrol. Earth Syst. Sci. 3, 529–540.
Dawson, C.W., Wilby, R.L., 2001. Hydrological modeling using artificial neural networks. Prog. Phys. Geogr. 25 (1), 80–108.
De Vos, N.J., Rientjes, T.H.M., 2005. Constraints of artificial neural networks for rainfall–runoff modeling: trade-offs in hydrological state representation and model evaluation. Hydrol. Earth Syst. Sci. 9, 111–126.
Elsner, J.B., Tsonis, A.A., 1997. Singular Spectrum Analysis: A New Tool in Time Series Analysis. Plenum Press, New York.

Farmer, J.D., Sidorowich, J.J., 1987. Predicting chaotic time series. Phys. Rev. Lett. 59 (4), 845–848.
Fraser, A.M., Swinney, H.L., 1986. Independent coordinates for strange attractors from mutual information. Phys. Rev. A 33 (2), 1134–1140.
Howell, J.F., Mahrt, L., 1994. An adaptive decomposition: application to turbulence. In: Wavelets in Geophysics. Academic Press, New York, pp. 107–128.
Jain, S.K., Das, A., Srivastava, D.K., 1999. Application of ANN for reservoir inflow prediction and operation. J. Water Resour. Plann. Manage. 125 (5), 263–271.
Jain, A., Srinivasulu, S., 2004. Development of effective and efficient rainfall–runoff models using integration of deterministic, real-coded genetic algorithms and artificial neural network techniques. Water Resour. Res. 40, W04302.
Kennel, M.B., Brown, R., Abarbanel, H.D.I., 1992. Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys. Rev. A 45 (6), 3403–3411.
Kisi, O., 2003. River flow modeling using artificial neural networks. J. Hydrol. Eng. 9 (1), 60–63.
Kisi, O., 2005. Daily river flow forecasting using artificial neural networks and auto-regressive models. Turkish J. Eng. Environ. Sci. 29, 9–20.
Kisi, O., 2008. Constructing neural network sediment estimation models using a data-driven algorithm. Math. Comput. Simulat. 79, 94–103.
Kitanidis, P.K., Bras, R.L., 1980. Real-time forecasting with a conceptual hydrologic model, 2, applications and results. Water Resour. Res. 16 (6), 1034–1044.
Küçük, M., Ağıralioğlu, N., 2006. Regression technique for streamflow prediction. J. Appl. Stat. 33 (9), 943–960.
Laio, F., Porporato, A., Revelli, R., Ridolfi, L., 2003. A comparison of nonlinear flood forecasting methods. Water Resour. Res. 39 (5), 1129. doi:10.1029/2002WR001551.

Lee, C.F., Lee, J.C., Lee, A.C., 2000. Statistics for Business and Financial Economics, second ed. World Scientific, Singapore.
Legates, D.R., McCabe Jr., G.J., 1999. Evaluating the use of goodness-of-fit measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35 (1), 233–241.
Lisi, F., Nicolis, Sandri, M., 1995. Combining singular-spectrum analysis and neural networks for time series forecasting. Neural Process. Lett. 2 (4), 6–10.
Mallat, S.G., 1989. A theory for multi-resolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11 (7), 674–692.
Marques, C.A.F., Ferreira, J., Rocha, A., Castanheira, J., Gonçalves, P., Vaz, N., Dias, J.M., 2006. Singular spectral analysis and forecasting of hydrological time series. Phys. Chem. Earth 31, 1172–1179.
Muttil, N., Chau, K.W., 2006. Neural network and genetic programming for modelling coastal algal blooms. Int. J. Environ. Pollut. 28 (3/4), 223–238.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models; part I – a discussion of principles. J. Hydrol. 10, 282–290.

Newbold, P., Carlson, W.L., Thorne, B.M., 2003. Statistics for Business and Economics, fifth ed. Prentice Hall, Upper Saddle River, NJ.
Partal, T., Cigizoglu, H.K., 2008. Estimation and forecasting of daily suspended sediment data using wavelet–neural networks. J. Hydrol. (3–4), 317–331.
Partal, T., Kisi, O., 2007. Wavelet and neuro-fuzzy conjunction model for precipitation forecasting. J. Hydrol. (2), 199–212.
Raman, H., Sunilkumar, N., 1995. Multivariate modeling of water resources time series using artificial neural networks. Hydrol. Sci. J. 40 (2), 145–163.

Sivakumar, B., Jayawardena, A.W., Fernando, T.M.K., 2002. River flow forecasting: use of phase-space reconstruction and artificial neural networks approaches. J. Hydrol. 265 (1), 225–245.
Sivapragasam, C., Liong, S.Y., Pasha, M.F.K., 2001. Rainfall and discharge forecasting with SSA–SVM approach. J. Hydroinformat. 3 (7), 141–152.
Sudheer, K.P., Gosain, A.K., Ramasastri, K.S., 2002. A data-driven algorithm for constructing artificial neural network rainfall–runoff models. Hydrol. Process. 16, 1325–1330.
Thirumalaiah, K., Deo, M.C., 2000. Hydrological forecasting using neural networks. J. Hydrol. Eng. 5 (2), 180–189.
Torrence, C., Compo, G.P., 1998. A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc. 79, 61–78.
Vautard, R., Yiou, P., Ghil, M., 1992. Singular-spectrum analysis: a toolkit for short, noisy and chaotic signals. Physica D 58, 95–126.
Wang, W., van Gelder, P.H.A.J.M., Vrijling, J.K., Ma, J., 2006. Forecasting daily streamflow using hybrid ANN models. J. Hydrol. 324, 383–399.
