
Provisional chapter

Input Variable Selection in expert systems based on hybrid Gamma Test-Least Square Support Vector Machine, ANFIS and ANN models

Akram Seifi and Hossien Riahi-Madvar

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51210

Keywords: Adaptive Network Fuzzy Inference System (ANFIS), Artificial Neural Network (ANN), Gamma Test (GT), Least Square Support Vector Machine (LS-SVM), Uncertainty Analysis, Expert System, Input Variable Selection

1. Introduction

Plant water demand is determined by evapotranspiration (ET), the transfer of water to the atmosphere by transpiration and evaporation in a soil–plant system [3, 5]. Reference evapotranspiration (ETo) is one of the major components of the hydrologic cycle, and its accurate estimation from irrigated surfaces is important for many water studies such as irrigation system design and management, crop yield simulation, and crop water requirement studies [18]. ETo can be measured directly by weighing lysimeters, Bowen ratio-energy balance or eddy covariance systems, or calculated indirectly from climatological data. However, it is not always possible to measure ETo directly as a routine measurement at meteorological stations [35]. The direct method is time-consuming and needs precisely planned experiments; in addition, because of the limited area of a typical weather station, such an enclosure does not provide enough fetch from a representative surface for these measurements to be meaningful [34]. Thus, indirect methods based on climatological data are used for estimating ETo [33]. Recently, FAO introduced the Penman–Monteith combination equation, which incorporates energy balance and aerodynamic theory, as modified by Allen et al. [3] (FPM), for ETo estimation. FPM is a reference model with the best accuracy for estimating ETo and is used for calibrating other models [3, 43, 45, 38]. Therefore, ETo can be calculated indirectly from weather data [12]. However, the limited number of meteorological stations with reliable data for the FPM equation, even in developed countries, and the expensive installation of station equipment have restricted the applicability of FPM. Moreover, using meteorological variables to estimate ETo is a difficult task for farmers [1]. Therefore, simple methods that need less data are of interest. In recent years, modern mathematical methods have been proposed as new modeling tools for ETo. Many such techniques exist, including SVM, ANFIS and ANN, and it is necessary to assess which technique is most efficient for a particular application.

© 2012 Seifi and Riahi-Madvar; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

When accurate lysimeter data are not available, the FPM approach is proposed as the standard method for estimating ETo. FPM is used internationally for evaluating the results of experimental and simulation models such as ANN and ANFIS. Many researchers have used the FPM approach as the reference method to assess the results of other indirect methods [22, 39, 13, 7, 16, 17]. In these studies the inputs of the ANN and ANFIS models were selected by trial and error, which is their major weakness.

Recently, Support Vector Machines (SVMs), a simulation approach based on statistical learning theory, have emerged as a data-driven tool in several complex practical fields [21, 47]. The SVM is based on the structural risk minimization (SRM) principle, which theoretically minimizes the expected error of a learning machine and so reduces the overfitting problem in comparison with ANN and ANFIS methods. Although the SVM has been used in applied problems for a relatively short time, it has proven to be a robust and valuable algorithm for regression and classification problems [28]. Because it is based on statistical learning theory, the SVM has several advantages over the back propagation networks (BPNs) used in ANN and ANFIS. First, the SVM has superior generalization ability. Second, the SVM can produce more accurate predictions than the ANN. Third, the architecture and weights of the SVM are unique and optimal, whereas in ANN and ANFIS the weight and bias parameters are random. Thus, SVM models are more robust. Finally, the SVM is trained more quickly, which helps in building efficient forecasting models for huge databases [20, 31].

Recently, the application of SVM has attracted attention in hydrological engineering [15, 19, 28, 4, 20, 23, 31, 9]. One of these studies compared ANN and SVM methods for estimating ETo in a greenhouse; both approaches worked well, but the SVM model performed better than the ANN model. Moghadamnia et al. [24] examined the ability of the SVM technique to improve the accuracy of daily water evaporation estimation in the Chahnimeh reservoirs of Zabol in the south-east of Iran.

In developing nonlinear simulation models, the proper selection of input variables is a challenging problem: a poor combination of input variables can prevent the simulation model from reaching the optimal solution. Different methods exist for reducing the number of input variables and selecting the effective ones, such as Principal Component Analysis (PCA) [32], the Gamma Test (GT) and forward selection [26]. In this study the GT was used as an advanced, mathematically based technique for the optimal selection of input variables in ANN, ANFIS and SVM. The Gamma Test and PCA techniques have been used to evaluate ANN performance in weekly solid waste prediction [27]. The GT estimates the least mean square error (MSE) that can be achieved when modeling unseen data with any continuous nonlinear model.


As mentioned above, the SVM model is a general form of learning machines (such as ANN and RBF networks) that performs classification without requiring a model of the data distribution. The SVM model is therefore preferable to ANN and ANFIS models. However, selecting the input variables for SVM, ANN and ANFIS models is difficult and complicated, and previous studies offer no mathematical approach for it; the selection was done by a classical trial-and-error approach. Another gap in previous studies is the lack of climatic comparisons of intelligent ETo simulators. In other words, previous studies have three limitations: non-robust results of ANN and ANFIS models, weak input vector selection, and missing climatic comparisons. In this paper, therefore, the accuracy of an advanced robust SVM model, the Least Square Support Vector Machine (LSSVM), in modeling ETo in three climates of Iran is compared with that of ANFIS and ANN. In addition, the input variables for the LSSVM, ANN and ANFIS models are selected with the GT technique. The three main aims of this study were: (1) to develop an expert model based on LSSVM for estimating ETo in three different climates of Iran; (2) to use the GT to select the best combinations of variables for nonlinear simulation models such as LSSVM, ANN and ANFIS instead of classical trial-and-error methods; and (3) to compare the accuracy of the newly developed expert models in different climates.

2. Materials and Methods

2.1. Area study and data description

In this study, data from synoptic stations in three climates of Iran are used. The Dinpashoh climate classification method [6] was used for selecting stations. In this method, climate is classified based on P/ETo, where P is the long-term mean annual precipitation (mm) and ETo is the long-term mean annual ETo (mm/year). Based on this annual relative wetness, Iran was divided subjectively into three distinct climatic regions: (i) humid climate, in which all sites have P/ETo equal to or above 1.0; (ii) arid and semi-arid climate, with P/ETo between 0.1 and 1.0; and (iii) extreme arid climate, with P/ETo equal to or less than 0.1 [6]. Three stations in each of the three climates were used (Fig. 1). Table 1 lists the characteristics of the nine stations used in this study.
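The classification rule can be illustrated with a minimal Python sketch. The function name `classify_climate` is hypothetical; the thresholds are the ones stated above, and the ratios are the P/ETo values listed in Table 1.

```python
def classify_climate(p_over_eto):
    """Return the climate class for a P/ETo ratio, using the thresholds in the text."""
    if p_over_eto >= 1.0:
        return "humid"
    if p_over_eto > 0.1:
        return "arid and semi-arid"
    return "extreme arid"  # P/ETo equal to or less than 0.1


# P/ETo ratios copied from Table 1
table1_ratios = {
    "Esfahan": 0.09, "Kerman": 0.09, "Zahedan": 0.05,
    "Mashhad": 0.21, "Shiraz": 0.5, "Hamadan": 0.28,
    "Ramsar": 1.42, "Rasht": 1.32, "Noushahr": 1.57,
}

classes = {}
for station, ratio in table1_ratios.items():
    classes[station] = classify_climate(ratio)
```

Each of the three classes receives three stations, in agreement with the station selection described above.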

Previous studies on ETo estimation methods have indicated that the variables affecting ETo are minimum temperature (Tmin), maximum temperature (Tmax), dew point temperature (Tdew), wind speed (u2), relative humidity (RH), sunshine hours (n) and precipitation (P). A thorough inspection of the initial data showed that they were incomplete before 1982, so data from 1982-2003 were used for all stations. Precise measurement of meteorological variables is an important issue in ETo studies, so it is necessary to check the accuracy of the meteorological data. In this study, the method developed by Allen [2] was used, and measurements that did not meet the required accuracy had to be corrected. The meteorological data were evaluated following the recommendations in the FAO-56 guidelines of Allen et al. [3] and the ASCE reports [2]. The FPM method was proposed as the standard method for estimating ETo at the international level. This method is used for evaluating the results of other methods when lysimeter-measured data are not available [22, 13, 39, 16, 7]. Similarly, the FPM method was used here for evaluating the results of the GT-LSSVM, GT-ANN and GT-ANFIS hybrid models.

Climate | P/ETo | Altitude (m) | Latitude | Longitude | Station
extreme arid | 0.09 | 1550.40 | 32°37' N | 51°40' E | Esfahan
extreme arid | 0.09 | 1753.80 | 30°15' N | 56°58' E | Kerman
extreme arid | 0.05 | 1370.00 | 29°28' N | 60°53' E | Zahedan
arid and semi-arid | 0.21 | 999.20 | 36°16' N | 59°38' E | Mashhad
arid and semi-arid | 0.5 | 1484.00 | 29°32' N | 52°36' E | Shiraz
arid and semi-arid | 0.28 | 1741.50 | 34°52' N | 48°32' E | Hamadan
humid | 1.42 | -20.00 | 36°54' N | 50°40' E | Ramsar
humid | 1.32 | -6.90 | 37°15' N | 49°36' E | Rasht
humid | 1.57 | -20.90 | 36°39' N | 51°30' E | Noushahr

Table 1. Geographical location and climate of the stations studied

Figure 1. Location of selected stations in three climates

2.2. Support Vector Machines (SVM)

The SVM is a supervised learning method and a relatively new approach for function approximation and classification. This section gives a brief introduction to the underlying statistical learning theory. The SVM was introduced by Vapnik [42] for solving pattern recognition and classification problems and is based on statistical learning theory. It has been used for classification problems such as handwriting recognition, object detection, face recognition and sound classification, where it has performed considerably well in comparison with other techniques [28]. The SVM model finds good decision boundaries based on a small subset of the training examples, called the support vectors [28]. Most applications of the SVM have been in classification; it has been used less often in prediction problems.

In the modeling process, the goal is to select the model from the assumed space that is closest to the underlying function in the target space. Errors arise from two sources: (a) Approximation error: a consequence of the assumed space being smaller than the target space, so that the underlying function may lie outside the assumed space. A poor choice of model space results in large approximation errors. (b) Estimation error: produced by the learning procedure when it selects a non-optimal model (a local rather than a global solution) from the assumed space. Together these two errors make up the generalization error [11]. SVMs can be applied to regression problems by introducing an alternative loss function to handle these errors.

Suppose a set of l samples [(x1, y1), (x2, y2), ..., (xl, yl)] is drawn from an unknown probability distribution P(x, y), where xi and yi (i = 1, 2, …, l) are the input vectors and the corresponding output values, respectively. Approximating the relationship between input and output yields a function f(x, α), where α is the parameter vector of the function. The learning problem is to select from the set f(x, α) a function (learning machine) that predicts the output values y accurately. The goal is to minimize the expected risk R(α) when the only available information is the training set; the expected risk is written as [46]:

\[ R(\alpha) = \int L\big(y, f(x, \alpha)\big)\, dP(x, y) \tag{1} \]

where L(y, f(x, α)) measures the loss between the actual output y and the function output f(x, α). Because the distribution P(x, y) is unknown, the expected risk R(α) is minimized through the empirical risk minimization (ERM) method [Yu et al., 2006]:

\[ R_{emp}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} \big(y_i - f(x_i, \alpha)\big)^2 \tag{2} \]
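As a small illustration of Eq. (2), the following Python sketch evaluates the empirical risk of a parametric model on a toy training set. The linear model and the data are assumptions chosen only for illustration; they are not taken from the chapter.

```python
def empirical_risk(f, alpha, xs, ys):
    """Mean squared loss of f(x, alpha) over the training pairs, as in Eq. (2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += (y - f(x, alpha)) ** 2
    return total / len(xs)


def linear_model(x, alpha):
    """Hypothetical one-dimensional model f(x, alpha) = slope * x + intercept."""
    slope, intercept = alpha
    return slope * x + intercept


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1 without noise

risk_exact = empirical_risk(linear_model, (2.0, 1.0), xs, ys)
risk_wrong = empirical_risk(linear_model, (1.0, 1.0), xs, ys)
```

The parameter vector that generated the data gives zero empirical risk, while any other parameter vector gives a larger value.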

Support vector machine regression (SVMR) is a generalization of support vector machines to the estimation of real-valued functions [41]. The conventional method for solving the regression estimation problem is to apply the ERM principle of Eq. (2) [46]. With this formulation both linear and nonlinear regressions can be performed. The regression function for the linear case is f(w, b) = w.x + b. However, linear function approximation has limited practical applications. Vapnik [41] showed that the input vector x can be mapped into a higher-dimensional feature space by a nonlinear function φ(x) and inner products (Fig. 2):


\[ f(w, x) = w \cdot \phi(x) + b \tag{3} \]

Figure 2. Mapping of the input vector into a higher-dimensional feature space

where w and b are the weights and bias of the regression function, which can be estimated under the Karush-Kuhn-Tucker (KKT) conditions. The nonlinear regression problem can now be expressed as the following optimization problem:

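Minimize

\[ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \tag{4} \]

Subject to

\[ \begin{aligned} y_i - (w \cdot \phi(x_i) + b) &\le \varepsilon + \xi_i \\ (w \cdot \phi(x_i) + b) - y_i &\le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* &\ge 0, \quad i = 1, 2, \ldots, l \end{aligned} \tag{5} \]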

where ξi and ξi* are slack variables that specify the upper and lower constraints on the outputs of the system with an error tolerance of ε, and C is a positive constant that determines the degree to which losses are penalized when a training error occurs (Fig. 3) [46].
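The role of ε and of the two slack variables can be sketched in Python as follows. The function names are hypothetical, and the slack values returned are the smallest ones that satisfy the constraints of Eq. (5) for a given prediction.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def slack_variables(y_true, y_pred, eps):
    """Smallest (xi, xi_star) that satisfy the constraints of Eq. (5)."""
    xi = max(0.0, (y_true - y_pred) - eps)       # prediction lies below the tube
    xi_star = max(0.0, (y_pred - y_true) - eps)  # prediction lies above the tube
    return xi, xi_star
```

At most one of the two slack variables is non-zero for a given sample, and their sum equals the ε-insensitive loss of that sample.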

The dual form of the nonlinear SVR can then be expressed as follows:

Minimize

\[ \frac{1}{2} \sum_{i,j=1}^{l} (a_i - a_i^*)(a_j - a_j^*)\, \phi(x_i) \cdot \phi(x_j) + \varepsilon \sum_{i=1}^{l} (a_i + a_i^*) - \sum_{i=1}^{l} y_i (a_i - a_i^*) \tag{6} \]

Subject to

\[ \begin{aligned} \sum_{i=1}^{l} (a_i - a_i^*) &= 0 \\ 0 \le a_i &\le C, \quad i = 1, 2, \ldots, l \\ 0 \le a_i^* &\le C, \quad i = 1, 2, \ldots, l \end{aligned} \tag{7} \]

Figure 3. Nonlinear SVMR with Vapnik’s ε-insensitive loss function

where φ(xi).φ(xj) is the inner product of φ(xi) and φ(xj), and ai and ai* are the Lagrange multipliers [46]. Little information is available for selecting an appropriate nonlinear function φ(x), and computing φ(xi) and φ(xj) in the feature space is complicated. Hence, a kernel function k(xi, xj) = φ(xi).φ(xj) is introduced, which can be any function that satisfies Mercer’s theorem for producing inner products in the feature space [46]. Finally, the nonlinear SVMR function can be expressed as follows (Fig. 4):

\[ f(x) = \sum_{i=1}^{l} (a_i - a_i^*)\, k(x_i, x) + b \tag{8} \]

Many types of kernel function are used in SVM; some popular ones are given below. Equations (9) to (12) are the linear, polynomial, sigmoid and radial basis function (RBF) kernels, respectively.

\[ k(x_i, x_j) = x_i \cdot x_j \tag{9} \]

\[ k(x_i, x_j) = \big[(x_i \cdot x_j) + c\big]^d \tag{10} \]

\[ k(x_i, x_j) = \tanh\big[\gamma (x_i \cdot x_j) + c\big] \tag{11} \]

\[ k(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \tag{12} \]

Figure 4. The schematic structure of SVR
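The four kernels of Eqs. (9) to (12) can be written as a minimal Python sketch. The default parameter values (c, d, gamma, sigma) are placeholders chosen for illustration, not values used in this study.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)                              # Eq. (9)


def polynomial_kernel(u, v, c=1.0, d=2):
    return (dot(u, v) + c) ** d                   # Eq. (10)


def sigmoid_kernel(u, v, gamma=0.5, c=0.0):
    return math.tanh(gamma * dot(u, v) + c)       # Eq. (11)


def rbf_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))  # Eq. (12)
```

The RBF kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart, with σ controlling how fast the decay is.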

2.3. Least Square SVM

The Least Squares Support Vector Machine (LSSVM) was introduced by Suykens and Vandewalle [37] so that the final SVM model is found by solving a set of linear equations. The LSSVM formulation has constraints similar to those of the SVM model but performs better computationally: training requires solving a set of linear equations instead of the quadratic programming problem of the classical SVM model [15]. The LSSVM method can solve both classification and approximation problems, and it effectively reduces the complexity of the algorithm. The LSSVM method uses all training data to solve the optimization problem and produce results, whereas the SVM method uses only the support vectors [36]. In the LSSVM model, the training samples are mapped to a kernel space, and a regularization parameter, the same for all samples, is used to trade off training error against smoothness of the solution [40]. The LSSVM involves solving the following optimization problem:


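\[ \underset{\xi,\, w,\, b}{\text{Minimize}} \quad \frac{C}{2}\|\xi\|^2 + \frac{1}{2}\|w\|^2 \tag{13} \]

Subject to

\[ (w \cdot \phi(x_i) + b) - y_i + \xi_i = 0, \quad i = 1, 2, \ldots, M \tag{14} \]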

For a kernel function k and a regularization parameter C > 0, the LSSVM function is given by [15]:

\[ f(x) = \sum_{i=1}^{M} a_i\, k(x_i, x) + b \tag{15} \]

where M is the number of training samples, as in Eq. (14).
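The following is a minimal Python sketch of LSSVM regression under stated assumptions: eliminating w and ξ from Eqs. (13) and (14) gives the linear system [[0, 1ᵀ], [1, K + I/C]] [b, a]ᵀ = [0, y]ᵀ, which is solved here by plain Gaussian elimination with an RBF kernel. The study itself used MATLAB; the function names, the toy data and the parameter values below are hypothetical.

```python
import math


def rbf_kernel(u, v, sigma):
    sq_dist = sum((p - q) ** 2 for p, q in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, rhs):
    """Gaussian elimination with partial pivoting; returns x such that A x = rhs."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def lssvm_fit(X, y, C, sigma):
    """Return (b, a): bias and multipliers solving the LSSVM linear system."""
    m = len(X)
    A = [[0.0] + [1.0] * m]
    for i in range(m):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma) for j in range(m)]
        row[i + 1] += 1.0 / C  # ridge term from the squared-error penalty
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(x_new, X, a, b, sigma):
    """Evaluate Eq. (15): f(x) = sum_i a_i k(x_i, x) + b."""
    return sum(a_i * rbf_kernel(x_i, x_new, sigma) for a_i, x_i in zip(a, X)) + b


# Toy one-dimensional data, purely illustrative (not ETo data)
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 0.8, 0.9, 0.1, -0.8]
b, a = lssvm_fit(X, y, C=100.0, sigma=1.0)
fitted = []
for x in X:
    fitted.append(lssvm_predict(x, X, a, b, sigma=1.0))
```

Because every training sample receives a multiplier a_i = C ξ_i, the solution is generally not sparse, which matches the remark above that LSSVM uses all training data rather than only support vectors.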

2.4. Gamma Test

Identifying and selecting the most important and effective variables of an unknown nonlinear function is one of the most difficult steps in model development. The Gamma Test was used for this purpose in this study. The GT is a nonlinear modeling and analysis tool that examines an assumed input/output relationship in a numerical data set. In essence, the GT estimates the part of the output variance that cannot be accounted for by any smooth model based on the inputs, even though this model is unknown. A main advantage of this tool is its speed: even for large data sets of thousands of points, a single run of the GT takes a few seconds [14]. For a set of input-output data, the GT estimates the minimum mean square error (MSE) that a smooth model can achieve; this estimate is called the GT statistic. Suppose we have a set of observations {(xi, yi), 1 ≤ i ≤ M} in which the output y is determined by the input vectors x, where the xi ∈ R^m are vectors confined to some closed bounded set C ⊂ R^m and yi ∈ R is the associated scalar output. The relationship between input and output can be written as:

\[ y = f(x) + r \tag{16} \]

where f is a smooth function and r is a random variable representing noise (the part of the output that cannot be accounted for by any smooth model) [10]. The GT is an estimate of the variance of the model output that cannot be accounted for by a smooth data model. The GT is based on the kth (1 ≤ k ≤ p) nearest neighbors xN[i,k] of each vector xi (1 ≤ i ≤ M). Specifically, the GT is derived from the delta function of the input vectors:

\[ \delta_M(k) = \frac{1}{M} \sum_{i=1}^{M} \left| x_i - x_{N[i,k]} \right|^2, \quad 1 \le k \le p \tag{17} \]

where |…| denotes the Euclidean distance, and the Gamma function is given by:

\[ \gamma_M(k) = \frac{1}{2M} \sum_{i=1}^{M} \left| y_i - y_{N[i,k]} \right|^2, \quad 1 \le k \le p \tag{18} \]

where yN[i,k] is the y-value corresponding to the kth nearest neighbor of xi in Eq. (17). The GT is computed from a least squares regression line constructed for the p points (δM(k), γM(k)):

\[ \gamma = A\delta + \mathrm{GT} \tag{19} \]

The intercept on the vertical axis (δ = 0) is the GT value; it can be shown that γM(k) → Var(r) in probability as δM(k) → 0. The gradient A of the regression line provides useful information on the complexity of the system under study [24]. The GT offers an estimate of the best MSE achievable by a modeling technique for unknown smooth functions of continuous variables [10]. The GT is a mathematical algorithm that reduces the volume of model development work and gives guidance, before a model is developed, on the input data needed and the most important variables. In this study, the GT was implemented in the commercial MATLAB software.
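Equations (17) to (19) can be sketched in plain Python as below. This is a brute-force illustration with a quadratic-time neighbor search, not the MATLAB implementation used in the study; the function names, the choice p = 10 and the synthetic sine data are assumptions made only for this example.

```python
import math
import random


def gamma_test(X, y, p=10):
    """Return (GT, A): intercept and slope of the regression of gamma_M(k) on delta_M(k)."""
    M = len(X)
    deltas = [0.0] * p
    gammas = [0.0] * p
    for i in range(M):
        # squared Euclidean distances from x_i to every other point, nearest first
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(M) if j != i
        )
        for k in range(p):
            sq_dist, j = neighbours[k]
            deltas[k] += sq_dist / M                      # Eq. (17)
            gammas[k] += (y[i] - y[j]) ** 2 / (2.0 * M)   # Eq. (18)
    # ordinary least squares line gamma = A * delta + GT, Eq. (19)
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    s_dd = sum((d - mean_d) ** 2 for d in deltas)
    s_dg = sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
    slope = s_dg / s_dd
    return mean_g - slope * mean_d, slope


def make_sine_data(m, noise_sd, seed):
    """Synthetic data y = sin(2*pi*x) + r on a regular grid, with Var(r) = noise_sd**2."""
    rng = random.Random(seed)
    X = [[i / m] for i in range(m)]
    y = [math.sin(2.0 * math.pi * x[0]) + rng.gauss(0.0, noise_sd) for x in X]
    return X, y


X_clean, y_clean = make_sine_data(600, 0.0, seed=0)
X_noisy, y_noisy = make_sine_data(600, 0.1, seed=0)
gt_clean, slope_clean = gamma_test(X_clean, y_clean)
gt_noisy, slope_noisy = gamma_test(X_noisy, y_noisy)
```

For the noise-free data the intercept should be close to zero, while for the noisy data it should be close to the noise variance of 0.01. Running the same function on each candidate subset of input columns and keeping the subset with the smallest intercept is the selection idea used in this chapter.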

2.5. Artificial Neural Networks (ANN)

An ANN is a mathematical structure consisting of an interconnected assembly of simple processing elements or nodes [30]. In this study, a model based on a feed-forward neural network with a single hidden layer was used, and the optimal number of hidden-layer nodes was found by trial and error. The back propagation algorithm was used to train the network. The default values of the MATLAB software were used for the initial weights and biases.

2.6. Adaptive Neuro Fuzzy Inference System (ANFIS)

ANFIS is an adaptive fuzzy system that combines the abilities of artificial neural networks and fuzzy logic. ANFIS can recognize and represent a phenomenon without needing its governing mathematical equations [31]. The system is a Sugeno-type fuzzy system with a feed-forward network structure. In this study, Gaussian membership functions were used for designing the ANFIS networks, and the main training method was back propagation. The number of membership functions for each variable was determined by trial and error.

2.7. Evaluation Criteria

In this study, the model results were compared using the coefficient of determination (R2), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), defined as follows:

\[ RMSE = \left( \frac{1}{N} \sum_{i=1}^{N} (P_i - O_i)^2 \right)^{0.5} \tag{20} \]

\[ MAE = \frac{1}{N} \sum_{i=1}^{N} \left| P_i - O_i \right| \tag{21} \]

\[ R^2 = \frac{\left[ \sum_{i=1}^{N} (P_i - \bar{P})(O_i - \bar{O}) \right]^2}{\sum_{i=1}^{N} (P_i - \bar{P})^2 \sum_{i=1}^{N} (O_i - \bar{O})^2} \tag{22} \]

where N is the number of data, Pi is the predicted value, Oi is the observed value, P̄ is the average of the predicted values and Ō is the average of the observed values. These statistics give no information about the error distribution of the predicted values. To fill this gap, the Mean Absolute Relative Error (MARE) and Threshold Statistics (TS) were used, and the models were evaluated using scatter graphs of the relative absolute error. These two criteria are not only statistical indices of the performance of the estimates but also show the error distribution of the model predictions. The TSx index shows the error distribution in the estimated values for x% of the estimates. It is expressed as a percentage and is presented for different levels of absolute relative error. The TS value for x% of the estimates and MARE are calculated by:

\[ TS_x = \frac{Y_x}{n} \times 100 \tag{23} \]

\[ MARE = \left| \frac{O_i - E_i}{O_i} \right| \times 100 \tag{24} \]

where Yx is the number of computed ETo values (out of n in total) for which the absolute relative error is less than x% [31], Oi is the ETo value from FPM and Ei is the corresponding value estimated by the expert models. Eq. (24) gives the absolute relative error of estimate i; MARE is its mean over the n estimates.
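The evaluation criteria of Eqs. (20) to (24) can be sketched in Python as follows. MARE is implemented here as the mean of the absolute relative errors, which is an interpretation of Eq. (24); the function names and the four sample values are hypothetical.

```python
import math


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))  # Eq. (20)


def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)               # Eq. (21)


def r_squared(pred, obs):
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov ** 2 / (var_p * var_o)                                          # Eq. (22)


def absolute_relative_errors(pred, obs):
    """Per-sample absolute relative error in percent, as in Eq. (24)."""
    return [abs((o - p) / o) * 100.0 for p, o in zip(pred, obs)]


def mare(pred, obs):
    errors = absolute_relative_errors(pred, obs)
    return sum(errors) / len(errors)


def threshold_statistic(pred, obs, x):
    """Percentage of estimates whose absolute relative error is below x percent, Eq. (23)."""
    errors = absolute_relative_errors(pred, obs)
    return 100.0 * sum(1 for e in errors if e < x) / len(errors)


# Hypothetical values, for illustration only
obs = [4.0, 5.0, 6.0, 8.0]
pred = [4.2, 4.5, 6.0, 8.8]
```

Reporting the threshold statistic at several levels of x, rather than a single averaged error, is what exposes the distribution of errors discussed above.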

2.8. Model development strategy

Different combinations of the input variables that affect ETo can be used for training and testing the LSSVM, ANN and ANFIS models. In this study, six variables measured by the meteorological organization of Iran are used: minimum, maximum and dew point temperatures, relative humidity, sunshine hours and wind speed. These variables yield 63 different input combinations, and a further 30 combinations were created by replacing sunshine hours with solar radiation (Rs). Obviously, some of these combinations have more effect on ETo than others; thus selecting a proper combination of input variables is important for the final model.
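The count of 63 follows from the number of non-empty subsets of six variables, 2^6 − 1. A short Python sketch enumerates them; the function name is hypothetical. The 30 additional Rs-based combinations are not enumerated here because the text does not state which subsets were retained.

```python
from itertools import combinations


def all_nonempty_subsets(names):
    """Return every non-empty combination of the given variable names."""
    subsets = []
    for size in range(1, len(names) + 1):
        subsets.extend(combinations(names, size))
    return subsets


variables = ["Tmin", "Tmax", "Tdew", "RH", "n", "u2"]
subsets = all_nonempty_subsets(variables)
```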

In previous studies on ETo, a trial-and-error approach was used for input variable selection. Testing all 93 combinations by trial and error is time-consuming. Moreover, there is no practical guidance about the data set needed or how much data must be used to develop robust expert models for ETo prediction. In this study, therefore, the GT technique was used to determine the minimum data required for developing nonlinear models of ETo and to identify the proper vector of input variables and the variables that most affect ETo. The GT is thus combined with expert models to develop hybrid prediction models. MATLAB software was used to write the computational programs and develop the learning and simulation algorithms.

The LSSVM model requires the estimation of parameters such as C. The C parameter determines the trade-off between minimizing the error and the smoothness of the estimated function. In this study, three kernel functions (linear, RBF and polynomial) were used for the nonlinear optimization problem. These functions have the calibration parameters c and σ2, whose values must be determined. The parameters have no specific predetermined values and must be determined separately for each input combination. For this purpose, exponential sequences of parameter values, C = 2^-5, 2^-3, ..., 2^15 and σ = 0.01, 0.5, ..., 10, were used for each factor. A 10-fold grid search algorithm was used to find the best combination of these values. Because the amount of data was large, the optimization procedure was time-consuming. The model was trained with every available combination of these coefficients, and the combination giving the least error was selected. The regularization parameter of the optimization problem and the kernel parameters were supplied together with the matrices of training inputs (combinations of meteorological variables) and outputs (ETo values calculated from the FPM equation), and the bias values were then determined. The desired output values were predicted using these matrices, the parameters from the previous stage and the input matrix determined from the training data. The number of data was large, and the model could not be trained on all of them at once. To reduce running time, the training data (75 percent of the total) were first divided into several parts, and each part was trained with all available combinations of coefficients. The coefficient values with the lowest error and the highest coefficient of determination were determined for each part. Next, the total data set was divided into two parts, and in each part 75 and 25 percent of the data were selected for training and testing, respectively. This training used the best values obtained in the previous stage. In this way a k-fold algorithm is used not only for optimizing the LSSVM parameters but also for training the expert models.
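The search procedure can be outlined with a minimal Python sketch. It builds the C sequence quoted above, splits sample indices into k contiguous folds, and picks the parameter pair with the lowest score. The function names are hypothetical and the score function is left to the caller; the σ sequence is not generated because the text gives it only with an ellipsis.

```python
def c_grid():
    """C = 2^-5, 2^-3, ..., 2^15, as quoted in the text."""
    return [2.0 ** exponent for exponent in range(-5, 16, 2)]


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def grid_search(score, c_values, sigma_values):
    """Return the (C, sigma) pair with the lowest score(C, sigma)."""
    candidates = [(c, s) for c in c_values for s in sigma_values]
    return min(candidates, key=lambda pair: score(pair[0], pair[1]))
```

In practice, `score` would train the model on each training fold and return the mean validation error over the folds for the given parameter pair.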

3. Results and Discussion

In this section, the results of the Gamma Test and of the GT-LSSVM, GT-ANN and GT-ANFIS models for estimating ETo are given separately for each of the three climates, followed by a climatic comparison of the developed hybrid models. The main aims of Sections 3.1, 3.2 and 3.3 are first to show the effectiveness of the GT and then to compare the relative accuracy of the hybrid expert models. The aim of Section 3.4 is to assess directly the accuracy of the developed hybrid models in different climates.


3.1. The results of hybrid expert models in extremely arid climate

In this study, the best combinations of input data were determined with the Gamma Test to assess their influence on ETo modeling. The Gamma value is the criterion used to select the best combination of input variables. Among the possible combinations of variables (93 combinations), the best one is the combination with the smallest Gamma value, which indicates the best MSE attainable by any modeling method for unseen smooth functions of continuous variables. The combinations with the smallest Gamma values for the extremely arid climate are given in Table 2. In this table, the input vector Tmin, Tmax, Tdew, u2, n has the same inputs as the FPM equation. In Table 2, the smallest Gamma value was obtained with the input vector Tmin, Tmax, RH, Rs, u2 at all three stations. The close Gamma values of combinations 1, 2, 3 and 4 show the importance of the Tmin, Tmax, u2 and Rs variables in estimating ETo. Also, the high Gamma value of combination 5 relative to the other combinations shows that Rs is more important than sunshine hours. Note that Table 2 presents only the best GT results; combinations with larger Gamma values are not included.

Input variables                        Gamma value
                                       Kerman     Esfahan    Zahedan
(1) Tmin, Tmax, Tdew, RH, Rs, u2       0.00109    0.00117    0.00155
(2) Tmin, Tmax, Tdew, Rs, u2           0.00103    0.00125    0.00158
(3) Tmin, Tmax, RH, Rs, u2             0.00099    0.00112    0.00142
(4) Tmax, Tdew, RH, Rs, u2             0.00104    0.00115    0.00143
(5) Tmin, Tmax, Tdew, u2, n            0.04943    0.00755    0.00961

Table 2. The Gamma results on the input variables in extremely arid climate
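To make the role of the Gamma value more concrete, the following is a minimal sketch of a near-neighbour Gamma Test estimate, in which the half mean squared output differences between each point and its k-th nearest neighbour are regressed on the corresponding mean squared input distances and the intercept is taken as Gamma. It is not the WinGamma software referenced in this chapter; the function name gamma_test, the choice of p = 10 neighbours and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def gamma_test(X, y, p=10):
    """Return (Gamma, slope): intercept and slope of the regression of
    gamma(k) on delta(k) for the k = 1..p nearest neighbours."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise squared distances in input space (brute force, O(M^2) memory).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :p]      # indices of the p nearest neighbours
    delta = np.take_along_axis(d2, nn, axis=1).mean(axis=0)
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)
    slope, intercept = np.polyfit(delta, gamma, 1)
    return float(intercept), float(slope)

# Synthetic example: smooth function of two inputs plus noise of variance 0.01.
rng = np.random.default_rng(0)
M = 1000
X = rng.uniform(0.0, 1.0, size=(M, 2))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, M)

G_full, _ = gamma_test(X, y)             # both relevant inputs
G_x2_only, _ = gamma_test(X[:, [1]], y)  # first input left out
print(G_full, G_x2_only)
```

In this synthetic case the estimate for the full input set should lie near the noise variance, while leaving out a relevant input inflates it, which is the same reasoning used above to rank the input combinations by their Gamma values.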

The architectures of the GT-LSSVM (C and σ2 coefficients), GT-ANN (NA: network architecture) and GT-ANFIS (NMF: number of membership functions for each variable) hybrid models used for the best combinations at the three stations of the extremely arid climate are given in Table 3. Three types of epsilon-SVR kernel functions for estimating ETo were compared numerically with the ETo values calculated by the FPM equation. The RBF kernel function gave the best results compared with the linear and polynomial kernel functions, and it is superior to the GT-ANN and GT-ANFIS hybrid models, as given in Table 3. When the results of all methods were averaged over the stations, accuracy decreased in the order GT-LSSVM-RBF, GT-LSSVM-Polynomial, GT-ANFIS, GT-ANN and GT-LSSVM-Linear. Fig. 5 shows the distribution of errors in the train and test steps at different threshold levels for the GT-LSSVM, GT-ANFIS and GT-ANN hybrid models at Kerman station. It can be observed from Fig. 5a that the AARE of the GT-LSSVM-RBF hybrid model is significantly the lowest (30%) compared with the GT-ANN (41%) and GT-ANFIS (70%) hybrid models in the train step. About 98% of the forecasted values for the best combination of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models had an estimation error within 14%, 21% and 32%, respectively.

Also, Fig. 5b shows that the AARE of the GT-LSSVM-RBF hybrid model is significantly the lowest (36%) compared with the GT-ANN (57%) and GT-ANFIS (91%) hybrid models during the test stage. About 98% of the forecasted values for the best combination of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models had an estimation error within 16%, 32% and 22%, respectively. Thus, an improved AARE without a significant reduction in the global evaluation statistics suggests the potential of GT-LSSVM-RBF over the GT-ANN and GT-ANFIS hybrid models for estimating ETo. Similarly, Figs. 6 and 7 show that the GT-ANFIS and GT-LSSVM-RBF hybrid models have the least error at Esfahan and Zahedan stations, respectively. Fig. 8 shows the trends of observed (ETo-FPM) and simulated (ETo-GT-LSSVM-RBF) values at Kerman station as a sample of the extremely arid climate, together with an error chart of the values estimated by GT-LSSVM. Notably, the ETo values estimated by the GT-LSSVM-RBF hybrid model follow the ETo values calculated by the FPM equation more closely than those of the other models.
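The error-distribution statistics quoted above can be sketched as follows, assuming that AARE denotes the average absolute relative error in percent and that the threshold curves give the share of estimates whose absolute relative error does not exceed each threshold. The helper names and the five-point data set are hypothetical and are not taken from the study.

```python
import numpy as np

def relative_errors(obs, pred):
    """Absolute relative error of each estimate, in percent of the observed value."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return 100.0 * np.abs(pred - obs) / np.abs(obs)

def share_within(are, thresholds):
    """Percentage of estimates whose error does not exceed each threshold."""
    are = np.asarray(are, dtype=float)
    return {t: 100.0 * float(np.mean(are <= t)) for t in thresholds}

# Hypothetical reference (FPM) values and model estimates.
obs = np.array([2.0, 4.0, 5.0, 8.0, 10.0])
pred = np.array([2.5, 4.0, 4.0, 9.0, 10.5])

are = relative_errors(obs, pred)            # 25, 0, 20, 12.5 and 5 percent
aare = float(are.mean())                    # average absolute relative error
curve = share_within(are, [10, 20, 30])     # points of the error-distribution curve
level_98 = float(np.percentile(are, 98))    # error level covering 98% of estimates
print(aare, curve, level_98)
```

Plotting the threshold shares against the thresholds gives a curve of the kind shown in the error-distribution figures, and a model whose curve reaches a given share at a smaller threshold has the tighter error distribution.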

Station        Model              Training                   Testing                    Optimized parameter
                                  R2      RMSE    MAE        R2      RMSE    MAE
Kerman (3)*    GT-LSSVM-RBF       0.996   0.115   0.082      0.996   0.124   0.096      C=512, σ2=16
               GT-LSSVM-Poly.     0.992   0.166   0.124      0.993   0.148   0.115      C=512
               GT-LSSVM-Linear    0.950   0.416   0.309      0.961   0.290   0.288      C=0.125
               GT-ANN             0.994   0.153   0.121      0.995   0.367   0.300      NA**: 5-11-1
               GT-ANFIS           0.990   0.218   0.161      0.992   0.165   0.125      NMF***: 8
Esfahan (3)*   GT-LSSVM-RBF       0.995   0.125   0.088      0.995   0.125   0.091      C=512, σ2=16
               GT-LSSVM-Poly.     0.992   0.157   0.118      0.992   0.155   0.117      C=512
               GT-LSSVM-Linear    0.959   0.348   0.262      0.959   0.302   0.313      C=512
               GT-ANN             0.992   0.173   0.137      0.993   0.203   0.174      NA**: 5-11-1
               GT-ANFIS           0.990   0.187   0.136      0.991   0.185   0.136      NMF***: 7
Zahedan (3)*   GT-LSSVM-RBF       0.995   0.114   0.080      0.995   0.120   0.091      C=512
               GT-LSSVM-Poly.     0.991   0.151   0.111      0.993   0.142   0.109      C=512
               GT-LSSVM-Linear    0.947   0.363   0.270      0.951   0.293   0.283      C=512
               GT-ANN             0.992   0.269   0.232      0.994   0.350   0.278      NA**: 5-14-1
               GT-ANFIS           0.992   0.145   0.111      0.994   0.157   0.115      NMF***: 10

*: combination number from Table 2. **: Architecture of network. ***: Number of membership functions for each parameter.

Table 3. The results of GT-LSSVM, GT-ANN and GT-ANFIS hybrid models in extremely arid climate
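For reference, the three statistics reported in Table 3 can be computed as in the sketch below. The definitions are assumptions, because this section does not state them: RMSE and MAE follow their usual formulas, and R2 is taken here as the squared Pearson correlation between reference and estimated values (another common definition is 1 - SSE/SST). The data are hypothetical.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)))

def r2(obs, pred):
    """Assumed definition: squared Pearson correlation coefficient."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

# Hypothetical case: every estimate is 0.5 too high.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.5, 3.5, 4.5]
print(r2(obs, pred), rmse(obs, pred), mae(obs, pred))
```

Under this definition a constant bias leaves R2 at 1 while RMSE and MAE equal the bias, so a model can combine a high R2 with a comparatively large RMSE, a pattern that also appears in some GT-ANN rows of the table.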

Figure 5. Error distribution of Train and test steps at Kerman station

Figure 6. Error distribution of Train and test steps at Esfahan station

Figure 7. Error distribution of Train and test steps at Zahedan station

Figure 8. Changes process of observed (ETo-FPM) and estimated (ETo-GT-LSSVM-RBF) ETo in extremely arid climate

3.2. The results of hybrid expert models in arid and semi-arid climate

In this section, the results of GT and the hybrid models in the arid and semi-arid climate are presented. The GT results are presented in Table 4. The smallest Gamma values are obtained with the input vector composed of the Tmin, Tmax, RH, Rs, u2 variables at Shiraz station, the Tmin, Tmax, Tdew, RH, Rs, u2 variables at Hamedan station and the Tmin, Tmax, Tdew, Rs, u2 variables at Mashhad station. The close Gamma values of combinations 1, 2, 3 and 4 show the importance of the RH, Tmax, Tmin, u2 and Rs variables in the arid and semi-arid climate. The high Gamma value of combination 5 relative to the other combinations shows that Rs is more important than sunshine hours.

Input variables                        Gamma value
                                       Shiraz     Hamedan    Mashhad
(1) Tmin, Tmax, Tdew, RH, Rs, u2       0.00088    0.00118    0.00107
(2) Tmin, Tmax, Tdew, Rs, u2           0.00100    0.00129    0.00089
(3) Tmin, Tmax, RH, Rs, u2             0.00087    0.00142    0.00105
(4) Tmax, Tdew, RH, Rs, u2             0.00088    0.00131    0.00115
(5) Tmin, Tmax, Tdew, u2, n            0.01632    0.00710    0.00526

Table 4. The Gamma results on the input variables in arid and semi-arid climate
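The selection rule applied to Table 4 is simply the smallest Gamma value per station. As a small worked check, the snippet below takes the values from Table 4 and returns the selected combination number for each station; the variable names are illustrative only.

```python
# Gamma values from Table 4, listed in the order of combinations (1) to (5).
gamma_table4 = {
    "Shiraz":  [0.00088, 0.00100, 0.00087, 0.00088, 0.01632],
    "Hamedan": [0.00118, 0.00129, 0.00142, 0.00131, 0.00710],
    "Mashhad": [0.00107, 0.00089, 0.00105, 0.00115, 0.00526],
}

# Combination number (1-based) with the smallest Gamma value at each station.
best_combination = {
    station: min(range(len(values)), key=values.__getitem__) + 1
    for station, values in gamma_table4.items()
}
print(best_combination)
```

The result is combination (3) for Shiraz, (1) for Hamedan and (2) for Mashhad, which agrees with the combination numbers listed beside the station names in Table 5.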

Station        Model              Training                   Testing                    Optimized parameter
                                  R2      RMSE    MAE        R2      RMSE    MAE
Shiraz (3)*    GT-LSSVM-RBF       0.996   0.107   0.074      0.996   0.127   0.092      C=256, σ2=12.5
               GT-LSSVM-Poly.     0.994   0.138   0.103      0.994   0.149   0.117      C=512
               GT-LSSVM-Linear    0.971   0.309   0.232      0.972   0.53    0.249      C=512
               GT-ANN             0.992   0.275   0.177      0.996   0.311   0.54       NA**: 5-9-1
               GT-ANFIS           0.995   0.128   0.093      0.995   0.134   0.101      NMF***: 9
Hamedan (1)*   GT-LSSVM-RBF       0.994   0.136   0.099      0.995   0.148   0.109      C=512, σ2=25
               GT-LSSVM-Poly.     0.992   0.164   0.126      0.992   0.183   0.139      C=1
               GT-LSSVM-Linear    0.936   0.464   0.363      0.941   0.408   0.414      C=512
               GT-ANN             0.992   0.166   0.128      0.993   0.242   0.206      NA**: 6-14-1
               GT-ANFIS           0.993   0.148   0.109      0.993   0.161   0.119      NMF***: 11
Mashhad (2)*   GT-LSSVM-RBF       0.996   0.136   0.098      0.995   0.152   0.114      C=128, σ2=12.5
               GT-LSSVM-Poly.     0.994   0.158   0.120      0.994   0.169   0.134      C=0.5
               GT-LSSVM-Linear    0.959   0.430   0.335      0.964   0.342   0.348      C=512
               GT-ANN             0.994   0.163   0.126      0.995   0.318   0.53       NA**: 5-28-1
               GT-ANFIS           0.994   0.165   0.123      0.993   0.178   0.132      NMF***: 7

*: combination number from Table 4. **: Architecture of network. ***: Number of membership functions for each parameter.

Table 5. The results of GT-LSSVM, GT-ANN and GT-ANFIS hybrid models in arid and semi-arid climate

As given in Table 5, the RBF kernel function gives more accurate results than the linear and polynomial kernels and than the GT-ANN and GT-ANFIS hybrid models. When the results of all methods in the arid and semi-arid climate were averaged, the order of accuracy at Shiraz and Hamedan stations was GT-LSSVM-RBF, GT-ANFIS, GT-LSSVM-Polynomial, GT-ANN and GT-LSSVM-Linear, whereas at Mashhad station the GT-LSSVM-Polynomial hybrid model performed better than the GT-ANFIS hybrid model. Figs. 9 to 11 show the distribution of errors at different threshold levels in the train and test steps for the GT-LSSVM-RBF, GT-ANFIS and GT-ANN hybrid models at the three stations of the arid and semi-arid climate. At Shiraz station, about 98% of the forecasted values for the best combination of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models had an estimation error within 13%, 17% and 16%, respectively, in the train step (Fig. 9a) and within 14%, 17% and 16%, respectively, in the test step (Fig. 9b). Thus, the GT-LSSVM-RBF hybrid model has a lower error curve at Shiraz station for the best combinations than the other hybrid models. Similarly, Fig. 10 shows that the GT-LSSVM-RBF hybrid model has the least error in both train and test steps at Mashhad station. Also, the GT-ANFIS and GT-ANN hybrid models have less error than the other hybrid models at Hamedan station in the train and test steps, respectively (Fig. 11). Fig. 12 shows that the ETo values estimated by the GT-LSSVM-RBF hybrid model follow the ETo values calculated by the FPM equation more closely than those of the other models.

Figure 9. Error distribution of Training and testing steps at Shiraz station

Figure 10. Error distribution of Training and testing steps at Hamedan station

Figure 11. Error distribution of Training and testing steps at Mashhad station

Figure 12. Changes process of observed (ETo-FPM) and estimated (ETo-GT-LSSVM-RBF) evapotranspiration in arid and semi-arid climate

3.3. The results of hybrid expert models in humid climate

The results of the developed models in the humid climate are presented in this section. In Table 6, the smallest Gamma value is obtained with the Tmin, Tmax, Tdew, Rs, u2 combination at Ramsar and Noushahr stations and with the Tmin, Tmax, Tdew, RH, Rs, u2 combination at Rasht station, which identifies them as the best combinations at these stations. The higher Gamma value of Tmin, Tmax, Tdew, u2, n (the FPM combination) compared with the best combinations at these stations shows the importance of the Rs variable relative to sunshine hours.

Input variables                        Gamma value
                                       Ramsar     Rasht      Noushahr
(1) Tmin, Tmax, Tdew, RH, Rs, u2       0.00255    0.00182    0.00202
(2) Tmin, Tmax, Tdew, Rs, u2           0.00233    0.00200    0.00185
(3) Tmin, Tmax, RH, Rs, u2             0.00278    0.00213    0.00229
(4) Tmax, Tdew, RH, Rs, u2             0.00351    0.00244    0.00330
(5) Tmin, Tmax, Tdew, u2, n            0.01728    0.01373    0.01398

Table 6. The Gamma results on the input variables in humid climate

As given in Table 7, the RBF kernel function is better for estimating ETo than the linear and polynomial kernels and than the GT-ANN and GT-ANFIS hybrid models. When the results of all methods were averaged, the order of accuracy at Rasht and Noushahr stations was GT-LSSVM-RBF, GT-LSSVM-Polynomial, GT-ANFIS, GT-ANN and GT-LSSVM-Linear, whereas at Ramsar station the GT-ANFIS hybrid model performed better than the GT-LSSVM-Polynomial hybrid model. Figs. 13 to 15 show the distribution of errors at different threshold levels for the GT-LSSVM-RBF, GT-ANFIS and GT-ANN hybrid models in the train and test steps at the three stations. At Ramsar station, about 98% of the forecasted values for the best combination of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models had an estimation error within 32%, 44% and 35%, respectively, in the train step (Fig. 13a) and within 26%, 29% and 28%, respectively, in the test step (Fig. 13b). Thus, the GT-LSSVM-RBF hybrid model has a lower error curve at Ramsar station for the best combinations than the other hybrid models. Similarly, Fig. 14 shows that the GT-LSSVM-RBF hybrid model has the least error in both train and test steps at Rasht station. Also, the GT-ANN and GT-LSSVM-RBF hybrid models have less error than the other hybrid models at Noushahr station in the train and test steps, respectively (Fig. 15). Fig. 16 shows that the ETo values estimated by the GT-LSSVM-RBF hybrid model follow the trend of the ETo values calculated by the FPM equation well.

3.4. Comparative assessment of hybrid GT-expert models in different climates

In this section, the performances of the developed hybrid expert models are compared across the three climates. The aim of this section is a more detailed climatic comparison of the models. The Gamma Test was conducted at 9 stations located in three climates to determine the best combination of input variables for estimating ETo. The Gamma values of the four selected best combinations at the stations are shown in Fig. 17. Stations located in the humid climate showed larger Gamma values than those located in the extremely arid and the arid and semi-arid climates. The Gamma values of the extremely arid and the arid and semi-arid climates showed nearly the same trend, but the aggregation of Gamma values in the arid and semi-arid climate was greater than in the extremely arid climate. These results indicate that the Gamma Test gives different results in different climates, as expected from the physical aspects of the ETo process.
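The statement about larger Gamma values in the humid climate can be retraced from Tables 2, 4 and 6. The sketch below averages the Gamma values of combinations (1) to (4) over the three stations of each climate; the averaging itself is an illustrative choice made here and is not a statistic reported in the chapter.

```python
# Gamma values of combinations (1) to (4), station by station, from Tables 2, 4 and 6.
gamma_by_climate = {
    "extremely arid": [
        0.00109, 0.00103, 0.00099, 0.00104,   # Kerman
        0.00117, 0.00125, 0.00112, 0.00115,   # Esfahan
        0.00155, 0.00158, 0.00142, 0.00143,   # Zahedan
    ],
    "arid and semi-arid": [
        0.00088, 0.00100, 0.00087, 0.00088,   # Shiraz
        0.00118, 0.00129, 0.00142, 0.00131,   # Hamedan
        0.00107, 0.00089, 0.00105, 0.00115,   # Mashhad
    ],
    "humid": [
        0.00255, 0.00233, 0.00278, 0.00351,   # Ramsar
        0.00182, 0.00200, 0.00213, 0.00244,   # Rasht
        0.00202, 0.00185, 0.00229, 0.00330,   # Noushahr
    ],
}

mean_gamma = {c: sum(v) / len(v) for c, v in gamma_by_climate.items()}
largest = max(mean_gamma, key=mean_gamma.get)
print(mean_gamma, largest)
```

The mean is roughly 0.0012 for the extremely arid climate, 0.0011 for the arid and semi-arid climate and 0.0024 for the humid climate, so the humid stations have about twice the Gamma level of the other two groups.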

Station        Model              Training                   Testing                    Optimized parameter
                                  R2      RMSE    MAE        R2      RMSE    MAE
Ramsar (2)*    GT-LSSVM-RBF       0.990   0.130   0.091      0.991   0.143   0.101      C=512, σ2=16
               GT-LSSVM-Poly.     0.986   0.157   0.122      0.988   0.162   0.128      C=0.062
               GT-LSSVM-Linear    0.956   0.279   0.218      0.957   0.55    0.53       C=512
               GT-ANN             0.966   0.291   0.213      0.988   0.228   0.487      NA**: 5-22-1
               GT-ANFIS           0.987   0.147   0.107      0.988   0.156   0.115      NMF***: 9
Rasht (1)*     GT-LSSVM-RBF       0.993   0.128   0.088      0.991   0.143   0.095      C=256, σ2=16
               GT-LSSVM-Poly.     0.989   0.159   0.120      0.988   0.172   0.129      C=512
               GT-LSSVM-Linear    0.960   0.304   0.229      0.954   0.276   0.261      C=8
               GT-ANN             0.982   0.240   0.187      0.991   0.210   0.164      NA**: 6-22-1
               GT-ANFIS           0.987   0.171   0.118      0.985   0.187   0.128      NMF***: 10
Noushahr (2)*  GT-LSSVM-RBF       0.993   0.119   0.081      0.993   0.125   0.087      C=512, σ2=16
               GT-LSSVM-Poly.     0.988   0.150   0.117      0.990   0.151   0.119      C=0.031
               GT-LSSVM-Linear    0.958   0.285   0.225      0.957   0.51    0.53       C=512
               GT-ANN             0.990   0.162   0.121      0.992   0.167   0.133      NA**: 6-18-1
               GT-ANFIS           0.989   0.151   0.110      0.990   0.166   0.119      NMF***: 7

*: combination number from Table 6. **: Architecture of network. ***: Number of membership functions for each parameter.

Table 7. The results of GT-LSSVM, GT-ANN and GT-ANFIS hybrid models in humid climate

Figure 13. Error distribution of Train and test steps at Ramsar station

Fig. 18 shows the results of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models used for estimating ETo in the three climates. Comparison of the GT-LSSVM-RBF results across the three climates shows that this hybrid model has less error and gives more accurate estimates in the extremely arid climate than in the other climates. However, the differences in the RMSE values of the GT-LSSVM-RBF hybrid model are not considerable, so this hybrid model can predict ETo values accurately in all three climates. The GT-ANFIS and GT-ANN hybrid models produced the lowest error values in the arid and semi-arid climate and the humid climate, respectively. Also, the maximum RMSE values of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models over the three climates are 0.152, 0.367 and 0.187, respectively.
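These comparisons can be retraced from the testing RMSE columns of Tables 3, 5 and 7. The following sketch is a worked check rather than part of the original analysis: it collects those values per model and climate, takes the maximum per model and identifies the climate with the lowest mean testing RMSE; the variable names are illustrative.

```python
# Testing RMSE per station, taken from Tables 3, 5 and 7.
test_rmse = {
    "GT-LSSVM-RBF": {
        "extremely arid":     [0.124, 0.125, 0.120],   # Kerman, Esfahan, Zahedan
        "arid and semi-arid": [0.127, 0.148, 0.152],   # Shiraz, Hamedan, Mashhad
        "humid":              [0.143, 0.143, 0.125],   # Ramsar, Rasht, Noushahr
    },
    "GT-ANN": {
        "extremely arid":     [0.367, 0.203, 0.350],
        "arid and semi-arid": [0.311, 0.242, 0.318],
        "humid":              [0.228, 0.210, 0.167],
    },
    "GT-ANFIS": {
        "extremely arid":     [0.165, 0.185, 0.157],
        "arid and semi-arid": [0.134, 0.161, 0.178],
        "humid":              [0.156, 0.187, 0.166],
    },
}

# Largest testing RMSE of each model over all nine stations.
max_rmse = {
    model: max(v for values in climates.values() for v in values)
    for model, climates in test_rmse.items()
}

# Climate in which each model has the lowest mean testing RMSE.
best_climate = {
    model: min(climates, key=lambda c: sum(climates[c]) / len(climates[c]))
    for model, climates in test_rmse.items()
}
print(max_rmse)
print(best_climate)
```

The maxima are 0.152, 0.367 and 0.187, and the lowest mean testing RMSE occurs in the extremely arid climate for GT-LSSVM-RBF, in the humid climate for GT-ANN and in the arid and semi-arid climate for GT-ANFIS, in line with the statements above.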

Figure 14. Error distribution of Train and test steps at Rasht station

4. Summary and conclusion

This study was performed to evaluate the performance of a novel hybrid model based on the Gamma Test and the Least Square Support Vector Machine (GT-LSSVM) for estimating ETo at 9 stations. These stations are located in extremely arid, arid and semi-arid, and humid climates in Iran. The main aims of this study were: 1) to develop a new hybrid GT-LSSVM model to estimate ETo in three different climates in Iran; 2) to use GT for optimum selection of input variable combinations in nonlinear simulator models such as LSSVM, ANN and ANFIS; and 3) to compare the results of the developed expert models in different climates. The GT was performed to find the best combination for ETo estimation. A k-fold algorithm was combined with the LSSVM model to reinforce the modeling strategy. The GT results showed that three combinations, (Tmin, Tmax, Tdew, RH, Rs, u2), (Tmin, Tmax, Tdew, Rs, u2) and (Tmin, Tmax, RH, Rs, u2), were better than the other combinations for estimating ETo in these three climates. These combinations had smaller Gamma values than the Tmin, Tmax, Tdew, u2, n combination (FPM), which shows that the Rs variable is more important than the sunshine hours variable for estimating ETo. The GT-LSSVM hybrid model with the RBF kernel performed better than the GT-LSSVM hybrid model with polynomial and linear kernels at all weather stations. The ETo estimates of the GT-LSSVM, GT-ANN and GT-ANFIS hybrid models were compared, and it is concluded that accuracy decreased in the order GT-LSSVM-RBF, GT-LSSVM-Polynomial, GT-ANFIS, GT-ANN and GT-LSSVM-Linear. However, the GT-ANFIS hybrid model was better than the GT-LSSVM-Polynomial hybrid model at Shiraz, Hamedan and Ramsar stations. Also, the GT-LSSVM-RBF and GT-ANN hybrid models had the best results in the extremely arid climate. In the arid and semi-arid climate, the best results were obtained by the GT-ANFIS hybrid model. The prediction error distributions of the GT-LSSVM-RBF, GT-ANN and GT-ANFIS hybrid models were drawn for the best combinations at every station for the training and testing data. The GT-LSSVM-RBF hybrid model produced the lowest error distribution at most stations. The Gamma Test showed different results in different climates, and the GT-LSSVM-RBF hybrid model was recognized as the best model in all three climates. The innovative hybrid approach used in this study provides a systematic framework for input vector selection in simulators of nonlinear phenomena, such as SVM, ANFIS and ANN, without the need to know the governing equation or model.

Figure 15. Error distribution of Train and test steps at Noushahr station

Figure 16. Changes process of observed (ETo-FPM) and estimated (ETo-GT-LSSVM-RBF) ETo in humid climate

Figure 17. The Gamma values in three climates

Figure 18. Comparing a) GT-LSSVM-RBF, b) GT-ANN and c) GT-ANFIS hybrid models at three climates

Author details

Akram Seifi1 and Hossien Riahi-Madvar2

*Address all correspondence to: [email protected]

1 Modares University, Tehran, Iran

2 Vali-e-Asr University, Rafsanjan, Iran

References

[1] Alam, M., & Trooien, T. P. (2001). Estimating reference evapotranspiration with an atmometer. Applied Engineering in Agriculture, 17, 153-158.

[2] Allen, R. G. (1996). Assessing integrity of weather data for reference evapotranspiration estimation. Irrigation and Drainage Engineering, 122, 97-106.

[3] Allen, R. G., Pereira, L. S., Raes, D., & Smith, M. (1998). Crop evapotranspiration: guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56, Rome, Italy.

[4] Behzad, M., Asghari, K., Eazi, M., & Pallhang, M. (2009). Generalization performance of support vector machines and neural networks in runoff modeling. Expert Systems with Applications, 36, 7624-7629.

[5] Bhantana, P., & Lazarovitch, N. (2010). Evapotranspiration, crop coefficient and growth of two young pomegranate (Punica granatum L.) varieties under salt stress. Agricultural Water Management, 97, 715-722.

[6] Dinpashoh, Y. (2006). Study of reference crop evapotranspiration in I.R. of Iran. Agricultural Water Management, 84, 123-129.

[7] Dogan, E. (2008). Reference evapotranspiration estimation using adaptive neuro-fuzzy inference system. Irrigation and Drainage, 10.1002/ird.445.

[8] Durrant, P. J. (2001). WinGammaTM: a non-linear data analysis and modeling tool with applications to flood prediction. PhD thesis, Department of Computer Science, Cardiff University, Wales, UK.

[9] Eslamian, S. S., Abedi-Koupai, J., Amiri, , & Gohari, S. A. (2009). Estimation of daily reference evapotranspiration using support vector machines and artificial neural networks in greenhouse. Environmental Sciences, 4, 439-447.

[10] Evans, D., & Jones, A. J. (2002). A proof of the gamma test. The Royal Society, 458, 2759-2799.

[11] Gunn, S. R. (1998). Support vector machine for classification and regression. Technical Report, University of Southampton.

[12] Hou, L. G., Xiao, H. L., Si, J. H., Xiao, S. C., Zhou, M. X., & Yang, Y. G. (2010). Evapotranspiration and crop coefficient of Populus euphratica Oliv forest during the growing season in the extreme arid region northwest China. Agricultural Water Management, 97, 351-356.

[13] Jain, S. K., Nayak, P. C., & Sudheer, K. P. (2008). Models for estimating evapotranspiration using artificial neural networks, and their physical interpretation. Hydrological Processes, 22, 2225-2234.

[14] Jones, A. J. (2004). New tools in non-linear modeling and prediction. Computational Management Science, s10287-003-0006-1, 109-149.

[15] Khemchandani, R., Jayadeva, S., & Chandra, S. (2009). Regularized least squares fuzzy support vector machine time series forecasting. Expert System with Application, 36, 132-138.

[16] Kisi, O. (2007). Evapotranspiration modelling from climatic data using a neural computing technique. Hydrological Processes, 21, 1925-1934.

[17] Kisi, O., & Ozturk, O. (2007). Adaptive neurofuzzy computing technique for evapotranspiration estimation. Irrigation and Drainage Engineering, 133(4), 368-379.

[18] Kumar, M., Raghuwanshi, N. S., Singh, R., Wallender, W. W., & Pruitt, W. O. (2002). Estimating evapotranspiration using Artificial Neural Network. Irrigation and Drainage Engineering, 128, 224-233.

[19] Li, G. F., Chen, G. R., Huang, P. Y., & Chou, Y. C. (2009). Support vector machine based models for hourly reservoir inflow forecasting during typhoon-warning periods. Hydrology, 372, 17-29.

[20] Lin, P. H., Kwon, H. H., Sun, L., Lall, U., & Kao, J. J. (2009). A modified support vector machine based prediction model on stream flow at the Shihmen Reservoir, Taiwan. Climatology, 10.1002/joc.1954.

[21] Liong, S. Y., & Sivapragasam, C. (2002). Flood stage forecasting with support vector machines. Journal of the American Water Resources Association, 38, 173-186.

[22] Marti, P., Royuela, A., Manzano, J., & Palau-Salvador, G. (2010). Generalization of ETo ANN models through data supplanting. Irrigation and Drainage Engineering, 136, 161-174.

[23] Misra, D., Oommen, Th., Agarwal, A., Mishra, S. K., & Thompson, A. M. (2009). Application and analysis of support vector machine based simulation for runoff and sediment yield. Biosystems Engineering, 103, 527-535.

[24] Moghadamnia, A., Ghafari, M., Piri, J., & Han, D. (2008). Evaporation estimation using support vector machines technique. Engineering and Technology, 33, 14-22.

[25] Nayak, P. C., Sudheer, K. P., Rangan, D. M., & Ramasatri, K. S. (2004). A neuro-fuzzy computing technique for modeling hydrological time series. Hydrology, 291, 52-66.

[26] Noori, R., Karbassi, A. R., Moghaddamnia, A., Han, D., Zokaei-Ashtiani, M. H., Farokhnia, A., & Ghafari, M. (2011). SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. Hydrology, 10.1016/j.jhydrol.2011.02.021.

[27] Noori, R., Karbassi, A., & Sabahi, M. S. (2010). Evaluation of PCA and Gamma test techniques on ANN operation for weekly solid waste prediction. Environmental Management, 91, 767-771.

[28] Pai, P. F., & Hong, W. C. (2007). A recurrent support vector regression model in rainfall forecasting. Hydrological Process, 21, 819-827.

[29] Remesan, R., Shamim, M. A., & Han, D. (2008). Model data selection using gamma test for daily solar radiation estimation. Hydrological Processes, 22, 4301-4309.

[30] Riahi-Madvar, H., Ayyoubzadeh, S. A., & Gholizadeh, Atani. M. (2011). Developing an expert system for predicting alluvial channel geometry using ANN. Expert Systems with Applications, 38, 215-222.

[31] Riahi-Madvar, H., Ayyoubzadeh, S. A., Khadangi, E., & Ebadzadeh, M. M. (2009). An expert system for predicting longitudinal dispersion coefficient in natural streams by using ANFIS. Expert Systems with Applications, 36, 8589-8596.

[32] Seifi, A., Mirlatifi, S. M., & Riahi, H. (2011). Developing a combined model of multiple Linear Regression-Principal Component and Factor Analysis (MLR-PCA) for estimation of reference evapotranspiration (case study: Kerman station). Water and Soil, 24(6), 1186-1196.

[33] Sentelhas, P. C., Gillespie, T. J., & Santos, E. A. (2010). Evaluation of FAO Penman-Monteith and alternative methods for estimating reference evapotranspiration with missing data in Southern Ontario, Canada. Agricultural Water Management, 97, 635-644.

[34] Stanhill, G. (2002). Is the class A evaporation pan still the most practical and accurate meteorological method for determining irrigation water requirements? Agricultural and Forest Meteorology, 112, 233-236.

[35] Strangeways, I. (2001). Back to basics: the 'met. enclosure'. Part 7. Evaporation. Weather, 56, 419-427.

[36] Suykens, J. A. K., & Vandewalle, J. (2000). Recurrent least squares support vector machines. IEEE Trans. Circuits Systems.

[37] Suykens, J. A. K., & Vandewalle, J. (1999). Least Squares Support Vector Machine Classifiers. Kluwer Academic Publishers. Printed in the Netherlands.

[38] Temesgen, B., Eching, S., Davidoff, B., & Frame, K. (2005). Comparison of some reference evapotranspiration equations for California. Irrigation and Drainage Engineering, ASCE, 131, 73-84.

[39] Traore, S., Wang, Y. M., & Kerh, T. (2010). Artificial neural network for modeling reference evapotranspiration complex process in Sudano-Sahelian zone. Agricultural Water Management, 97, 707-714.

[40] Valyon, J., & Horvath, G. (2005). A robust LS-SVM regression. World Academy of Science, Engineering and Technology, 7, 148-153.

[41] Vapnik, V. (2000). The Nature of Statistical Learning Theory, Springer-Verlag, New York.

[42] Vapnik, V. N. (1992). Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems, 4, Denver, Morgan Kaufmann.

[43] Walter, I. A., Allen, R. G., Elliott, R., Jensen, M. E., Itenfisu, D., Mecham, B., Howell, T. A., Snyder, R., Brown, P., Echings, S., Spofford, T., Hattendorf, M., Cuenca, R. H., Wright, J. L., & Martin, D. (2000). ASCE’s standardized reference evapotranspiration equation. Proceedings of the 4th National Irrigation Symposium, ASAE, Phoenix, AZ.

[44] Wang, D., Wang, M., & Qiao, X. (2009). Support vector machines regression and modeling of greenhouse environment. Computers and Electronics in Agriculture, 66, 46-52.

[45] Wright, J. L., Allen, R. G., & Howell, T. A. (2000). Conversion between evapotranspiration references and methods. Proceedings of the 4th National Irrigation Symposium, ASAE, Phoenix, AZ.

[46] Yu, P. S., Chen, S. T., & Chang, I. F. (2006). Support vector regression for real-time flood stage forecasting. Hydrology, 328, 704-716.

[47] Yu, X. Y., & Liong, S. Y. (2007). Forecasting of hydrology time series with ridge regression in feature space. Journal of Hydrology, 332, 290-302.
