Inter-comparison of spatial estimation schemes for precipitation and temperature
Yeonsang Hwang1,2, Martyn Clark2, Balaji Rajagopalan1,2, Subhrendu Gangopadhyay1,2, and Lauren E. Hay3
1 Department of Civil, Environmental and Architectural Engineering, University of Colorado, Boulder, CO
2 Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, CO
3 U.S. Geological Survey, Denver, CO
Abstract
Distributed hydrologic models typically require spatial estimates of precipitation and temperature at specific grid points, interpolated from sparsely located observation points. We compare the performance of several statistical methods for spatial estimation in two climatologically and hydrologically different basins. The seven methods assessed are: (1) Simple Average; (2) Inverse Distance Weight Scheme (IDW); (3) Ordinary kriging; (4) Multiple Linear Regression (MLR); (5) PRISM (Parameter-elevation Regressions on Independent Slopes Model) based interpolation; (6) Climatological MLR (CMLR); and (7) Locally Weighted Polynomial Regression (LWP). Regression-based methods that used elevation information performed better, in particular the nonparametric LWP method. LWP is data driven, makes minimal assumptions, and provides an attractive alternative to MLR in situations with a high degree of nonlinearity. At the daily time scale, we propose a two-step process in which precipitation occurrence is first generated via a logistic regression model and the amount is then estimated using the interpolation schemes. This process reproduced precipitation occurrence effectively. The results presented in this paper will help guide the selection of appropriate spatial interpolation methods for watershed models used for streamflow simulation and forecasting, and for downscaling Global Climate Model (GCM) outputs.
Submitted to Water Resources Research
July 2004
1 Introduction
Accurate simulations of streamflow from physical watershed models are often limited by the ability to capture the spatial variability of precipitation throughout a river basin [Syed et al., 2003]. Watershed models require estimates of precipitation and temperature, among other variables, on a regular grid or at sub-basins [Leavesley et al., 1996]. Although the accuracy of precipitation measurements has improved, problems remain, both because the observations have sparse spatial coverage and because precipitation is difficult to measure during, for example, snow events. Furthermore, the intermittent nature of precipitation makes it even harder to estimate throughout a basin from a sparse observational network.
Spatial interpolation schemes are required that can provide estimates of the mean and uncertainty of precipitation and temperature at the required locations (i.e. on a regular grid or at the centers of sub-basins). Various methods have been developed for this purpose, ranging from simple averaging methods (e.g. arithmetic mean, Thiessen polygons), to physically based estimates such as lapse rates, to complex statistical methods (multiple linear regression, locally weighted polynomials, kriging, optimal interpolation, etc.). Regional climate models are also used to estimate spatial variability in precipitation and other surface climate fields. The skill of these methods depends on the space and time scales of the precipitation estimate. For example, precipitation estimates at daily (or sub-daily) time scales and for small sub-basins (e.g. horizontal length scales < 50 km) are complicated by the intermittency of precipitation [Jothityangkoon et al., 2000; Gupta, 1993; Thornton et al., 1997]. The accuracy of the methods is also regionally dependent: methods that work well in regions dominated by large-scale frontal systems may not work well in regions dominated by sporadic thunderstorm activity. Meanwhile, increasingly complex hydrologic models require precipitation and temperature estimates at fine time scales (typically daily or six-hourly) and fine space scales.
The purpose of this paper is to provide a comprehensive assessment of the advantages and limitations of different statistical techniques used to estimate spatial variability in precipitation and temperature. The assessment is conducted at both daily and monthly time scales for two river basins in different hydroclimatic regimes in the contiguous United States (U.S.). The statistical methods are: Simple Average; Inverse Distance Weight Scheme (IDW); Ordinary Kriging; Multiple Linear Regression (MLR); PRISM (Parameter-elevation Regressions on Independent Slopes Model) based interpolation; Climatological MLR (CMLR); and Locally Weighted Polynomial Regression (LWP). The two river basins (shown in Figure 1) are the Animas River at Durango, Colorado (Animas), and the Alapaha River at Statenville, Georgia (Alapaha).
The paper is organized as follows. Section 2 reviews the literature on spatial interpolation methods, and Section 3 describes the spatial interpolation schemes used in this paper. Section 4 describes the experimental design, and Section 5 describes the data. Results of the monthly and daily analyses are presented in Section 6. A summary and discussion of the results conclude the paper (Section 7).
2 Literature review
Several authors have discussed the advantages and limitations of various interpolation schemes for precipitation and temperature [Dingman, 1994; Myers, 1994; Dirks et al., 1998; Lanza et al., 2001; Fassnacht et al., 2003] (see Table 1). These methods range from simple averaging methods (e.g. Thiessen polygons [Thiessen, 1911]), to distance-based methods such as inverse distance weighting and ordinary kriging [Franke and Nielson, 1980; Creutin and Obled, 1982], to techniques that make explicit use of topographic parameters such as elevation in the interpolation routine [e.g. Daly et al., 1994; Rajagopalan and Lall, 1998; Goovaerts, 2000].
Simple methods such as the spatial average, Thiessen polygons, and nearest neighbor are easy to implement and computationally very efficient. However, these methods provide poor estimates, especially in regions with a sparse data network [Thiessen, 1911]. For example, if the grid cells in a hydrologic model use data from the nearest station, discontinuities arise at the transition between grid cells represented by two different observation stations (as in the Thiessen polygon method). If all stations in a basin are averaged, there is no sub-basin variability at all [Dirks et al., 1998]. Methods that explicitly account for distance (i.e. the distance between the estimation point and the observation stations) can address these limitations.
Two of the early distance-based methods are Inverse Distance Weighting (IDW) and Ordinary Kriging (OK). In IDW, each observation is weighted by the inverse of its distance from the estimation point, with the distance raised to some power [Bussieres and Hogg, 1989; Dirks et al., 1998]. The optimal power can be chosen to minimize the estimation error, but a power of 2 is usually acceptable [Kruizinga and Yperlaan, 1978; Dirks et al., 1998].
OK [Tabios and Salas, 1985] derives weights for the surrounding stations from both the covariance among the stations and the estimated covariance between the surrounding stations and the estimation point. Because the empirical spatial covariance is often noisy (and is unknown between the surrounding stations and the estimation points), it is modeled as a function of distance (e.g. using a variogram model). Fitting the variogram is the key step in OK, and efforts have been made to fit variograms objectively so as to improve the estimates [Todini et al., 2001].
In topographically complex regions, distance information alone is insufficient to produce good spatial estimates. OK has been extended to include topographic information to better capture the spatial variability of hydroclimate variables [Chua and Bras, 1982; Beek et al., 1992; Pardo-Iguzquiza, 1998; Prudhomme and Reed, 1999; Jeffrey et al., 2001; Kyriakidis et al., 2001; Erxleben et al., 2002]. For example, Goovaerts [2000] applied a regression scheme to capture the spatial trend of the precipitation field and used kriging on the regression residuals. In another extension of OK, indicator kriging can improve the estimation of spatial variability in intermittent rainfall fields [Seo, 1996]. Indicator kriging uses a chosen threshold to transform the original values into indicator values (0 and 1), whose spatial variability is then analyzed using experimental variograms. Kriging has also been applied in a two-step process that estimates rainfall occurrence and amount separately [Barancourt et al., 1992; Mackay et al., 2001].
An alternative to OK is Multiple Linear Regression (MLR), whose theory is well developed [Helsel and Hirsch, 1992; Walpole et al., 1998, etc.]. Typically, topographic information such as latitude, longitude, and elevation is used to fit a linear relationship with the hydroclimate variable (precipitation or temperature). The fitted linear equation is then used to estimate the hydroclimate variable spatially [Ollinger et al., 1993; Kurtzman and Kadmon, 1999; Ninyerola et al., 2000; Marquinez et al., 2003]. For example, Daly et al. [1994] used regression techniques to estimate the spatial variability of precipitation and other climate variables for the entire U.S. As part of this research, Daly developed the PRISM (Parameter-elevation Regressions on Independent Slopes Model) system, which has been used in many applications [Church et al., 1995; Bishop et al., 1998]. Hay et al. [2002] used monthly climatological regression relationships to estimate the daily variability of climate variables (the stations used in this method were selected to maximize the fit between modeled and observed streamflow). Generally, methods that include elevation information tend to perform better, especially with low-density networks [Creutin and Obled, 1982].
Nonparametric methods, which are data driven and do not require assumptions about the form of the underlying function (e.g. linear), provide an attractive alternative. Nonparametric methods include splines, kernel-based methods [Owosina, 1992], and local polynomials [Loader, 1997; Rajagopalan and Lall, 1998]. Local polynomials are easy to implement and skillful. The estimate at any point is based on a polynomial (of order p) fit to a small number (k) of its nearest neighbors. The polynomial order and the number of neighbors are obtained from the data using objective criteria such as cross-validation measures. If p equals 1 (i.e. a linear function) and the neighborhood includes all data points with equal weights, the method collapses to traditional linear regression; local polynomials can thus be thought of as a superset of linear regression. The performance of nonparametric methods on a variety of synthetic and real data sets has been documented by Owosina [1992].
In many cases, simpler methods show moderate skill at lower computational cost [Hevesi et al., 1992; Goodale et al., 1998; Hartkamp et al., 1999; Kurtzman and Kadmon, 1999; Parajka, 2000]. Interpolated daily or sub-daily rainfall fields tend to show higher variability than monthly or annual analyses [Bussieres and Hogg, 1989; Jeffrey et al., 2001]. For example, the skill of various kriging methods decreases with increasing spatial and temporal variability [Borga and Vizzaccaro, 1997; Hartkamp et al., 1999].
3 Interpolation methods
We selected seven spatial interpolation methods of varying complexity for this inter-comparison: (1) Simple Average, (2) Inverse Distance Weight scheme, (3) Ordinary Kriging, (4) Multiple Linear Regression (MLR), (5) PRISM-based interpolation, (6) Climatological MLR, and (7) Locally Weighted Polynomial regression (LWP). For the daily analysis, logistic regression is also applied together with selected interpolation schemes.
3.1 Straight Average (SA)
The SA method calculates the arithmetic mean of all available observations in a basin, and this mean is used as the estimate at every point in the basin. All observation points contribute equally to the estimate, so the resulting field shows no spatial variability.
However, this method is still valuable for estimating the areal mean of hydroclimate variables with a small, dense, and well-distributed station network. The precipitation estimate is given as
$$p_{est} = \frac{1}{n}\sum_{i=1}^{n} p_i \qquad (1)$$
where $p_i$ is the observed precipitation at the i-th station and n is the total number of observation stations.
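As a minimal illustration of equation (1), the Python sketch below averages whatever station values are available. Passing missing records as `None` and skipping them is an assumption of the sketch, not part of the method as described.

```python
def straight_average(observations):
    """Equation (1): arithmetic mean of all available station values.

    Missing records are passed as None and skipped (an assumption of this sketch).
    """
    values = [p for p in observations if p is not None]
    if not values:
        raise ValueError("no observations available")
    return sum(values) / len(values)


# The single basin-wide value is used as the estimate at every grid point.
basin_estimate = straight_average([12.0, 8.5, None, 15.5])
print(basin_estimate)  # 12.0
```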
3.2 Inverse distance weight scheme (IDW)
The IDW method assigns weights based on the inverse of the distance to every data point located within a given search radius centered on the estimation point. The nearest observation station receives the largest weight and the most remote station the smallest. The weights are constrained to sum to 1. The traditional form of this method is
$$p_{est} = \sum_{i=1}^{n} w_i \, p_i \qquad (2)$$
$$w_i = \frac{1/d_i^{\,k}}{\sum_{j=1}^{n} 1/d_j^{\,k}} \qquad (3)$$
where n is the number of stations within the search radius, $d_i$ is the distance between the estimation point and the i-th observation point, $d_j$ is the distance between the estimation point and the j-th observation point, and k is the power ($1 \le k \le \infty$). In this study we used the revised weighting function introduced by Franke and Nielson [1980], which gave better results than the traditional function:
$$w_i = \frac{\left(\dfrac{R - d_i}{R\,d_i}\right)^2}{\sum_{j=1}^{n}\left(\dfrac{R - d_j}{R\,d_j}\right)^2} \qquad (4)$$
where R is the maximum distance between the estimation point and the observation points inside the search radius. In equation (4) all weights are calculated from 3-D distances expressed in the same units (meters). The estimates are relatively insensitive to the power of the distance function; a power of 2 typically works well, based on error measures in precipitation applications [Kruizinga and Yperlaan, 1978; Dirks et al., 1998]. Elevation can also be incorporated in the distance calculation, thus bringing in topographic information.
3.3 Ordinary Kriging (OK)
Many spatial interpolation schemes have the same form as equation (2) but use different weighting functions. The OK method calculates the weights using a vector of covariances between the estimation point and the surrounding observing stations (D) and a matrix of covariances among all observing stations (C):
$$\mathbf{w} = \mathbf{C}^{-1}\mathbf{D} \qquad (5)$$
where w is the vector of weights, C is the covariance matrix among the observation locations, and D is the vector of covariances between the estimation point and the observation locations.
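The sketch below solves the kriging system for a single estimation point. Two details are assumptions of the sketch rather than statements from the text: the exponential covariance model and its sill and range values are hypothetical, and, as is standard for ordinary kriging, C and D are bordered by a row and column of ones (a Lagrange multiplier) so that the weights sum to one. The small Gaussian-elimination helper stands in for a linear-algebra library.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def exponential_cov(h, sill=1.0, range_m=10_000.0):
    """Hypothetical covariance model C(h) = sill * exp(-h / range)."""
    return sill * math.exp(-h / range_m)


def ordinary_kriging_weights(target, stations, cov=exponential_cov):
    """Weights from equation (5), with the unbiasedness constraint sum(w) = 1."""
    n = len(stations)
    # C: covariances among stations, bordered by ones for the constraint.
    A = [[cov(math.dist(stations[i], stations[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    # D: covariances between each station and the estimation point, plus the constraint.
    b = [cov(math.dist(s, target)) for s in stations] + [1.0]
    return solve(A, b)[:n]  # drop the Lagrange multiplier


stations = [(0.0, 0.0), (3000.0, 0.0), (0.0, 5000.0)]
values = [4.0, 6.0, 9.0]
w = ordinary_kriging_weights((1000.0, 1000.0), stations)
print(sum(wi * vi for wi, vi in zip(w, values)))  # the estimate via equation (2)
```

With no nugget in the covariance model, kriging is an exact interpolator: at a station location the weights reduce to one on that station and zero elsewhere.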
Because the covariance between the estimation point and the observation points is unknown, a model of covariance as a function of distance is needed to determine the weights. This model is referred to as the variogram [Journel and Huijbregts, 1978]. To derive the variogram model, covariance is estimated between all pairs of observation points within a specified distance, and a smooth function is then fit to these empirical covariances. The fitting process involves selecting a function (e.g. spherical, exponential) and its associated parameters (range, nugget, sill, power, etc.).
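The text frames the fit in terms of covariances; for a second-order stationary field this is equivalent to the semivariogram, gamma(h) = C(0) - C(h), which is the form most software works with. The sketch below computes the classical empirical semivariogram by distance bin and evaluates a spherical model. The bin width, the lag cutoff, and the crude grid search over the range are illustrative choices that show where the subjectivity enters, not the procedure used in this study.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width, max_lag):
    """Classical estimator: mean of 0.5 * (z_i - z_j)**2 over pairs in each distance bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            if h > max_lag:
                continue
            k = int(h // bin_width)
            sums[k] += 0.5 * (values[i] - values[j]) ** 2
            counts[k] += 1
    # (bin-centre lag, semivariance, number of pairs) for every non-empty bin
    return [((k + 0.5) * bin_width, sums[k] / counts[k], counts[k]) for k in sorted(counts)]


def spherical(h, nugget, sill, range_m):
    """Spherical model; `sill` is the total sill (nugget included)."""
    if h == 0:
        return 0.0
    if h >= range_m:
        return sill
    r = h / range_m
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def model_misfit(lags, nugget, sill, range_m):
    """Pair-count weighted squared misfit between the model and the empirical points."""
    return sum(n * (g - spherical(h, nugget, sill, range_m)) ** 2 for h, g, n in lags)


def fit_range(lags, nugget, sill, candidates):
    """Pick the candidate range with the smallest misfit (a crude grid search)."""
    return min(candidates, key=lambda r: model_misfit(lags, nugget, sill, r))


coords = [(0.0, 0.0), (1000.0, 0.0), (0.0, 2000.0), (1500.0, 1500.0), (3000.0, 500.0)]
values = [50.0, 55.0, 70.0, 62.0, 58.0]
lags = empirical_semivariogram(coords, values, bin_width=1000.0, max_lag=4000.0)
print(lags, fit_range(lags, nugget=0.0, sill=100.0, candidates=[1000.0, 2000.0, 4000.0]))
```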
Variogram fitting is to some extent subjective. Given the limited choice of functions, the selected function often fits the empirical covariance estimates poorly. Consequently, the estimates from equation (5) using the fitted variogram have high variability (i.e. poor performance). This is one of the key drawbacks of OK. To illustrate this point, Figure 2 shows the variogram for a simple synthetic well-pumping surface estimation problem, along with a variogram for January monthly mean precipitation in the Animas basin. The "best" fitted variograms do not capture the spatial covariability of the data (shown as points on the graphs) very well. If the underlying surface is highly nonstationary (as in this example), obtaining a well-fitting variogram is difficult, which greatly limits the applicability of kriging.
3.4 Multiple Linear Regression (MLR)
The MLR method assumes that a linear relationship between the predictor variables (typically topographic variables) and the response variable (precipitation or temperature) can be fitted and used to estimate the response at any desired location. The model is of the form
$$p_{est} = b_0 + b_1 x + b_2 y + b_3 z \qquad (6)$$
where x, y, and z are the independent (predictor) variables latitude, longitude, and elevation, respectively, and $b_0$, $b_1$, $b_2$, $b_3$ are the regression coefficients, estimated by minimizing the sum of squared errors.
Because of the strong orographic effects on temperature and precipitation [e.g., Henry, 1919; Sevruk et al., 1998], elevation is included as a predictor in most MLR-based spatial interpolation studies. In this formulation, a separate regression equation may be fitted for each time step (e.g. month or day). In this research, latitude, longitude, and elevation are used as predictors, and a separate regression model is fitted for each time step.
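A minimal sketch of equation (6), fitted by ordinary least squares through the normal equations. A QR decomposition would be numerically safer; the normal equations simply keep the sketch short. The station coordinates and totals in the example are made up.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_mlr(predictors, response):
    """Least-squares coefficients [b0, b1, b2, b3] for p = b0 + b1*x + b2*y + b3*z."""
    rows = [[1.0, *xyz] for xyz in predictors]  # leading 1.0 carries the intercept b0
    k = len(rows[0])
    XtX = [[sum(r[a] * r[c] for r in rows) for c in range(k)] for a in range(k)]
    Xtp = [sum(r[a] * p for r, p in zip(rows, response)) for a in range(k)]
    return solve(XtX, Xtp)


def predict_mlr(coef, xyz):
    """Evaluate equation (6) at one (latitude, longitude, elevation) location."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], xyz))


# One model per time step; here a single (made-up) January.
coords = [(37.2, -107.9, 2100.0), (37.5, -107.6, 2900.0), (37.8, -107.7, 3300.0),
          (37.3, -107.5, 2500.0), (37.6, -108.0, 2700.0), (37.9, -107.9, 3100.0)]
january = [40.0, 75.0, 95.0, 55.0, 60.0, 88.0]
coef = fit_mlr(coords, january)
print(predict_mlr(coef, (37.55, -107.75, 2800.0)))
```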
3.5 Climatological MLR (CMLR)
In MLR, all regression coefficients are refitted at each time step to best estimate the spatial variability. In CMLR, the monthly relationships (the slope coefficients for each climate variable) are assumed to hold at finer time scales (e.g. daily), while the intercept of the regression model is re-optimized at each time step.
First, fixed seasonal coefficients are calculated from the basin climatology. Using monthly totals (or monthly means for temperature) and the predictor variables x, y, z (latitude, longitude, and elevation, respectively), monthly MLR equations are set up as described in section 3.4. By contrast, the traditional MLR method recomputes all coefficients at every time step.
Second, the CMLR method adjusts the intercept of the monthly model for shorter time scale estimation based on a group of anchor stations [Hay et al., 2002]. The intercept is calculated from the mean values of the climate variable and the mean x, y, and z coordinates of the selected group of observation stations. With $b_1$, $b_2$, and $b_3$ fixed, the intercept $b_0$ is determined from an 'optimal' set of anchor stations found by Exhaustive Search (ES) analysis [Wilby et al., 1999]. Several error measures could be used to find the best anchor stations. Hay and McCabe [2002] also found that the accuracy of runoff estimation did not improve significantly with more than three anchor stations. Unlike Hay et al. [2002], this research tested the root mean square error between the dropped (cross-validated) observation and its estimate as the objective function for the optimal intercept calculation. In addition, the three nearest stations were used as anchor stations, and all results presented in this paper are based on this choice. After computing the optimal intercept for each time step and each estimation point, the climate variables are calculated with the following linear regression equation:
$$p_{est} = b_{0,opt} + b_{1,mth}\,x + b_{2,mth}\,y + b_{3,mth}\,z \qquad (7)$$
where $b_{1,mth}$, $b_{2,mth}$, and $b_{3,mth}$ are the regression coefficients based on the (monthly) climatology and $b_{0,opt}$ is the intercept for each time step.
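Given monthly slopes from a climatological MLR fit, the sketch below recomputes only the intercept for one time step. It follows the description above: the plane with fixed slopes is forced through the mean value and mean coordinates of the anchor stations, and the anchors are the three nearest stations. Measuring "nearest" by horizontal distance in the predictor coordinates is our assumption, as are the made-up slopes and observations; the exhaustive anchor search and the RMSE objective are not reproduced.

```python
import math


def cmlr_intercept(slopes, anchors):
    """b0_opt = mean(p) - b1*mean(x) - b2*mean(y) - b3*mean(z) over the anchor stations."""
    n = len(anchors)
    mean_p = sum(p for _, p in anchors) / n
    mean_xyz = [sum(xyz[k] for xyz, _ in anchors) / n for k in range(3)]
    return mean_p - sum(b * m for b, m in zip(slopes, mean_xyz))


def nearest_anchors(target, stations, count=3):
    """The `count` nearest stations by horizontal (x, y) distance (an assumed metric)."""
    return sorted(stations, key=lambda s: math.dist(s[0][:2], target[:2]))[:count]


def cmlr_estimate(target, monthly_slopes, stations):
    """Equation (7): fixed monthly slopes plus an intercept re-fitted for this time step."""
    b0 = cmlr_intercept(monthly_slopes, nearest_anchors(target, stations))
    return b0 + sum(b * v for b, v in zip(monthly_slopes, target))


# Hypothetical January slopes (b1, b2, b3) and one day's observations ((x, y, z), p).
slopes = (0.4, -0.1, 0.002)
day = [((0.0, 0.0, 2000.0), 5.0), ((1.0, 0.0, 2500.0), 6.5),
       ((0.0, 1.0, 3000.0), 8.0), ((5.0, 5.0, 1500.0), 1.0)]
print(cmlr_estimate((0.5, 0.5, 2800.0), slopes, day))
```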
3.6 PRISM-based interpolation
The PRISM method is a regression-based spatial interpolation method for climate data. It was developed to model the strong orographic effect on precipitation and to give better estimates in mountainous terrain. PRISM fits weighted regressions of precipitation on elevation to predict precipitation at each grid cell's elevation [Daly et al., 1994]. Using this method, the Spatial Climate Analysis Service (Oregon State University) has constructed climatological maps of precipitation and other variables on a 2-km grid for the contiguous U.S. We use the PRISM climatology for interpolation. Extracting monthly and daily information from the PRISM climatology involves the following steps:
(i) Anomalies of precipitation and temperature are calculated relative to long-term monthly means. Temperature anomalies are the difference between the observation and the long-term mean monthly value; precipitation anomalies are the ratio of the observation to the long-term mean monthly total.
(ii) The anomalies are interpolated to each estimation point using IDW (other schemes may
be used as well).
(iii) The PRISM climatology is combined with the interpolated anomalies to obtain the final estimates: it is added for temperature and multiplied for precipitation. The PRISM value for each estimation point is taken from the nearest PRISM grid point:
$$t_{est} = t_a + t_{prism}, \qquad p_{est} = p_a \times p_{prism} \qquad (8)$$
where $t_{est}$, $p_{est}$ are the estimated climate variables at each station, $t_a$, $p_a$ are the interpolated anomalies, and $t_{prism}$, $p_{prism}$ are the monthly PRISM values at the nearest grid point.
For daily precipitation estimation, the long-term mean monthly precipitation is divided by the number of days in the month to compute the anomalies, and the PRISM values are applied in the same way.
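The three steps can be sketched as follows. For brevity, step (ii) uses the traditional inverse-distance weights of equations (2) and (3) with power 2 rather than equation (4); the PRISM value at the target is passed in directly (the nearest-grid-point lookup is not shown); and dividing the PRISM monthly value by the number of days in the daily case is our reading of "applied in the same way". A zero long-term mean would make the precipitation ratio undefined.

```python
import calendar
import math


def idw(target, coords, values, power=2):
    """Traditional inverse-distance weighting, equations (2)-(3); target must not be a station."""
    weights = [1.0 / math.dist(target, c) ** power for c in coords]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def prism_precip(target, coords, obs, long_term_means, prism_at_target):
    ratios = [o / m for o, m in zip(obs, long_term_means)]  # step (i): ratio anomalies
    return idw(target, coords, ratios) * prism_at_target     # steps (ii)-(iii), equation (8)


def prism_temp(target, coords, obs, long_term_means, prism_at_target):
    diffs = [o - m for o, m in zip(obs, long_term_means)]    # step (i): difference anomalies
    return idw(target, coords, diffs) + prism_at_target      # steps (ii)-(iii), equation (8)


def daily_mean(monthly_total, year, month):
    """Spread a monthly total evenly over the days of that month (daily precipitation case)."""
    return monthly_total / calendar.monthrange(year, month)[1]


# Daily case with hypothetical January 2001 values (mm).
coords = [(0.0, 0.0), (8000.0, 3000.0), (2000.0, 9000.0)]
day_obs = [4.0, 2.5, 6.0]
day_means = [daily_mean(m, 2001, 1) for m in (93.0, 62.0, 124.0)]
prism_day = daily_mean(100.0, 2001, 1)  # nearest PRISM cell's January value
print(prism_precip((3000.0, 4000.0), coords, day_obs, day_means, prism_day))
```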
3.7 Locally Weighted Polynomial method (LWP)
The locally weighted polynomial method is similar to MLR (section 3.4), but the regression equation is fitted using only the nearest neighbors of each estimation point. A general form of local regression with one predictor variable is
$$p_i = \mu(x_i) + \varepsilon_i \qquad (9)$$
where $\mu(x)$ is the appropriate polynomial function and $\varepsilon_i$ is the estimation error.
The local polynomial can be linear or of higher order; a linear function was used in this research [Loader, 1997]. The function is fitted by minimizing a locally weighted sum of squares within a sliding window:
$$\sum_{i=1}^{n} W\!\left(\frac{x_i - x}{h}\right)\left(p_i - \left(a_0 + a_1 (x_i - x)\right)\right)^2 \qquad (10)$$
where W( ) is the weighting function, h is the window width, and $a_0$, $a_1$ are the coefficients. The order of the local polynomial should be selected carefully to avoid unnecessarily high variance in the estimates [Loader, 1999].
The optimal fit is determined by choosing an appropriate neighborhood size around each estimation point. Using the GCV (Generalized Cross-Validation) statistic, the best neighborhood size is determined from the number of parameters and the estimation error:
$$GCV = \frac{\frac{1}{n}\sum_{i=1}^{n} e_i^2}{\left(1 - \frac{m}{n}\right)^2} \qquad (11)$$
where $e_i$ is the error, n is the number of data points, and m is the number of parameters.
In this research, the statistical package LOCFIT [Loader, 1992] was used to fit the locally weighted polynomial model and estimate the variability of the climate variables. During model fitting, the ratio of the neighborhood size to the total number of observations must exceed a minimum value that depends on the number of predictor variables and the number of available data points [Loader, 1992]:
$$\alpha \ge \frac{2m + 1}{n} \qquad (12)$$
where α is (neighborhood size)/(number of observations), m is the number of predictor variables, and n is the number of observations.
For the cross-validation analysis in this research, the lower bound of α was set to 13/n. The theoretical minimum neighborhood size is 7 (= 2×3 + 1 for three predictors), but this is very close to the limit of the model's capacity, so a minimum of 13 observations was used to obtain smoother, more stable estimates.
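The sketch below illustrates equations (10) to (12) for several predictors; it is not LOCFIT. Our own choices include the tricube kernel, a window just beyond the k-th nearest neighbor, predictors assumed to be pre-scaled to comparable units, and taking m in equation (11) as the trace of the local-fit smoothing matrix (the usual generalization of the parameter count for local fits).

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def tricube(u):
    """Tricube kernel W(u) = (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(target, X, p, k):
    """Minimize equation (10) around `target` using its k nearest neighbors.

    Returns (a0, influence); influence is the weight on an observation located at
    `target` itself, meaningful only when `target` is one of the data points.
    """
    d = [math.dist(target, xi) for xi in X]
    nbrs = sorted(range(len(X)), key=lambda i: d[i])[:k]
    h = 1.01 * d[nbrs[-1]] or 1.0  # window width just beyond the k-th neighbor
    q = len(target) + 1            # a0 plus one slope per predictor
    A = [[0.0] * q for _ in range(q)]
    rhs = [0.0] * q
    for i in nbrs:
        w = tricube(d[i] / h)
        row = [1.0] + [xa - ta for xa, ta in zip(X[i], target)]  # centered design row
        for r in range(q):
            rhs[r] += w * row[r] * p[i]
            for c in range(q):
                A[r][c] += w * row[r] * row[c]
    coef = solve(A, rhs)
    influence = solve(A, [1.0] + [0.0] * (q - 1))[0]  # [(X'WX)^-1]_00 times W(0) = 1
    return coef[0], influence


def gcv(X, p, k):
    """Equation (11), with m taken as the trace of the smoothing matrix."""
    n = len(X)
    rss = trace = 0.0
    for xi, pi in zip(X, p):
        fit, infl = local_linear(xi, X, p, k)
        rss += (pi - fit) ** 2
        trace += infl
    if trace >= n:
        return math.inf
    return (rss / n) / (1.0 - trace / n) ** 2


def choose_k(X, p, k_min=None):
    """Neighborhood size with the smallest GCV, never below the equation (12) bound."""
    if k_min is None:
        k_min = 2 * len(X[0]) + 1  # the study itself used 13 for three predictors
    return min(range(k_min, len(X) + 1), key=lambda k: gcv(X, p, k))


X = [(i / 10,) for i in range(-20, 21)]
p = [x[0] ** 2 for x in X]  # a curved, noise-free response
k = choose_k(X, p, k_min=7)
print(k, local_linear((1.0,), X, p, k)[0])
```

With a flat kernel and k equal to n, this fit would reduce to ordinary linear regression, the superset property noted in Section 2; the tricube kernel keeps even the full-data fit local.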
3.8 Extensions to daily precipitation interpolation
In practice, most watershed models require precipitation estimates at a daily time step. Jeffrey et al. [2001] suggest two approaches for obtaining daily interpolated climate variables: (1) direct interpolation of daily records and (2) generation of daily values from monthly interpolated values; they adopted the second approach. Either approach can be problematic if sufficient daily observations are not available. Spatial interpolation methods such as those described in the previous sections can be used at the daily time scale. However, because daily rainfall is intermittent, interpolating among several zero values tends to produce unrealistic rainfall fields at the daily time scale, not to mention the potential for negative estimates.
We use two approaches for spatial interpolation at the daily time scale: (1) the interpolation methods described above are applied to the daily data and negative estimates are replaced with zero; and (2) logistic regression is used to estimate rainfall occurrence, and the interpolation methods are then applied only at locations where rainfall occurrence is generated. The schematic of applying logistic regression to precipitation occurrence in a basin is shown in Figure 3.
The station data sets are first transformed into time series of occurrence (1 = wet day, 0 = dry day). The precipitation amounts are then interpolated only on days estimated to be wet. Like the standard least-squares regression model (6), the logistic model consists of a regression constant $\beta_0$ and slopes $\beta_k$ for each predictor variable $x_k$ [Clark et al., 2004]:
$$p = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k\right)\right]} \qquad (13)$$
where, as in ordinary least squares, the β values are the regression coefficients, and p is the probability of precipitation occurrence.
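The sketch below fits equation (13) by maximum likelihood using Newton-Raphson iterations (equivalently, iteratively reweighted least squares), a standard way to estimate the β values. The choice of predictor (a single scaled station elevation) and the occurrence data are assumptions for illustration; the text does not list the predictors used. Perfectly separable data would make the estimates diverge, so practical use would need a guard such as a small ridge penalty.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sigmoid(z):
    """Numerically stable logistic function used in equation (13)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Newton-Raphson maximum-likelihood estimates of [beta0, beta1, ..., betak]."""
    rows = [[1.0, *x] for x in X]
    q = len(rows[0])
    beta = [0.0] * q
    for _ in range(max_iter):
        grad = [0.0] * q                      # X'(y - p)
        info = [[0.0] * q for _ in range(q)]  # X'WX with W = p(1 - p)
        for r, yi in zip(rows, y):
            pi = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for a in range(q):
                grad[a] += (yi - pi) * r[a]
                for c in range(q):
                    info[a][c] += pi * (1.0 - pi) * r[a] * r[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def occurrence_probability(beta, x):
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical stations: elevation in km and wet (1) / dry (0) on a given day.
X = [(0.5,), (1.0,), (1.5,), (2.0,), (2.5,), (3.0,), (3.5,), (4.0,)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(X, y)
print(beta, occurrence_probability(beta, (2.8,)))
```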
The climatological rainfall probability for each dropped station is also calculated from the historical rainfall probabilities of the surrounding stations through logistic regression, which ensures that the estimated probabilities are neither negative nor greater than one. If the probability of rainfall estimated by the logistic regression is less (greater) than the climatological rainfall probability, which acts as the threshold, the estimation point is set to dry (wet). The climatological probability of precipitation is calculated as
$$\text{Climatological rainfall probability} = \frac{\sum_{\text{all rain days}} (\text{number of stations with rainfall})}{\sum_{\text{all rain days}} (\text{number of available stations})} \qquad (14)$$
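Equation (14) and the wet/dry rule can be sketched as follows. Several readings are assumptions: a station counts as wet when its value exceeds a hypothetical `wet_threshold` (0 mm by default); "all rain days" is taken to mean days on which at least one available station reported rain; negative interpolated amounts are clipped to zero, mirroring approach (1); and a tie between the two probabilities is treated as wet, an arbitrary choice.

```python
def climatological_probability(daily_records, wet_threshold=0.0):
    """Equation (14): wet-station counts over available-station counts, summed over rain days.

    daily_records is a list of days; each day is a list of station values (None = missing).
    """
    wet_total = 0
    available_total = 0
    for day in daily_records:
        available = [v for v in day if v is not None]
        wet = sum(1 for v in available if v > wet_threshold)
        if wet == 0:
            continue  # not a rain day anywhere in the network
        wet_total += wet
        available_total += len(available)
    return wet_total / available_total if available_total else 0.0


def two_step_estimate(occurrence_prob, climatological_prob, interpolated_amount):
    """Dry if the logistic probability is below the climatological threshold, else the amount."""
    if occurrence_prob < climatological_prob:
        return 0.0
    return max(interpolated_amount, 0.0)


history = [[0.0, 1.2, None], [0.0, 0.0, 0.0], [3.0, 0.5, 2.2]]
threshold = climatological_probability(history)  # 4 wet of 5 available on rain days
print(two_step_estimate(0.65, threshold, 4.1))    # 0.0 (below threshold, set dry)
```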
The main purpose of the two-step process is to obtain realistic rainfall occurrence using logistic regression before applying the interpolation methods to estimate precipitation amounts. Owing to data limitations, we use three interpolation schemes (IDW, MLR, and LWP) in conjunction with the logistic regression.
4 Experimental Design
The models described above were applied to monthly and daily precipitation and temperature observations from the two basins. The flowchart of the estimation process at the monthly time scale is shown in Figure 4. Three groups of analyses were arranged according to the available interpolation schemes, as shown in Table 2. The flowchart of the two-step method is shown in Figure 5. The models are fitted to the observations, and a suite of performance measures is estimated in cross-validated mode for comparison. The performance metrics are described below.
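The cross-validated mode can be sketched as a leave-one-out loop: each station is dropped in turn and estimated from the remaining ones, and stations left with fewer than a method's minimum number of neighbors (Table 3; e.g. two for IDW and five for MLR, per Section 5.1) are skipped. The `estimator(target, coords, values)` signature is a hypothetical interface for this sketch.

```python
def leave_one_out(coords, values, estimator, min_stations):
    """Cross-validated estimate at every station (None where it cannot be computed)."""
    estimates = []
    for i, target in enumerate(coords):
        # Every other station with a valid record for this time step.
        others = [(c, v) for j, (c, v) in enumerate(zip(coords, values))
                  if j != i and v is not None]
        if values[i] is None or len(others) < min_stations:
            estimates.append(None)
            continue
        other_coords = [c for c, _ in others]
        other_values = [v for _, v in others]
        estimates.append(estimator(target, other_coords, other_values))
    return estimates


def mean_of_others(target, coords, values):
    """Straight-average estimator, equation (1); the target location is ignored."""
    return sum(values) / len(values)


print(leave_one_out([(0, 0), (1, 0), (2, 0)], [1.0, 2.0, 6.0], mean_of_others, 2))
# [4.0, 3.5, 1.5]
```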
4.1 Measures for performance comparison
Various performance measures are used in this research. Depending on the measure, the basin, and the interpolation method, the ranking of the schemes can change considerably [Bussieres and Hogg, 1989]. Beyond this arbitrariness, a method that is more elaborate than the quality of the data warrants can give numerically superior results that are nonetheless misleading, given the uncertainty of the record itself. To avoid overstating the performance of any interpolation scheme, only minimal model tuning was done and each model was kept as simple as possible.
The following measures were chosen for comparison:
a) Bias of the mean of the climate variable at each station over time:
$$M_i = \frac{1}{N}\sum_{t=1}^{N} p_{ie}(t) - \frac{1}{N}\sum_{t=1}^{N} p_{io}(t) \qquad (15)$$
where N is the number of time steps, $p_{ie}$ is the estimated climate variable at station i, and $p_{io}$ is the observed climate variable at station i.
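b) Bias of the variance of the climate variable at each station over time:
$$V_i = \frac{1}{N-1}\left[\sum_{t=1}^{N}\left(p_{ie}(t) - \bar{p}_{ie}\right)^2 - \sum_{t=1}^{N}\left(p_{io}(t) - \bar{p}_{io}\right)^2\right] \qquad (16)$$
where $\bar{p}_{ie}$ and $\bar{p}_{io}$ are the means of the estimated and observed climate variable at station i.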
c) Spearman rank correlation coefficient between the observed values and cross-validated estimates at each station:
$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)} \qquad (17)$$
where $d_i$ is the difference between the ranks assigned to the two variables and n is the number of pairs of data.
At the monthly (daily) time scale, the monthly (daily) precipitation and temperature are estimated in cross-validated mode for each year at each station, and the Spearman rank correlation is computed between the estimates and the observations. This yields as many correlation coefficients as there are stations.
Rank correlation coefficients have some advantages over the traditional (Pearson) correlation coefficient. First, the rank correlation does not assume that the two variables are linearly related. Second, it does not require normality [Walpole et al., 1998]. Because a correlation can be high even when the estimates are systematically biased, this measure was used together with RMSE for a better performance assessment.
d) Root mean square error of the estimated values:
$$\text{RMSE}_i = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(p_{ie}(t) - p_{io}(t)\right)^2} \qquad (18)$$
e) Inter-station correlation between the observed values and estimates at each time step. For the monthly time step, for each year the cross-validated estimates and the observed values at all stations are correlated, yielding as many correlation coefficients as there are years.
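The measures a) to e) can be sketched for one station's series of estimates and observations as below. Computing Spearman's coefficient as the Pearson correlation of average ranks is our choice: it equals equation (17) when there are no ties and handles ties consistently. Treating the inter-station correlation in e) as a Pearson correlation across stations at one time step is likewise an assumption, since the text does not name the coefficient.

```python
import math


def bias_mean(est, obs):
    """Equation (15): mean of the estimates minus mean of the observations."""
    n = len(obs)
    return sum(est) / n - sum(obs) / n


def bias_variance(est, obs):
    """Equation (16): difference of the sample variances (divisor N - 1)."""
    n = len(obs)
    me, mo = sum(est) / n, sum(obs) / n
    return (sum((e - me) ** 2 for e in est) - sum((o - mo) ** 2 for o in obs)) / (n - 1)


def rmse(est, obs):
    """Equation (18)."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def average_ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    start = 0
    while start < len(v):
        end = start
        while end + 1 < len(v) and v[order[end + 1]] == v[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(est, obs):
    """Spearman correlation as Pearson on ranks; equals equation (17) without ties."""
    return pearson(average_ranks(est), average_ranks(obs))


est = [1.0, 2.0, 3.0, 4.0]
obs = [2.0, 1.0, 4.0, 3.0]
print(bias_mean(est, obs), bias_variance(est, obs), rmse(est, obs), spearman(est, obs))
# 0.0 0.0 1.0 0.6
```

For measure e), `pearson` would be applied to the estimates and observations at all stations for a single time step.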
5 Data
As mentioned earlier, the two basins with different climatic characteristics chosen for this study (Figure 1) are the Animas and the Alapaha. The Animas basin is snowmelt driven, with occasional rain-on-snow events during winter, and has large relief, with elevations ranging from 680 m to 3700 m. The Alapaha basin lies at lower elevation and is dominated by rainfall events. Among the various watershed characteristics, topography has a significant impact on the estimation of the spatial variability of climate variables.
Daily minimum and maximum temperature and precipitation recorded at all observation locations were compiled from the National Weather Service (NWS) and snow telemetry (SNOTEL) databases [Hay et al., 2002], as shown in Table 2. To be consistent across the basins, data from 1979 onward were used in the study.
5.1 Data quality control
Measurement and recording errors are common in hydroclimate data sets and can affect the interpolation schemes [Dingman, 1994]. Reek et al. [1992] found a significant number of erroneous daily values in climate records, including data entry, recording, and reformatting errors. Several of the quality control methods examined in their research were applied to the data sets chosen for this study: (1) extreme outlier detection, (2) diurnal change limit, (3) inconsistency check, (4) spike check, (5) Z-score test, and (6) same-value repeat check. These methods are described in Reek et al. [1992].
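Three of the listed checks are simple enough to sketch. The thresholds (a |z| limit of 4, a 20-unit spike jump, runs longer than five identical values) are hypothetical placeholders, not the values used by Reek et al. [1992] or in this study. Zeros are exempt from the repeat check here because long dry spells are legitimate in precipitation records.

```python
import statistics


def zscore_flags(values, limit=4.0):
    """Z-score test: flag values more than `limit` sample standard deviations from the mean."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mu) / sd > limit for v in values]


def spike_flags(values, jump=20.0):
    """Spike check: flag a value that jumps by more than `jump` from both neighbors
    in the same direction (up then back down, or down then back up)."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        up = values[i] - values[i - 1]
        down = values[i] - values[i + 1]
        if (up > jump and down > jump) or (up < -jump and down < -jump):
            flags[i] = True
    return flags


def repeat_flags(values, max_run=5, exempt=0.0):
    """Same-value repeat check: flag runs of identical values longer than `max_run`."""
    flags = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start > max_run and values[start] != exempt:
                for k in range(start, i):
                    flags[k] = True
            start = i
    return flags


daily_tmax = [5.0, 6.0, 40.0, 7.0, 6.0]
print(spike_flags(daily_tmax))  # [False, False, True, False, False]
```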
The minimum number of observations required to obtain stable estimates differs between methods (Table 3). For cross-validation, IDW requires at least two available stations besides the estimation point, because with only one station its weight is simply one. The MLR method requires at least five available points besides the estimation point being held out, because there are four unknown coefficients to be determined (three coordinate coefficients and an intercept).
Data availability varies considerably between months and basins, as can be seen in Figure 6, which plots the number of available days against the number of available stations. The four rows show the graphs for the basins used in this research; the first two columns are for January and the rest are for July. For example, the first graph (row 1, column 2) shows that most stations (15 to 17) have complete records for the January series of the Animas basin, and the number of available days drops rapidly beyond 17 stations. In contrast, complete records are available at fewer than half of the stations for the January series of the Alapaha basin (row 2, column 1). As a result, a combination of interpolation methods has to be used to address the data availability issue, as discussed in the following section.
6 Results
Performance was compared at the monthly and daily time scales following the flowcharts
in Figures 4 and 5, taking into account the data requirements of the different methods (Table 3).
The methods were applied to all months in each basin; results for January and July are shown as
representative of the wet and dry seasons in the basins. First, however, we present a short
comparison on synthetic data generated to mimic a groundwater surface affected by four
pumping wells (see Figure 2). The surface is highly nonlinear, with sharp gradients and
curvature at the well points. As mentioned earlier, the “best” fitted variogram (Figure 2) does not
capture the spatial covariability of the data (shown as points on the graphs) very well, and
consequently the kriging estimates are also poor. We applied MLR and LWP to this data set. As
expected, MLR fits a global linear function, which is clearly unsuitable for this data set and
completely smooths out the nonlinearity, whereas LWP, being a local estimation method,
captures the variability much better: the median biases of the estimates at the data points are
0.002 and -0.02, and the RMSE values are 0.28 and 0.5, for LWP and MLR, respectively. The
improvement in the LWP estimates is substantial. Extensive comparisons on a variety of synthetic
data sets [Owosina, 1992] support this conclusion.
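The contrast between a global linear fit and a local fit can be reproduced on a toy surface. Below is a minimal Python/NumPy sketch, assuming a hypothetical drawdown-like surface made of four Gaussian depressions (not the data of Figure 2), a degree-1 local fit with tricube weights on the k nearest neighbours, and leave-one-out RMSE. The kernel, `k = 15`, and the surface are illustrative choices and are not necessarily those of the paper's LWP or of LOCFIT [Loader, 1997].

```python
import numpy as np


def surface(X):
    """Hypothetical drawdown surface: four Gaussian cones centred on 'wells'."""
    wells = np.array([[0.25, 0.25], [0.75, 0.3], [0.3, 0.75], [0.7, 0.7]])
    d2 = ((X[:, None, :] - wells[None, :, :]) ** 2).sum(axis=2)
    return -np.exp(-d2 / (2 * 0.12 ** 2)).sum(axis=1)


def mlr_predict(X, y, x0):
    """Global multiple linear regression y ~ b0 + b . x, evaluated at x0."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0] + x0 @ beta[1:]


def lwp_predict(X, y, x0, k=15):
    """Local linear fit at x0 using the k nearest neighbours and tricube weights."""
    d = np.sqrt(((X - x0) ** 2).sum(axis=1))
    idx = np.argsort(d)[:k]
    h = d[idx].max()                                # bandwidth = distance to k-th neighbour
    w = (1 - (d[idx] / h) ** 3) ** 3                # tricube; the k-th neighbour gets weight 0
    A = np.column_stack([np.ones(k), X[idx] - x0])  # centred design: intercept = fit at x0
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return beta[0]


def loo_rmse(X, y, predict):
    """Leave-one-out RMSE of a predictor(X_train, y_train, x0) function."""
    errs = [predict(np.delete(X, i, axis=0), np.delete(y, i), X[i]) - y[i]
            for i in range(len(y))]
    return float(np.sqrt(np.mean(np.square(errs))))


rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(400, 2))
y = surface(X)
rmse_lwp = loo_rmse(X, y, lwp_predict)
rmse_mlr = loo_rmse(X, y, mlr_predict)
print(f"LOO RMSE  LWP: {rmse_lwp:.3f}  MLR: {rmse_mlr:.3f}")
```

A single global plane cannot follow the cones, so its error stays close to the spread of the surface itself, while the local fit adapts to each depression; this mirrors, qualitatively, the LWP versus MLR contrast described above.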
6.1 Monthly precipitation and temperature
Cross-validated estimates are used to calculate the performance measures. Figures 7 and 8
show boxplots of the performance measures for January and July monthly total precipitation and
maximum temperature, respectively, in the Animas and Alapaha basins (all graphs use
millimeters for precipitation and degrees Celsius for temperature). The length of each box
indicates the interquartile range of the measure from the cross-validated estimates at each
location in the basin, the whiskers show the 5th to 95th percentile range, and the horizontal line
in the box is the median. A longer box indicates greater variability in the estimates. In each plot,
the median value of the best performer is shown as a gray solid line. While the skill of the
different techniques is surprisingly similar, several observations are apparent:
(i) The SA method performs poorly on almost all measures.
(ii) Methods using elevation perform well. In particular, the LWP method shows
relatively good performance (shorter boxes and median values close to zero).
(iii) For precipitation, all methods display a negative bias in the variance; this is expected,
as the noisy character of individual station time series is smoothed.
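Observation (iii) can be illustrated with a toy experiment. The Python/NumPy sketch below assumes hypothetical station series made of a shared basin-wide signal plus independent station noise, and computes, for one withheld station, the bias of the mean, the bias of the variance, the RMSE, and the correlation of a leave-one-out straight-average estimate. The simple "estimate minus observation" definitions used here are assumptions and may differ in detail from the measures defined earlier in the paper.

```python
import numpy as np


def performance(est, obs):
    """Simple cross-validation measures at one station (estimate minus observation)."""
    est, obs = np.asarray(est, dtype=float), np.asarray(obs, dtype=float)
    return {
        "bias_mean": est.mean() - obs.mean(),
        "bias_var": est.var() - obs.var(),
        "rmse": float(np.sqrt(np.mean((est - obs) ** 2))),
        "corr": float(np.corrcoef(est, obs)[0, 1]),
    }


rng = np.random.default_rng(1)
n_stations, n_times = 10, 2000
signal = rng.normal(0.0, 1.0, n_times)                       # shared regional signal
data = signal + rng.normal(0.0, 1.0, (n_stations, n_times))  # plus independent station noise

# Straight-average estimate for station 0 from the other nine stations.
est0 = data[1:].mean(axis=0)
m = performance(est0, data[0])
print({k: round(float(v), 2) for k, v in m.items()})
```

Averaging the other stations cancels much of the independent station noise, so the estimated series has a smaller variance than the observed one, giving a negative variance bias even when the bias of the mean is near zero.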
To assess how well each method reproduces the observed spatial variability, Spearman rank
correlations are computed between the cross-validated estimates and the observed values on the
spatial map. The resulting spatial correlations are shown as boxplots in Figure 9. The correlations
are higher for January precipitation than for July. The correlations from the straight average
method are -1. This is because, in cross-validation, the observation at each station in a given year
is dropped and predicted from the remaining stations: if a large value is dropped, the average of
the rest is small, and vice versa. The straight-average estimate is therefore a decreasing linear
function of the dropped value, which produces a rank correlation of exactly -1. Note also the
spatial-map correlations for CMLR, which arise because the CMLR method uses coefficients
obtained from climatology.
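The argument for the -1 correlation can be checked numerically. In the Python/NumPy sketch below (the station values are hypothetical), the leave-one-out straight average at station i is (S - x_i)/(n - 1), where S is the sum over all n stations, so it decreases strictly as x_i increases. The Spearman correlation is computed as the Pearson correlation of the ranks, which is valid only when there are no ties.

```python
import numpy as np


def loo_straight_average(values):
    """Estimate each station as the mean of all the other stations."""
    v = np.asarray(values, dtype=float)
    return (v.sum() - v) / (v.size - 1)


def spearman(a, b):
    """Spearman rank correlation (assumes no ties): Pearson correlation of the ranks."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


obs = np.array([12.0, 30.5, 7.2, 18.9, 25.1])  # hypothetical station values for one year
est = loo_straight_average(obs)
print(spearman(est, obs))
```

The printed value equals -1 up to floating-point rounding, and the same holds for any set of distinct station values.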
The CMLR method, which like MLR is based on linear regression, performed well, especially
in the mountainous basin. In the Alapaha basin, however, its performance, while competitive,
was lower than that of MLR, as seen in its larger bias of the mean (Figures 7 and 8). The main
idea of the CMLR method is to obtain stable estimates by using the long-term climatological
trend across the basin at each time step: the coefficients of the predictor variables keep their
climatological values, and only the intercept is re-estimated optimally at each time step.
However, if the climate variables in a basin vary strongly from year to year (day to day), a
climatologically estimated model cannot give good estimates at the monthly (daily) time step. In
other words, if the climatological slopes do not match the spatial gradients at a given time step,
re-estimating the intercept is not sufficient to capture the variation, even though it is the
‘optimal value’.
To illustrate this point, Figure 10 shows boxplots of the regression coefficients, and of the
correlation between observed and estimated values, for the MLR fits at each station for January
and July precipitation in the Alapaha and Animas basins. For each station and each year, an MLR
fit is developed in cross-validated mode and then used to estimate the dropped value, so each
station has one MLR fit per year. The correlations and regression coefficients from these fits are
shown as boxplots for each station. Figure 10 allows us to assess (a) whether the regression
coefficients change substantially from year to year, and (b) whether these changes are
predictable (e.g., as assessed through the correlation plot). Both conditions (a) and (b) are
satisfied, meaning that climatological regression relations, as used in CMLR, will be unable to
capture the year-to-year variability and will likely result in lower skill.
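The difference between MLR and CMLR can be made concrete. The following is a minimal Python/NumPy sketch, assuming hypothetical station coordinates (x, y, elevation z) and synthetic yearly precipitation whose elevation gradient changes from year to year. MLR refits all coefficients each year, while the CMLR-style fit takes its slopes from a regression on the climatological means and re-estimates only the intercept, which for fixed slopes b has the least-squares value mean(y) - mean(X)·b. This is an in-sample illustration, not the cross-validated procedure used in the paper.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit y = b0 + X @ b; returns [b0, b...]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def cmlr_step(X, y_t, slopes_clim):
    """Keep the climatological slopes; re-estimate only the intercept for this time step."""
    b0 = y_t.mean() - X.mean(axis=0) @ slopes_clim  # least-squares intercept for fixed slopes
    return np.concatenate([[b0], slopes_clim])


def sse(beta, X, y):
    """Sum of squared residuals of the linear model beta at stations X."""
    return float(np.sum((y - beta[0] - X @ beta[1:]) ** 2))


rng = np.random.default_rng(7)
n_st, n_yr = 30, 25
X = np.column_stack([rng.uniform(0, 50, n_st),       # x (km), hypothetical
                     rng.uniform(0, 50, n_st),       # y (km), hypothetical
                     rng.uniform(1.5, 3.5, n_st)])   # z (km), hypothetical
gradient = rng.normal(20.0, 15.0, n_yr)              # year-to-year elevation gradient (mm/km)
Y = np.array([10 + g * X[:, 2] + rng.normal(0, 3, n_st) for g in gradient])  # (n_yr, n_st)

slopes_clim = ols(X, Y.mean(axis=0))[1:]             # slopes from the climatological means
sse_mlr = sum(sse(ols(X, Y[t]), X, Y[t]) for t in range(n_yr))
sse_cmlr = sum(sse(cmlr_step(X, Y[t], slopes_clim), X, Y[t]) for t in range(n_yr))
print(f"total SSE  MLR: {sse_mlr:.0f}  CMLR: {sse_cmlr:.0f}")
```

Because the CMLR-style fit is MLR with the slopes constrained, it can never fit a given year better in-sample; the gap grows as the yearly gradients depart from the climatological one, which is the situation Figure 10 diagnoses.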
The methods were applied to interpolate January precipitation on a fine DEM grid (roughly
0.9 km × 0.9 km) in the Animas basin. The resulting surface maps are shown in Figure 11. They
show strong heterogeneity in precipitation consistent with topography, with more precipitation at
higher elevations and less at lower elevations. The straight average washes out this strong spatial
heterogeneity, so the spatial variation of its interpolated values is close to zero. IDW and OK
tend to oversmooth, while the MLR and LWP maps capture the spatial features very well.
Comparing the performance of the distance-based and regression-based schemes
(Figures 7, 8, and 9) shows the importance of incorporating elevation information in the
interpolation methods.
6.2 Model performances at daily time scale
The first approach to interpolating precipitation at the daily time scale is to apply the
methods directly to the daily data and replace any negative estimates with zero. For temperature
this is not an issue, as it can take both negative and positive values. The boxplots of the
performance measures from this approach are shown in Figures 12 and 13. The observations
made from Figures 7 and 8 generally hold here as well: the performance of the interpolation
methods at the daily time scale is similar to that at the monthly time scale, and the
regression-based methods using elevation seem to perform better. Figure 14 shows the spatial
map of daily precipitation for January 25, 1979, from a selection of methods. Note the reduced
magnitude of the interpolated values relative to the observations, and the over-smoothing by the
interpolation methods, which misses the spatial features seen in the observed field. This is
largely because rainfall occurrence is not well reproduced, which we aim to correct using the
two-step approach.
In the two-step approach, logistic regression is first applied to estimate precipitation
occurrence, and the interpolation schemes are then applied to estimate precipitation amounts at
the locations where the logistic regression indicates occurrence. The options available for the
two-step process, given the data limitations, are shown in Table 4: in Case 1 (Case 2), IDW
(MLR) is used to interpolate precipitation amounts. As mentioned before, if the sample size on a
given day is small, the IDW method is used; otherwise, the regression-based methods are used.
Figure 15 shows bar charts of the ratio of correct hits of precipitation occurrence for the Animas
and Alapaha River basins, obtained by counting all days on which the precipitation occurrence
was estimated correctly. Using logistic regression greatly increases the hit rate (from 50-75% to
80-91%) relative to using MLR and setting negative values to zero, in both basins. The hit rate
for Case 2 is slightly lower than for Case 1 because MLR's estimates of precipitation amounts
can be less than zero even at locations classified as wet.
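The occurrence step can be sketched as follows. This is a minimal Python/NumPy sketch, assuming logistic regression of the stations' wet/dry indicator on (x, y, z) for one day, fitted by Newton-Raphson with a small ridge penalty `lam` (a hypothetical addition that keeps the fit finite when wet and dry stations are perfectly separated). A target location is declared wet when its logistic probability exceeds the basin climatological probability (Plog > Pclim, as in Table 4). The helper `choose_amount_method` is a hypothetical encoding of Case 3 of Table 4; how the boundary counts 5 and 12 are assigned is our assumption.

```python
import numpy as np


def _sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def fit_logistic(X, r, lam=0.1, n_iter=50, tol=1e-8):
    """Ridge-penalised logistic regression of r (1 = wet, 0 = dry) on standardised X."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    A = np.column_stack([np.ones(len(r)), (X - mu) / sd])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = _sigmoid(A @ beta)
        grad = A.T @ (r - p) - lam * beta
        hess = A.T @ (A * (p * (1 - p))[:, None]) + lam * np.eye(A.shape[1])
        step = np.linalg.solve(hess, grad)  # Newton-Raphson update
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return mu, sd, beta


def predict_prob(model, X0):
    """Logistic probability of rain at target locations X0."""
    mu, sd, beta = model
    A0 = np.column_stack([np.ones(len(X0)), (X0 - mu) / sd])
    return _sigmoid(A0 @ beta)


def choose_amount_method(n_wet):
    """Hypothetical encoding of Case 3 in Table 4 (boundary handling is assumed)."""
    if n_wet < 5:
        return "IDW"
    if n_wet <= 12:
        return "MLR"
    return "LWP"


def hit_rate(est_wet, obs_wet):
    """Fraction of cases where estimated and observed occurrence agree."""
    return float(np.mean(np.asarray(est_wet) == np.asarray(obs_wet)))


rng = np.random.default_rng(3)
X = np.column_stack([rng.uniform(0, 50, 40), rng.uniform(0, 50, 40),
                     rng.uniform(1500, 3500, 40)])  # x, y (km), z (m), hypothetical
r = (X[:, 2] > 2500).astype(float)                  # on this synthetic day, wet above 2500 m
model = fit_logistic(X, r)
p_clim = 0.4                                        # hypothetical basin climatology
p_log = predict_prob(model, np.array([[25.0, 25.0, 3200.0], [25.0, 25.0, 1700.0]]))
wet = p_log > p_clim
print(wet, choose_amount_method(int(r.sum())))
```

Only the locations flagged as wet would then receive an amount from the chosen interpolation scheme; locations flagged as dry are set to zero.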
Rainfall occurrence estimation bias was also analyzed using a contingency table and the
Kuipers skill score (KSS) [Wilks, 1995]. The contingency table consists of four elements giving
the number of days in each category, as shown in Table 5. KSS is calculated as
$$\mathrm{KSS} = \frac{ad - bc}{(a + c)(b + d)} \qquad (19)$$
A KSS of 1 represents perfect estimation skill, and a value of 0 indicates no skill. Figure 16
shows a large improvement in rainfall occurrence estimation skill in both basins and both
seasons. However, in the rainfall-dominated Alapaha basin, summer skill was lower than winter
skill, while no seasonal difference was observed in the snowmelt-dominated Animas basin. In the
Animas basin, the strong elevation effect is believed to provide similar estimation skill in both
seasons.
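Equation (19) and Table 5 translate directly into code. A minimal Python/NumPy sketch, taking hypothetical daily wet (1) / dry (0) sequences for the estimates and the observations:

```python
import numpy as np


def contingency(est_wet, obs_wet):
    """Counts a, b, c, d of Table 5 (rows: estimated wet/dry; columns: observed wet/dry)."""
    est = np.asarray(est_wet, dtype=bool)
    obs = np.asarray(obs_wet, dtype=bool)
    a = int(np.sum(est & obs))    # estimated wet, observed wet
    b = int(np.sum(est & ~obs))   # estimated wet, observed dry
    c = int(np.sum(~est & obs))   # estimated dry, observed wet
    d = int(np.sum(~est & ~obs))  # estimated dry, observed dry
    return a, b, c, d


def kuipers_skill_score(est_wet, obs_wet):
    """KSS = (ad - bc) / ((a + c)(b + d))."""
    a, b, c, d = contingency(est_wet, obs_wet)
    return (a * d - b * c) / ((a + c) * (b + d))


est = [1, 1, 0, 0, 1, 0]
obs = [1, 0, 0, 0, 1, 1]
print(contingency(est, obs))          # (2, 1, 1, 2)
print(kuipers_skill_score(est, obs))  # 0.3333333333333333
```

Algebraically, KSS equals the fraction of observed wet days estimated as wet, a/(a + c), minus the fraction of observed dry days estimated as wet, b/(b + d); in the example, 2/3 - 1/3 = 1/3. The score is undefined when the observed record is all wet or all dry.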
Figure 17 shows boxplots of the performance measures for the interpolation of daily
precipitation with the two-step process. The biases are considerably smaller in magnitude (closer
to zero) than those from the single-step method (i.e., without logistic regression for
precipitation occurrence) in Figure 12. However, a slight negative bias is apparent, largely
because of the threshold used in the logistic regression to generate rainy days.
7 Summary and Discussion
We compared a suite of spatial interpolation schemes for precipitation and temperature at
the monthly and daily time scales. For daily precipitation we used a two-step approach, in which
the spatial occurrence of precipitation is first generated using logistic regression and the
precipitation amounts are then estimated using the interpolation schemes. Based on the
performance measures, regression-based methods that use elevation information performed well
in estimating the observations in cross-validation. However, several issues remain regarding the
proper application of the various spatial interpolation schemes.
The difficulty of fitting a good variogram to the data led to the poor performance of
ordinary kriging. There are variants of ordinary kriging (indicator kriging, cokriging, universal
kriging, etc.) that improve estimation skill by accounting for topography and the intermittency of
rainfall events [Prudhomme and Reed, 1999; Barancourt et al., 1992]. However, even with this
added complexity, there is no guarantee of finding a variogram that fits the data well.
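Because kriging hinges on the variogram, a sketch of the classical empirical semivariogram may help, gamma(h) = (1 / 2N(h)) Σ (z_i - z_j)², summed over the N(h) station pairs whose separation falls in lag bin h. The Python/NumPy sketch below assumes isotropy and user-chosen lag bins (`bin_edges` is a hypothetical parameter); fitting a model variogram to these points, the step that proved difficult here, is a separate step not shown.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Classical estimator; returns (bin centres, gamma, pair counts per bin)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)           # all distinct station pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)  # pair separation distances
    sq = (values[i] - values[j]) ** 2
    which = np.digitize(h, bin_edges) - 1              # lag-bin index of each pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        m = which == k                                 # pairs outside all bins never match
        counts[k] = m.sum()
        if counts[k]:
            gamma[k] = sq[m].sum() / (2 * counts[k])
    edges = np.asarray(bin_edges, dtype=float)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, gamma, counts


coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
values = [0.0, 1.0, 2.0]
print(empirical_semivariogram(coords, values, [0.5, 1.5, 2.5]))
```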
Among the regression-based methods, the main drawback of the PRISM-based method is its
dependence on the accuracy of the PRISM climatology, which has its own uncertainty. Also, the
use of ratio corrections for precipitation does not preserve the observed probability density
function of daily precipitation amounts (i.e., it alters the skewness). The CMLR method, which
uses climatology to obtain a more stable regression model, was applied for point estimation in
this research and performed well in the mountainous basin. In addition, there are efforts to
improve the performance of regression-based methods by using additional predictor variables
such as slope, wind, and distance from the shoreline [e.g., Ninyerola et al., 2000; Marquinez et
al., 2003].
Logistic regression considerably improved the estimation of rainfall occurrence and,
coupled with the regression-based methods, appears able to capture the high degree of spatial
variability of daily precipitation. Because of hydrologic variability among watersheds, it is often
difficult to prescribe a single method for spatial interpolation [Dingman, 1994; Kruizinga and
Yperlaan, 1978]. Overall, however, our comparisons suggest MLR and LWP as competitive
methods for spatial interpolation of precipitation and temperature. In the application presented in
this study, topography is a dominant factor, and the spatial variations of precipitation and
temperature largely follow topography linearly. Hence, MLR and LWP performed comparably.
Given its flexibility and data-driven nature, LWP can capture nonlinearity, which makes it more
attractive. IDW is also a nonparametric method similar to LWP, and hence the results from the
two are comparable. However, LWP is theoretically a significant improvement over IDW:
IDW is in effect a locally weighted constant (a weighted average of neighboring values),
whereas LWP fits a local polynomial in the predictors, including elevation.
All of the methods used in this research were also applied to two other basins, the East Fork
of the Carson River near Gardnerville, Nevada (Carson), and the Cle Elum River near Roslyn,
Washington (Cle Elum). The results were consistent in these two basins.
A rainfall amount generation method consistent with the rainfall intermittency estimate
should also be considered. In this research, logistic regression showed no overall bias in
regenerating wet days above the threshold, but the resulting estimates underestimated rainfall
amounts. This underestimation might be corrected by choosing an appropriate threshold and
probability cutoff value; further research is needed.
Practical use of interpolated climate variables raises issues beyond the analysis in this study,
including data handling and the sensitivity of runoff models to the choice of interpolation
method. The effect of the number of available data points in a basin and of the observation
density should be examined, because the density and quality of the available data are critical for
most interpolation methods. This also includes the choice of search radius, which is related to
the number of available data. To test this, observation density could be varied by randomly
selecting a given fraction of the stations, with varying search radius. The resulting relation
between network density and search radius (number of observations) could then guide the
selection of an interpolation scheme according to network density. In addition, the impact of
possible outliers needs to be checked by repeating the interpolation analysis with each station
dropped in turn.
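The proposed density experiment can be outlined in code. This is a minimal Python/NumPy sketch, assuming IDW restricted to a search radius as the interpolator, hypothetical retained-station fractions and radii, hypothetical station data, and leave-one-out RMSE within each random subset; the helper names `idw_radius` and `density_experiment` are illustrative, not from the paper.

```python
import numpy as np


def idw_radius(xy, v, x0, radius, power=2.0):
    """IDW using only stations within `radius` of x0; NaN if none are found."""
    d = np.hypot(xy[:, 0] - x0[0], xy[:, 1] - x0[1])
    m = d <= radius
    if not m.any():
        return np.nan
    w = 1.0 / d[m] ** power  # assumes no station coincides with x0
    return float(np.sum(w * v[m]) / np.sum(w))


def density_experiment(xy, v, fractions, radii, n_rep=20, seed=0):
    """Leave-one-out RMSE over random station subsets, for each fraction and search radius."""
    rng = np.random.default_rng(seed)
    n = len(v)
    out = {}
    for f in fractions:
        for r in radii:
            errs = []
            for _ in range(n_rep):
                keep = rng.choice(n, size=max(2, int(round(f * n))), replace=False)
                for i in keep:  # withhold each retained station in turn
                    others = keep[keep != i]
                    est = idw_radius(xy[others], v[others], xy[i], r)
                    if np.isfinite(est):
                        errs.append(est - v[i])
            out[(f, r)] = float(np.sqrt(np.mean(np.square(errs)))) if errs else np.nan
    return out


rng = np.random.default_rng(5)
xy = rng.uniform(0, 100, size=(40, 2))          # hypothetical station locations (km)
v = 50 + 0.5 * xy[:, 0] + rng.normal(0, 5, 40)  # hypothetical station values
res = density_experiment(xy, v, [0.25, 0.5, 1.0], [20.0, 50.0])
for key, rmse in res.items():
    print(key, round(rmse, 2))
```

With the fraction set to 1.0, the same loop is the station-by-station drop test suggested above for checking the influence of outliers.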
Because climate variables are among the basic inputs to runoff models, the sensitivity of
runoff simulations to the spatially interpolated products should be investigated. However, after
complicated parameter calibration and modeling procedures, subtle differences between the
spatial interpolation methods (as shown in this study) may be unrecognizable in practical runoff
analysis. In addition, there is no guarantee that the best point estimation method gives the best
runoff estimation through runoff models. Among the interpolation methods used in this paper,
the CMLR method was developed mainly to produce better rainfall amounts, because the
accuracy of some hydrologic models, such as PRMS, depends more on accurate estimation of
rainfall quantity than on spatial variability. This is one of the reasons that the CMLR method did
not perform well by the measures used in this research. Hay et al. [2002] also showed that the
prediction of temperature is important in snowmelt-dominated basins.
The results shown in this paper should help in selecting appropriate spatial interpolation
schemes, not only for watershed modeling but also for agricultural applications and climate
impact studies using GCM output. However, the differences in performance between the
interpolation schemes were often small. It is possible that differences in the estimated spatial
variability will be amplified into larger differences in runoff simulated by a hydrologic model,
making small differences between the spatial estimates a critical issue, or, conversely, that they
will be damped. The next step of our research is to understand the sensitivity of streamflow
simulation to the different spatial estimates. This work is currently underway.
Acknowledgments
Partial support of this work by the NOAA GAPP program (Award NA16GP2806) and the
NOAA RISA Program (Award NA17RJ1229) is gratefully acknowledged.
References
Barancourt, C., J.D. Creutin, and J. Rivoirard (1992), A Method for Delineating and Estimating
Rainfall Fields, Water Resour. Res., 28(4), 1133-1144.
Beek, E.G., A. Stein, and L.L.F. Janssen (1992), Spatial variability and interpolation of daily
precipitation amount, Stochastic Hydrol. Hydraul., 6, 209-221.
Bishop, G.D., M.R. Church, and C. Daly (1998), Effects of improved precipitation estimates on
automated runoff mapping: Eastern United States, J. Am. Water Resour. As., 34 (1), 159-166.
Borga, M. and A. Vizzaccaro (1997), On the interpolation of hydrologic variables: formal
equivalence of multiquadratic surface fitting and kriging, J. Hydrol., 195, 160-171.
Borga, M., S. Fattorelli, and P. Valentini (1994), Precipitation estimation for flood forecasting in
a mountainous basin, in Advances in Water Resources Technology and Management;
proceedings of The Second European conference on Advances in Water Resources
Technology and Management, Lisbon, Portugal, 14-18 June, edited by G. Tsakiris and M.A.
Santos, A.A. Balkema, 403-410.
Bussieres, N. and W. Hogg (1989), The Objective Analysis of Daily Rainfall by Distance
Weighting Schemes on a Mesoscale Grid, Atmosphere-Ocean, 27(3), 521-541.
Chua, S.H. and R. Bras (1982), Optimal estimators of mean areal precipitation in regions of
orographic influences, J. Hydrol., 57, 23-48.
Church, M.R., G.D. Bishop, and D.L. Cassell (1995), Maps of regional evapotranspiration and
runoff/precipitation ratios in the northeast United States, J. Hydrol., 168(1-4), 283-298.
Clark, M, S. Gangopadhyay, L. Hay, B. Rajagopalan, and R. Wilby (2004), The Schaake shuffle:
A method for reconstructing space-time variability in forecasted precipitation and temperature
fields, J. Hydrometeorol., 5(1), 243-262.
Creutin, J.D. and C. Obled (1982), Objective Analysis and Mapping Techniques for Rainfall
Fields: An Objective Comparison, Water Resour. Res., 18(2), 413-431.
Daly, C., R.P. Neilson, and D.L. Phillips (1994), A Statistical-Topographic Model for Mapping
Climatological Precipitation over Mountainous Terrain, J. Appl. Meteor., 33, 140-158.
Dingman, S.L. (1994), Physical Hydrology, 1st ed., Prentice Hall, New Jersey.
Dirks, K.N., J.E. Hay, C.E. Stow, and D. Harris (1998), High-resolution studies of rainfall on
Norfolk Island. Part II: Interpolation of rainfall data, J. Hydrol., 208, 187-193.
Erxleben, J., K. Elder, and R. Davis (2002), Comparison of spatial interpolation methods for
estimating snow distribution in the Colorado Rocky Mountains, Hydrol. Process., 16, 3627–
3649.
Fassnacht, S.R., K.S. Dressler, and R.C. Bales (2003), Snow water equivalent interpolation for
the Colorado River Basin from snow telemetry (SNOTEL) data, Water Resour. Res., 39(8),
1208, doi:10.1029/2002WR001512.
Franke, R. and G. Nielson (1980), Smooth interpolation of large sets of scattered data,
International Journal for Numerical Methods in Engineering, 15, 1691-1704.
Goodale, C.L., J.D. Aber, and S.V. Ollinger (1998), Mapping monthly precipitation,
temperature, and solar radiation for Ireland with polynomial regression and a digital
elevation model, Climate Res., 10, 35-49.
Goovaerts, P. (2000), Geostatistical approaches for incorporating elevation into the spatial
interpolation of rainfall, J. Hydrol., 228, 113-129.
Goovaerts, P. (2002), Geostatistical incorporation of spatial coordinates into supervised
classification of hyperspectral data, J. Geograph. Syst., 4, 99–111.
Gupta, V.K., and E.C. Waymire (1993), A statistical analysis of mesoscale rainfall as a random
cascade, J. Appl. Meteorol., 32(2), 251-267.
Hartkamp, A.D., K. De Beurs, A. Stein, and J.W. White (1999), Interpolation Techniques for
Climate Variables, NRG-GIS Series 99-01, Mexico, D.F, CIMMYT.
Hay, L.E. and G.J. McCabe (2002), Spatial variability in water-balance model performance in
the conterminous United States, J. Am. Water Resour. As., 38(3), 847-860.
Hay, L.E., M.P. Clark, R.L. Wilby, W.J. Gutowski, JR., G.H. Leavesley, Z. Pan, R.W. Arritt, and
E.S. Takle (2002), Use of regional climate model output for hydrologic simulations, J.
Hydrometeorol., 3(5), 571-590.
Helsel, D.R., and R.M. Hirsch (1992), Statistical Methods in Water Resources, Elsevier,
Amsterdam.
Henry, A.J. (1919), Increase of Precipitation With Altitude, Mon. Weather Rev., 47, 33-41.
Hevesi, J.A., J.D. Istok, and A.L. Flint (1992), Precipitation estimation in mountainous terrain
using multivariate geostatistics. Part I: Structural analysis, J. Appl. Meteorol., 31(7), 661-676.
Jeffrey, S.J., J.O. Carter, K.B. Moodie, and A.R. Beswick (2001), Using spatial interpolation to
construct a comprehensive archive of Australian climate data, Environmental Modeling &
Software, 16, 309-330.
Jothityangkoon, C., M. Sivapalan, and N.R. Viney (2000), Tests of a space-time model of daily
rainfall in southwestern Australia based on nonhomogeneous random cascades, Water
Resour. Res., 36(1), 267-284.
Journel, A. and C. Huijbregts (1978), Mining Geostatistics, Academic Press, London.
Kastelec, D. and K. Košmelj (2002), Spatial Interpolation of Mean Yearly Precipitation using
Universal Kriging, Developments in Statistics, 17, 149-162.
Kruizinga, S. and G.J. Yperlaan (1978), Spatial interpolation of daily totals of rainfall, J. Hydrol.,
36, 65-73.
Kurtzman, D. and R. Kadmon (1999), Mapping of temperature variables in Israel: a comparison
of different interpolation methods, Climate Res., 13, 33-43.
Kyriakidis, P.G., J. Kim, and N.R. Miller (2001), Geostatistical mapping of precipitation from
rain gauge data using atmospheric and terrain characteristics, J. Appl. Meteorol., 40, 1855-
1877.
Lanza, L.G., J.A. Ramirez, and E. Todini (2001), Stochastic rainfall interpolation and
downscaling, Hydrology & Earth System Sciences, 5(2), 139-143.
Leavesley, G.H., P.J. Restrepo, S.L. Markstrom, M. Dixon, and L.G. Stannard (1996), The
modular modeling system - MMS: User's manual, Open File Report 96-151, U.S. Geological
Survey.
Loader, C. (1999), Local Regression and Likelihood, Springer, New York.
Loader, C. (1997), LOCFIT: An introduction, Statistical computation and graphics newsletter,
8(1), 11-17.
Mackay, N.G., R.E. Chandler, C. Onof, and H.S. Wheater (2001), Disaggregation of spatial
rainfall fields for hydrological modelling, Hydrology & Earth System Sciences, 5(2), 165-173.
Marquinez, J., J. Lastra, and P. Garcia (2003), Estimation models for precipitation in
mountainous regions: the use of GIS and multivariate analysis, J. Hydrol., 270, 1-11.
Myers, D.E. (1994), Spatial interpolation: an overview, Geoderma, 62, 17-28.
Ninyerola, M., X. Pons, and J.M. Roure (2000), A methodological approach of climatological
modelling of air temperature and precipitation through GIS techniques, Int. J. Climatol., 20,
1823–1841.
Ollinger, S.V., J.D. Aber, G.M. Lovett, S.E. Millham, R.G. Lathrop, and J.M. Ellis (1993), A
spatial model of atmospheric deposition for the northeastern US, Ecological Applications,
3(3), 459-472.
Owosina, A. (1992), Methods for Assessing the Space and Time Variability of Ground Water
Data. M.S. Thesis, Utah: Utah State University.
Draft
Parajka, J. (2000), Estimation of Average Basin Precipitation for Mountain Basins in
Western Tatra Mountains, ERB2000: Monitoring and Modelling Catchment Water
Quantity and Quality, 27-29 September, Ghent, Belgium.
Pardo-Iguzquiza, E. (1998), Comparison of geostatistical methods for estimating the areal
average climatological rainfall mean using data on precipitation and topography, Int.
J. Climatol., 18, 1031-1047.
Prudhomme, C. and D.W. Reed (1999), Mapping extreme rainfall in a mountainous
region using geostatistical techniques: A case study in Scotland, Int. J. Climatol., 19,
1337-1356.
Rajagopalan, B. and U. Lall (1998), Locally Weighted Polynomial Estimation of Spatial
Precipitation, Journal of Geographic Information and Decision Analysis, 2(2), 44-51.
Reek, T., S.R. Doty, and T.W. Owen (1992), A deterministic approach to the validation
of historical daily temperature and precipitation data from the cooperative network. B.
Am. Meteorol. Soc., 73(6), 753-762.
Seo, D.J. (1996), Nonlinear estimation of spatial distribution of rainfall – An Indicator
cokriging approach, Stochastic Hydrology and Hydraulics, 10, 127-150.
Sevruk, B., K. Matokova-Sadlonova, and L. Toskano (1998), Topography effects on
small-scale precipitation variability in the Swiss pre-Alps, Hydrology, Water
Resources and Ecology in Headwaters (Proceedings of the HeadWater'98 Conference,
Meran/Merano, Italy), 51-58.
Sun, H., P.S. Cornish, and T.M. Daniell (2002), Spatial Variability in Hydrologic
Modeling Using Rainfall-Runoff Model and Digital Elevation Model, J. Hydrol. Eng.,
404-412.
Syed, K.H., D.C. Goodrich, D.E. Myers, and S. Sorooshian (2003), Spatial characteristics
of thunderstorm rainfall fields and their relation to runoff, J. Hydrol., 271, 1–21.
Tabios, G.Q.III and J.D. Salas (1985), A comparative analysis of techniques for spatial
interpolation of precipitation, Water Resour. Bull., 21(3), 356-379.
Thiessen, A.H. (1911), Precipitation averages for large areas. A subsection in the
Climatology data for July 1911, District No. 10, Great Basin, Mon. Weather Rev.,
1082–1084.
Thornton, P.E., S.W. Running, and M.A. White (1997), Generating surfaces of daily
meteorological variables over large regions of complex terrain, J. Hydrol., 190(3-4),
214-251.
Todini, E., F. Pellegrini, and C. Mazzetti (2001), Influence of parameter estimation
uncertainty in Kriging. Part 2 – Test and case study applications, Hydrol. Earth.
System Sci., 5, 225–232.
Walpole, R.E., R.H. Myers, and S.L. Myers (1998), Probability and statistics for
engineers and scientists, 6th ed., Prentice Hall.
Wilby, R.L., L.E. Hay, and G.H. Leavesley (1999), A comparison of downscaled and raw
GCM output: implications for climate change scenarios in the San Juan River basin,
Colorado, J. Hydrol., 225(1-2), 67-91.
Wilks, D.S. (1995), Statistical Methods in the Atmospheric Sciences: An Introduction,
Academic Press.
Young, K.C. (1992), A three-way model for interpolating for monthly precipitation values,
Mon. Weather Rev., 120, 2561-2569.
Figure captions
Figure 1: Map showing the study regions
Figure 2: (a) Spatial map of the well draw down surface due to 4 pumping wells (b)
Fitted Variogram (line) and observed estimates (dots) (c) Fitted variogram and observed
estimates of January mean precipitation in the Animas basin.
Figure 3: Schematic of application of logistic regression to the precipitation occurrence
process
Figure 4: Flowchart for spatial interpolation of monthly precipitation and temperature
Figure 5: Flowchart for spatial interpolation of daily precipitation using the two-step
process
Figure 6: Number of observations available in each basin
Figure 7: Boxplots of estimation bias, error, and correlation for January and July
monthly total precipitation (over 12 available stations; the letters S/I/K/M/O/P/L on the x-axis
stand for straight average/IDW/kriging/MLR/CMLR/PRISM/LWP)
Figure 8: Same as Figure 7 but for January and July monthly maximum temperature.
Figure 9: Boxplots of correlations on spatial map for January and July monthly total
precipitation
Figure 10: Boxplot of MLR coefficients and correlation for estimation of January and
July monthly total precipitation in Animas and Alapaha basin
Figure 11: Spatial map of estimates from selected methods for January mean monthly
total precipitation in Animas basin (Observed rainfall values are shown as white numbers
on topography plot located at the upper left pane)
Figure 12: Same as Figure 7 but for interpolation of daily precipitation
Figure 13: Same as Figure 8 but for interpolation of daily temperatures
Figure 14: Same as Figure 11 but for daily precipitation on January 25, 1979
Figure 15: Bar charts of rainfall occurrence hit ratio (which is the ratio of correct
estimation of rainy and dry days) simulated from different methods
Figure 16: Skill of rainfall occurrence realization
Figure 17: Same as Figure 7 but from daily interpolation of precipitation using the two-
step process.
Tables
Table 1. References on spatial interpolation of rainfall data
Category / Methods: References
Simple methods
  Nearest neighbor, arithmetic mean, Thiessen polygon: Thiessen (1911)
Distance-based methods
  Inverse distance weighting: Kruizinga and Yperlaan (1978), Franke and Nielson (1980), Bussieres and Hogg (1989), Dirks et al. (1998), Hartkamp et al. (1999)
  Ordinary kriging: Creutin and Obled (1982), Tabios and Salas (1985), Barancourt et al. (1992), Borga et al. (1994), Borga and Vizzaccaro (1997), Sun et al. (2002), Syed et al. (2003)
Topography-involved methods
  Cokriging, detrended kriging, indicator kriging, two-step process: Chua and Bras (1982), Barancourt et al. (1992), Beek et al. (1992), Hevesi et al. (1992), Young (1992), Seo (1996), Pardo-Iguzquiza (1998), Kurtzman and Kadmon (1999), Prudhomme and Reed (1999), Goovaerts (2000), Kyriakidis et al. (2001), Mackay et al. (2001), Todini et al. (2001), Jeffrey et al. (2001), Erxleben et al. (2002), Kastelec and Košmelj (2002)
  Multiple linear regression, PRISM, CMLR, locally weighted polynomial: Ollinger et al. (1993), Daly et al. (1994), Loader (1997), Goodale et al. (1998), Rajagopalan and Lall (1998), Loader (1999), Ninyerola et al. (2000), Hay et al. (2002), Fassnacht et al. (2003), Marquinez et al. (2003)
Note: papers that present multiple methods are listed under the method that performed best in that paper.
Table 2. Data availability in each basin
Basin: number of stations; data period
Animas: 37; 1948.10~1997.4
Alapaha: 28; 1970.1~1997.9
Methods used, by number of available observations:
  > 12: 1. Straight average 2. IDW 3. PRISM 4. MLR 5. CMLR 6. Kriging 7. LWP
  > 4: 1. Straight average 2. IDW 3. PRISM 4. MLR 5. CMLR 6. Kriging
  > 2: 1. Straight average 2. IDW 3. PRISM
Table 3. Data requirements for each interpolation scheme
Scheme: minimum number of samples*; explanation
  Straight average: 2; needs a minimum of 1 excluding the estimation point itself
  IDW: 2; same as above
  Kriging: 4; depends on the data structure
  MLR: 5; number of parameters (3) + 1 (intercept) + 1 (cross-validation)
  CMLR: 5; same as above
  PRISM: 2; same as IDW (the PRISM data set is needed for application)
  LWP: 10; (2p + 1) + 3
* All for cross-validation analysis.
Table 4. Options for daily interpolation of precipitation in the two-step process
For all time steps, if (Plog > Pclim)§ then:
  Case 1: Use IDW for all days.
  Case 2: Use IDW if the number of rainfall stations < 5; use MLR for all other days.
  Case 3: Use IDW if the number of rainfall stations < 5; use MLR if 5 < number of rainfall stations < 12; use locally weighted polynomial if number of rainfall stations > 12.
  Degree of sophistication increases from Case 1 to Case 3.
§ Plog: rainfall probability at the estimation station calculated by logistic regression; Pclim: rainfall probability from the climatology of the basin.
Table 5. 2 × 2 contingency table
                   Observed wet   Observed dry
  Estimated wet         a              b
  Estimated dry         c              d
Figures
Figure 1. Map showing the study regions
Figure 2 (a) Spatial map of the well draw down surface due to 4 pumping wells (b) Fitted Variogram (line) and observed estimates (dots) (c) Fitted variogram and observed estimates of July mean precipitation in the Animas basin.
Figure 3 Schematic of application of logistic regression to the precipitation occurrence process
[Schematic: station tables of x, y, z and occurrence r (1/0) for each time step → rainfall occurrence interpolation by logit regression → get interpolation on rainy stations → apply interpolation schemes → rainfall field over the basin]
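Figure 3 describes the occurrence step, in which the binary occurrence r at the stations is modeled by logistic regression on station attributes. A minimal pure-Python sketch follows, assuming the predictors are x, y, and elevation z (as the schematic's columns suggest) and fitting by gradient ascent with a small ridge penalty; the ridge term, step size, and function names are our assumptions, chosen to keep the weights finite when wet and dry stations are perfectly separable, and are not the authors' fitting procedure.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, r, lr=0.5, iters=3000, ridge=1e-3):
    """Fit P(r = 1 | x, y, z) by batch gradient ascent on a ridge-penalized
    log-likelihood. X is a list of (x, y, z) tuples, r a list of 0/1 flags."""
    n, k = len(X), len(X[0])
    # Standardize predictors so one step size suits both coordinates and elevation.
    mean = [sum(row[j] for row in X) / n for j in range(k)]
    sd = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in X) / n) or 1.0
          for j in range(k)]
    Z = [[(row[j] - mean[j]) / sd[j] for j in range(k)] for row in X]
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for zi, ri in zip(Z, r):
            err = ri - sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], zi)))
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * zi[j]
        w[0] += lr * grad[0] / n
        for j in range(k):  # the intercept is not penalized
            w[j + 1] += lr * (grad[j + 1] / n - ridge * w[j + 1])
    return w, mean, sd


def predict_prob(model, point):
    """Wet probability (Plog) at an unsampled location (x, y, z)."""
    w, mean, sd = model
    z = [(v - m) / s for v, m, s in zip(point, mean, sd)]
    return sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))


if __name__ == "__main__":
    # Illustrative stations (x, y, elevation) and one day's occurrence flags.
    X = [(0, 0, 200), (1, 0, 1800), (0, 1, 1600), (1, 1, 300),
         (2, 0, 250), (2, 1, 1900), (0, 2, 2100), (1, 2, 350)]
    r = [0, 1, 1, 0, 0, 1, 1, 0]
    model = fit_logistic(X, r)
    print(predict_prob(model, (1, 1, 2000)), predict_prob(model, (1, 1, 200)))
```

In the two-step process, the probability returned by `predict_prob` plays the role of Plog and is compared with the basin's climatological probability Pclim; only locations estimated wet then receive an interpolated amount.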
Figure 4 Flowchart for spatial interpolation of monthly precipitation and temperature
[Flowchart: data correction for all climate variables → selection of interpolation schemes by the number of available stations (from Straight avg., IDW, Kriging, MLR, CMLR, PRISM, LWP down to Straight avg., IDW, PRISM as stations decrease) → performance comparison; surviving branch labels: over 80%, over 30%, 30~100%, #>12, #>2, 2<#<4]
Figure 5 Flowchart for spatial interpolation of daily precipitation using the two-step process
[Flowchart: daily precipitation data → if all stations are dry, set P = 0; otherwise apply logistic regression (rain if Plogit > Pclim, no rain if Plogit < Pclim) → for rainy estimates, three cases based on the number of rain stations at the given time step (Case 1: IDW; Case 2: #<4 IDW, #>4 MLR; Case 3: #<4 IDW, 4<#<12 MLR, #>12 LWP; IDW is used when the number of stations is not >4) → performance comparison]
Figure 6 Number of observations available in each basin
Figure 7 Boxplots of estimation bias, error, and correlation for January and July monthly total precipitation (more than 12 available stations; the letters S/I/K/M/O/P/L on the x-axis stand for Straight average/IDW/Kriging/MLR/CMLR/PRISM/LWP)
Figure 8 Same as Figure 7 but for January and July monthly maximum temperature
Figure 9 Boxplots of spatial-map correlations for January and July monthly total precipitation
Figure 10 Boxplots of MLR coefficients and correlations for the estimation of January and July monthly total precipitation in the Animas and Alapaha basins
Figure 11 Spatial map of estimates from selected methods for January mean monthly total precipitation in the Animas basin
(Observed rainfall values are shown as white numbers on the topography plot in the upper-left panel.)
Figure 12 Same as Figure 7 but for interpolation of daily precipitation
Figure 13 Same as Figure 8 but for interpolation of daily temperatures
Figure 14 Same as Figure 11 but for daily precipitation on Jan 25, 1979
Figure 15 Bar charts of the rainfall occurrence hit ratio (the fraction of days whose rainy or dry state is estimated correctly) for the different methods
[Bar charts for the Animas Basin and Alapaha Basin: IDW, MLR, CASE1, and CASE2 on the x-axis; y-axis from 0 to 1 (label "KSS"); bars for January and July]
Figure 16. Skill of rainfall occurrence realization
Figure 17. Same as Figure 7 but for daily interpolation of precipitation using the two-step process