
Inter-comparison of spatial estimation schemes for precipitation and

temperature

Yeonsang Hwang1,2, Martyn Clark2, Balaji Rajagopalan1,2, Subhrendu Gangopadhyay1,2, and

Lauren E. Hay3

1Department of Civil, Environmental and Architectural Engg., University of Colorado, Boulder,

CO

2Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado,

Boulder, CO

3U.S. Geological Survey, Denver, CO

Abstract

Distributed hydrologic models typically require spatial estimates of precipitation and temperature

from sparsely located observational points to the specific grid points. We compare and contrast

the performance of several statistical methods for the spatial estimation in two climatologically

and hydrologically different basins. The seven methods assessed are: (1) Simple Average; (2)

Inverse Distance Weight Scheme (IDW); (3) Ordinary kriging; (4) Multiple Linear Regression

(MLR); (5) PRISM (Parameter-elevation Regressions on Independent Slopes Model) based

interpolation; (6) Climatological MLR (CMLR); and (7) Locally Weighted Polynomial

Regression (LWP). Regression based methods that used elevation information showed better

performance, in particular, the nonparametric method LWP. LWP is data driven with minimal

assumptions and provides an attractive alternative to MLR in situations with high degrees of

nonlinearities. For the daily time scale, we propose a two-step process in which the precipitation

occurrence is first generated via a logistic regression model, and the amount is then estimated

using the interpolation schemes. This process generated the precipitation occurrence effectively.

The results shown in this paper will help guide the selection of appropriate spatial interpolation

methods for use in watershed models for stream flow simulation, forecasts, and also downscaling

of Global Climate Model (GCM) outputs.

Submitted to Water Resources Research

July 2004

1 Introduction

Accurate simulations of streamflow from physical watershed models are often limited by the

ability to capture the spatial variability of precipitation throughout a river basin [Syed et al.,

2003]. Watershed models require estimates of precipitation and temperature among other

variables on a regular grid or at sub-basins [Leavesley et al., 1996]. Even though the accuracy of

precipitation measurements has been improved, problems still exist, in terms of both sparse

spatial coverage of the observations, and difficulties in measuring precipitation during, for

example, snow events. Furthermore, the intermittent nature of precipitation makes

it more difficult to estimate throughout the basin from a sparse observational network.

Spatial interpolation schemes are required that can provide estimates of the mean and

uncertainty of precipitation and temperature at required locations (i.e. on a regular grid or at

centers of sub-basins). Various methods have been developed for this purpose - these range from

simple averaging methods (e.g. Arithmetic mean, Thiessen polygons), to physically based

estimates such as lapse rates, to complex statistical methods (multiple linear regression, locally

weighted polynomial, kriging, optimal interpolation, etc). Regional climate models are also used

to estimate spatial variability in precipitation and other surface climate fields. The skill of these

different methods depends on the space and time scales of the precipitation estimate. For

example, precipitation estimates at daily (or sub-daily) time scales and small sub-basin areas (e.g.

horizontal length scales <50km) are complicated by the intermittent properties of precipitation

[Jothityangkoon et al., 2000; Gupta, 1993; Thornton et al., 1997]. Furthermore, the accuracy of

different methods is regionally dependent. Methods that work well in regions dominated by

large-scale frontal systems may not work well in regions dominated by sporadic thunderstorm

activity. The growing needs of complex hydrologic models require estimations of precipitation

and temperature at small time (typically, daily or six hourly) and space scales.

The purpose of this paper is to provide a comprehensive assessment of the advantages and

limitations of different statistical techniques that are used to estimate spatial variability in

precipitation and temperature. This assessment will be conducted on both daily and monthly

time scales for two river basins in different hydro-climate regimes in the contiguous United

States (U.S.). The different statistical methods are: Simple Average; Inverse Distance Weight

Scheme (IDW); Ordinary Kriging; Multiple Linear Regression (MLR); PRISM (Parameter-

elevation Regressions on Independent Slopes Model) based interpolation; Climatological MLR

(CMLR); and Locally Weighted Polynomial Regression (LWP). The two river basins presented

here (shown in Figure 1) are Animas River at Durango, Colorado (Animas) and Alapaha River at

Statenville, Georgia (Alapaha).

The paper is organized as follows. Section 2 provides a literature review of spatial

interpolation methods and Section 3 describes the spatial interpolation schemes

used in this paper. Section 4 discusses the experimental design, followed by a description of the data

in Section 5. Results from the monthly and daily time scale analyses are presented in Section 6.

A summary and discussion of the results conclude the paper (Section 7).

2 Literature review

Several authors have discussed advantages and limitations of various interpolation schemes

for precipitation and temperature [Dingman, 1994; Myers, 1994; Dirks et al., 1998; Lanza et al.,

2001; Fassnacht et al., 2003] (see Table 1). These methods range from simple average methods

(e.g. Thiessen polygon [Thiessen, 1911]) to distance based methods such as inverse distance

weighting scheme and ordinary kriging [Franke and Nielson, 1980; Creutin and Obled, 1982] to

techniques that make explicit use of topographic parameters such as elevation in the interpolation

routine [e.g. Daly et al., 1994; Rajagopalan and Lall, 1998; Goovaerts 2000].

Simple methods such as spatial average, Thiessen polygon, and nearest neighbor are easy to

implement and computationally very efficient. However, these methods provide poor estimates,

especially in regions with sparse data networks [Thiessen, 1911]. For example, if the grid cells in

the hydrologic model use data from the nearest stations, discontinuity arises in the transition

between the grid cells represented by two different observation stations (e.g., as in Thiessen

polygon method). If all stations in a basin are averaged, then there will be no sub-basin

variability at all [Dirks et al., 1998]. Methods that take into account the distances (i.e. distance

between the estimation point and the observation stations) explicitly have the ability to address

these limitations.

Two of the early methods that use the distances are Inverse Distance Weighting (IDW) and

Ordinary Kriging (OK). In IDW, weights are applied to the observational data based on the

inverse of its distance from the estimation point - the distance is raised to some power [Bussieres

and Hogg, 1989; Dirks et al., 1998]. Optimal power of the inverse distance weighting function

could be calculated based on minimum error, but the power value of 2 is usually acceptable

[Kruizinga and Yperlaan, 1978; Dirks et al., 1998].

OK [Tabios and Salas, 1985] develops weights for the surrounding stations based on both the

co-variability between the stations and estimated co-variability between the surrounding stations

and the estimation points. The spatial co-variability is often quite noisy (and is unknown

between the surrounding stations and estimation points) so is modeled as a function of distance

(e.g., using a variogram model). Fitting the variogram function is the key aspect of OK. Efforts

have been made to fit variograms objectively so as to improve the estimation [Todini et al.,

2001].

In topographically complex regions, information on distance alone is insufficient to produce

good spatial estimates. OK has been extended to include topographical information for better

capturing the spatial variability of hydro-climate variables [Chua and Bras, 1982; Beek et al.,

1992; Pardo-Iguzquiza, 1998; Prudhomme and Reed, 1999; Jeffrey et al., 2001; Kyriakidis et al.,

2001; Erxleben et al., 2002]. For example, Goovaerts [2000] applied a regression scheme to

capture spatial trends of the precipitation field and used kriging on the residuals from the regression.

As another extension of OK, indicator kriging can produce improved estimates of the spatial variability

of intermittent rainfall fields [Seo, 1996]. Indicator kriging uses a chosen threshold to

transform the original values into indicator values (0 and 1). The indicator values are then

analyzed to determine spatial variability based on experimental variograms. Kriging has also

been applied in a two-step process of rainfall occurrence and amount estimation [Barancourt et

al., 1992; Mackay et al., 2001].

An alternative to OK is Multiple Linear Regression (MLR). This popular theory is well

developed [Helsel and Hirsch, 1992; Walpole et al., 1998, etc.]. Typically, topographical

information such as latitude, longitude, and elevation is used to fit a linear relationship with the

hydroclimate variable (precipitation or temperature). The fitted linear equation is then used to

estimate the hydroclimate variables spatially [Ollinger et al., 1993; Kurtzman and Kadmon,

1999; Ninyerola et al., 2000; Marquinez et al., 2003]. For example, Daly et al. [1994] used

regression techniques to estimate spatial variability of precipitation and other climate variables

for the entire US. As part of this research, Daly developed the PRISM (Parameter-elevation

Regressions on Independent Slopes Model) system, which has been used in many applications

[Church et al., 1995; Bishop et al., 1998]. Hay et al. [2002] used monthly climatological

regression relationships to estimate the daily variability of climate variables (the stations selected

for use in this method were optimized so as to maximize the fit between modeled and observed

streamflow). Generally, methods that include elevation information tend to perform better,

especially on low-density networks [Creutin and Obled, 1982].

Nonparametric methods that are data driven and do not require assumptions of the underlying

function (e.g., linear) provide an attractive alternative. There are several nonparametric methods

such as Splines, kernel-based [Owosina, 1992], and local polynomials [Loader, 1997;

Rajagopalan and Lall, 1998]. Local polynomials are easy to implement and are skilful. The

estimate at any point is based on a polynomial (of order p) fit to a small number (k) of its nearest

neighbors. The polynomial order and the number of neighbors are obtained using objective

criteria such as cross-validation measures, from the data. If p is equal to 1 (i.e. linear function)

and the number of neighbors includes all the data points then this collapses to a traditional linear

regression. Thus, the local polynomials can be thought of as a super set of linear regressions.

The performance of nonparametric methods on a variety of synthetic and real data sets has been

documented by Owosina [1992].

In many cases, simpler methods show moderate skill with less computational cost [Hevesi et

al., 1992; Goodale et al., 1998; Hartkamp et al., 1999; Kurtzman and Kadmon, 1999; Parajka,

2000]. Application of the interpolation schemes on daily or sub-daily rainfall field tends to have

higher variability compared to the monthly or annual analysis [Bussieres and Hogg, 1989;

Jeffrey et al., 2001]. For example, skills of various kriging methods decrease with increasing

spatial and temporal variability [Borga and Vizzaccaro, 1997; Hartkamp et al., 1999].

3 Interpolation methods

We selected seven different methods for spatial interpolation with varying degrees of

complexity for this inter-comparison effort. The methods are (1) Simple Average, (2) Inverse

Distance Weight scheme, (3) Ordinary Kriging, (4) Multiple Linear Regression (MLR), (5)

PRISM based interpolation, (6) Climatological MLR, and (7) Locally Weighted Polynomial regression

(LWP). For the daily time step analysis, logistic regression is also applied with the selected

interpolation schemes.

3.1 Simple Average (SA)

The SA method calculates the arithmetic mean of all the available observations in a basin.

Thus the calculated mean is the estimate at any point in the basin. All of the observation points

contribute equally to the estimate of the mean. Consequently, the resulting estimation doesn’t

show any spatial variability.

However, this method is still valuable for the estimation of areal mean of hydroclimate

variables on a small, dense, and well-distributed station network. The estimation of precipitation

is given as,

$$p_{est} = \frac{1}{n}\sum_{i=1}^{n} p_i \qquad (1)$$

where $p_i$ is the observed precipitation at the $i$-th station and $n$ is the total number of observation stations.

3.2 Inverse distance weight scheme (IDW)

The IDW method assigns weights based on the inverse of the distance to every data point

located within a given search radius centered on the point of estimate. The nearest observation

station has the largest weight and the most remote station has the smallest weight. As a constraint,

the sum of the weights should be equal to 1. The traditional form of this method is,

$$p_{est} = \sum_{i=1}^{n} w_i \cdot p_i \qquad (2)$$

$$w_i = \frac{1/d_i^{k}}{\sum_{j=1}^{n} 1/d_j^{k}} \qquad (3)$$

where $n$ is the number of stations in the search radius, $d_i$ is the distance between the estimation point and the $i$-th

observation point, $d_j$ is the distance between the estimation point and each of the $j$ observations, and $k$ is

the power ($1 \le k \le \infty$). In this study, the revised weighting function introduced by Franke and

Nielson [1980], which gave better results than the traditional function, was used:

$$w_i = \frac{\left(\dfrac{R - d_i}{R\, d_i}\right)^{2}}{\sum_{j=1}^{n}\left(\dfrac{R - d_j}{R\, d_j}\right)^{2}} \qquad (4)$$

where R is the maximum distance between the estimation and observation points inside the

search radius. In equation (4) all weights are calculated based on the 3-D distance with the same

units (meters). The estimates are less sensitive to the power of the distance function. Typically,

power value of 2 seems to work fine based on the error measures in applications of precipitation

estimations [Kruizinga and Yperlaan, 1978; Dirks et al., 1998]. One can also incorporate

elevation in calculating the distance thus bringing in the topographical information.
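The weighting functions in equations (2) to (4) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names, the station coordinates, and the precipitation values are hypothetical, and the sketch assumes that every station lies at a non-zero distance from the estimation point and that at least one station is closer than R.

```python
import math

def idw_weights(distances, power=2.0):
    """Traditional inverse-distance weights of equation (3)."""
    inverse = [1.0 / d ** power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]

def franke_nielson_weights(distances, radius=None):
    """Revised weights of equation (4); R defaults to the largest distance."""
    R = max(distances) if radius is None else radius
    # Stations at or beyond R receive zero weight.
    raw = [((R - d) / (R * d)) ** 2 if d < R else 0.0 for d in distances]
    total = sum(raw)
    return [v / total for v in raw]

def weighted_estimate(values, weights):
    """Weighted sum of station values, equation (2)."""
    return sum(w * p for w, p in zip(weights, values))

# Hypothetical example: estimation point at the origin, three stations
# with (x, y, z) coordinates in meters, so the distances are 3-D.
target = (0.0, 0.0, 0.0)
stations = [(1000.0, 0.0, 0.0), (0.0, 2000.0, 0.0), (0.0, 0.0, 4000.0)]
precip = [10.0, 20.0, 40.0]
distances = [math.dist(target, s) for s in stations]

est_idw = weighted_estimate(precip, idw_weights(distances))
est_fn = weighted_estimate(precip, franke_nielson_weights(distances))
print(est_idw, est_fn)
```

With these made-up inputs the traditional weights give about 13.33 and the revised weights give about 11.0. Because R is taken as the largest distance, the most remote station drops out of the revised estimate entirely, which is why the two results differ.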

3.3 Ordinary Kriging (OK)

Many spatial interpolation schemes are essentially based on the same form as equation (2)

but use different weighting functions. The OK method calculates weights using a vector of

covariances between the estimation point and surrounding observing stations (D) and also a

matrix of covariances between all observing stations (C):

$$w = C^{-1} D \qquad (5)$$

where $w$ is the vector of weights, $C$ is the covariance matrix between observation locations, and $D$ is

the vector of covariances between the estimation point and the observation locations.

Because the covariance structure between the estimation and observation points is

unknown, a model which is a function of distance is needed to determine the weights. This

model is referred to as the variogram [Journel and Huijbregts, 1978]. To derive this variogram

model, covariance is estimated between all possible observation points within a specified

distance. A smooth function is then fit to the observed covariances. This fitting process involves

selecting a function (e.g., spherical, exponential etc) and its associated parameters (range, nugget,

sill, power etc.).
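To make equation (5) concrete, the sketch below computes kriging weights from a covariance model that depends only on distance. Several things here are assumptions for illustration and are not taken from the paper: the exponential covariance function and its sill and range values, the station layout, and the helper names. The sketch also assumes the standard ordinary kriging formulation, in which C and D are augmented with a row and column of ones (a Lagrange multiplier) so that the weights sum to 1.

```python
import math

def exp_covariance(h, sill=1.0, corr_range=50.0):
    """Exponential covariance model (assumed here for illustration)."""
    return sill * math.exp(-h / corr_range)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ok_weights(stations, target, cov=exp_covariance):
    """Ordinary kriging weights from the augmented form of w = C^-1 D."""
    n = len(stations)
    # Station-to-station covariances, plus the unbiasedness constraint.
    C = [[cov(math.dist(si, sj)) for sj in stations] + [1.0] for si in stations]
    C.append([1.0] * n + [0.0])
    # Station-to-target covariances, plus the constraint right-hand side.
    D = [cov(math.dist(s, target)) for s in stations] + [1.0]
    solution = solve(C, D)
    return solution[:n]  # the last entry is the Lagrange multiplier

# Hypothetical layout: four stations on the corners of a square,
# estimation point at the center.
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
weights = ok_weights(stations, (5.0, 5.0))
print(weights)
```

For this symmetric layout every station receives the same weight of 0.25. In practice the quality of the weights depends on how well the chosen function and its parameters match the observed covariances.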

The variogram fitting is to an extent subjective. Given the limited choice of functions, the

function selected often provides a poor fit to the covariance estimates from the observations.

Consequently, the estimates from equation (5) using the fitted variogram will have high

variability, i.e. poor performance. This is one of the key drawbacks of OK. To illustrate this

point, the variogram for a simple synthetic problem, the estimation of a groundwater surface influenced by pumping wells, is

shown in Figure 2. A variogram for January monthly mean precipitation from the Animas basin

is also shown. It can be seen that the “best” fitted variogram does not capture the spatial co-

variability of the data very well (shown as points on the graphs). If the underlying surface is

highly nonstationary (such as the case in this example) obtaining a best fit variogram will be

difficult, thus, greatly limiting the applicability of Kriging.

3.4 Multiple Linear Regression (MLR)

The MLR method assumes that a linear relationship between the predictor variables

(typically topographic variables) and a known response variable (precipitation or temperature)

can be fitted and used to estimate the response variable at any desired locations. The model is of

the form:

$$p_{est} = b_0 + b_1 \cdot x + b_2 \cdot y + b_3 \cdot z \qquad (6)$$

where $x$, $y$, and $z$ are the predictor variables latitude, longitude, and elevation, respectively, and $b_0$, $b_1$,

$b_2$, $b_3$ are the regression coefficients. These regression coefficients are estimated by

minimizing the sum of squared errors.

Because of the strong orographic effects on the temperature and precipitation [e.g., Henry,

1919; Sevruk et al., 1998], elevation is included as a predictor variable in most spatial

interpolation studies using MLR. In this formulation, a separate regression equation may be

used for each time step (e.g., months, days). For this research, latitude, longitude, and elevation

are used for the predictor variables, and the separate regression models are used for each time

step.
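A least-squares fit of equation (6) for a single time step can be sketched as follows. The coordinates, the "true" coefficients used to generate the data, and the function names are hypothetical. The sketch solves the normal equations directly, which is adequate for four coefficients but is not necessarily how the study fitted its models.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_mlr(coords, values):
    """Return [b0, b1, b2, b3] minimizing the sum of squared errors."""
    X = [[1.0, x, y, z] for x, y, z in coords]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(4)] for i in range(4)]
    Xty = [sum(r[i] * v for r, v in zip(X, values)) for i in range(4)]
    return solve(XtX, Xty)

def predict(b, x, y, z):
    """Evaluate equation (6) at one location."""
    return b[0] + b[1] * x + b[2] * y + b[3] * z

# Hypothetical stations: x and y as offsets in km, z as elevation in km.
coords = [(0.0, 0.0, 2.0), (10.0, 0.0, 2.5), (0.0, 10.0, 3.0),
          (10.0, 10.0, 2.2), (5.0, 5.0, 3.5), (2.0, 8.0, 2.8)]
true_b = [5.0, 0.1, -0.2, 10.0]  # used only to generate noise-free data
values = [predict(true_b, *c) for c in coords]

b = fit_mlr(coords, values)
print(b, predict(b, 4.0, 6.0, 3.0))
```

Because the synthetic data are exactly linear, the fit recovers the generating coefficients up to rounding. With real observations the same call would be repeated for every time step, as described above.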

3.5 Climatological MLR (CMLR)

In MLR, the regression coefficients are fitted anew to obtain the best model of spatial variability

at each time step. In CMLR, the monthly trend (the slope parameters for each climate variable) is assumed

to be preserved at finer time scales (say, daily time series), while the intercept of

the regression model is re-optimized for each time step.

As the first step, fixed seasonal coefficients are calculated from the basin climatology.

Using the monthly total (or monthly averaged for temperature) and the predictor variables x, y, z

(latitude, longitude, and elevation, respectively), monthly MLR equations can be set up similar to

that described in section 3.4. In the traditional MLR method, by contrast, the coefficients in the model are recalculated

at each time step.

Second, the CMLR method changes the intercept of the monthly model for shorter time scale

estimation based on a group of anchor stations [Hay et al., 2002]. This intercept is calculated

using mean values of climate variables and x, y, and z coordinates of the selected group of

observation stations. With the fixed b1, b2, and b3, intercept b0 is determined with the 'optimal'

anchor station sets by Exhaustive Search (ES) analysis [Wilby et al., 1999]. Several error

measures could be used to find best anchor stations. Hay and McCabe [2002] also found that the

accuracy of runoff estimation did not show significant improvement with more than three anchor

stations. Unlike the study of Hay et al. [2002], the root mean square error between the

dropped (cross-validated) observation and its estimate was tested as the objective function of the

optimal intercept calculation in this research. In addition, the three nearest stations were used

as the anchor stations, and all the results presented in this paper are based on this station choice

option. After computing the optimal intercept for each given time step and for each estimation

point, climate variables are calculated by the following linear regression equation:

$$p_{est} = b_{0,opt} + b_{1,mth} \cdot x + b_{2,mth} \cdot y + b_{3,mth} \cdot z \qquad (7)$$

where $b_{1,mth}$, $b_{2,mth}$, and $b_{3,mth}$ are the regression coefficients based on climatology (monthly), and

$b_{0,opt}$ is the intercept for each time step.
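One way to read the intercept step is sketched below: the monthly slopes stay fixed, and the intercept for a time step is chosen so that equation (7) reproduces the mean of the anchor station values at the mean anchor coordinates. This reading, the function names, and all numbers are assumptions for illustration. The exhaustive search over anchor station sets is not shown.

```python
def cmlr_intercept(slopes, anchors):
    """Intercept b0 for one time step from the anchor station means.

    slopes  : (b1, b2, b3) fixed monthly coefficients
    anchors : list of (x, y, z, value) for the anchor stations
    """
    n = len(anchors)
    mean_x, mean_y, mean_z, mean_p = (
        sum(a[i] for a in anchors) / n for i in range(4)
    )
    b1, b2, b3 = slopes
    return mean_p - (b1 * mean_x + b2 * mean_y + b3 * mean_z)

def cmlr_estimate(slopes, b0, x, y, z):
    """Evaluate equation (7) at one location."""
    b1, b2, b3 = slopes
    return b0 + b1 * x + b2 * y + b3 * z

# Hypothetical monthly slopes and three anchor stations for one day.
slopes = (0.1, -0.2, 10.0)
anchors = [(0.0, 0.0, 2.0, 23.0), (10.0, 0.0, 2.5, 29.0), (0.0, 10.0, 3.0, 31.0)]

b0_opt = cmlr_intercept(slopes, anchors)
estimate = cmlr_estimate(slopes, b0_opt, 5.0, 5.0, 3.5)
print(b0_opt, estimate)
```

The anchor values in this example were constructed to lie exactly on a plane with intercept 3, so the recovered intercept is 3.0 and the estimate at the new location is 37.5.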

3.6 PRISM-based interpolation

The PRISM method is a regression-based spatial interpolation method using climate data. It

was developed to model the strong orographic effect on precipitation and give better estimation

on mountainous terrain. PRISM develops weighted regression functions of elevation and

precipitation to predict the precipitation on each cell’s elevation [Daly et al., 1994]. Using this

method, the Spatial Climate Analysis Service (Oregon State University) has constructed

climatological maps of precipitation and other variables on a 2-km grid for the contiguous U.S.

The PRISM climatology is used for interpolation. Extracting monthly and daily information from

the PRISM climatology involves the following steps:

(i) Anomalies of precipitation and temperature are calculated based on monthly mean values.

The temperature anomalies are calculated as the difference between the observation and the long-term

monthly mean value. The precipitation anomalies are calculated as the ratio of the

observation to the long-term mean monthly total value.

(ii) The anomalies are interpolated to each estimation point using IDW (other schemes may

be used as well).

(iii) The PRISM climatology is combined with the interpolated anomalies to get the final estimates. The

PRISM value for each estimation point is taken from the nearest PRISM grid point; it is

added to the interpolated temperature anomaly and multiplied by the interpolated precipitation anomaly:

$$t_{est} = t_a + t_{prism}$$

$$p_{est} = p_a \times p_{prism} \qquad (8)$$

where $t_{est}$ and $p_{est}$ are the estimated values of the climate variables for each station, $t_a$ and $p_a$ are the

interpolated climate variable anomalies, and $t_{prism}$ and $p_{prism}$ are the monthly PRISM values from the

nearest grid point.

For the daily time scale precipitation estimation, the monthly mean precipitation is divided by

the number of days in each month to compute the anomalies. PRISM values are then applied in the same way.
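Steps (i) to (iii) and equation (8) can be sketched as follows. The helper names and all numbers are hypothetical, and a plain inverse-distance scheme with power 2 stands in for the anomaly interpolation. The climatology values would come from the nearest PRISM grid point in practice; here they are simply typed in.

```python
def idw(values, distances, power=2.0):
    """Inverse-distance weighted average of station values."""
    inverse = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(inverse, values)) / sum(inverse)

def prism_temperature(obs, clim_at_stations, distances, clim_at_target):
    """Additive anomalies: interpolate (obs - climatology), then add back."""
    anomalies = [o - c for o, c in zip(obs, clim_at_stations)]
    return idw(anomalies, distances) + clim_at_target

def prism_precipitation(obs, clim_at_stations, distances, clim_at_target):
    """Ratio anomalies: interpolate (obs / climatology), then multiply back."""
    ratios = [o / c for o, c in zip(obs, clim_at_stations)]
    return idw(ratios, distances) * clim_at_target

# Hypothetical case: two stations, both 1 degree warmer and 20% wetter
# than their climatology; the target cell is colder and drier on average.
distances = [1000.0, 3000.0]
t_est = prism_temperature([2.0, -1.0], [1.0, -2.0], distances, -4.0)
p_est = prism_precipitation([60.0, 120.0], [50.0, 100.0], distances, 80.0)
print(t_est, p_est)
```

Since both stations carry the same anomaly, the interpolated anomaly equals it regardless of the weights, giving about -3.0 for temperature and about 96.0 for precipitation. The spatial detail in the final field therefore comes from the climatology rather than from the station network.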

3.7 Locally Weighted Polynomial method (LWP)

The locally weighted polynomial method is similar to the MLR (section 3.4) but the

regression equation is developed using nearest neighbors. A general form of local regression

with one predictor variable is,

$$p_i = \mu(x_i) + \varepsilon_i \qquad (9)$$

where $\mu(x)$ is the appropriate polynomial function and $\varepsilon_i$ is the estimation error.

The polynomial function in this model can be linear or of any higher order, but the order

should be selected carefully to avoid unwanted higher variance in the estimates [Loader,

1999]. A linear function was used for this research [Loader, 1997]. This function is fitted by

minimizing a locally weighted least squares criterion in a given sliding window:

$$\sum_{i=1}^{n} W\!\left(\frac{x_i - x}{h}\right)\left(p_i - \left(a_0 + a_1 (x_i - x)\right)\right)^{2} \qquad (10)$$

where $W(\cdot)$ is the weighting function, $h$ is the window width, and $a_0$, $a_1$ are the coefficients.
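The criterion in equation (10) has a closed-form minimizer when there is one predictor and the local polynomial is linear. The sketch below is a minimal illustration under several assumptions that are not from the paper: a tricube weighting function, a window width equal to the distance to the k-th nearest neighbor, and made-up data. It is not the LOCFIT implementation.

```python
def tricube(u):
    """Tricube weight function W(u); zero for |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear(x0, xs, ps, k):
    """Local linear estimate at x0 from the k nearest neighbors."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest)  # window width
    sw = swx = swxx = swp = swxp = 0.0
    for i in nearest:
        dx = xs[i] - x0
        w = tricube(dx / h)
        sw += w
        swx += w * dx
        swxx += w * dx * dx
        swp += w * ps[i]
        swxp += w * dx * ps[i]
    # Weighted least-squares solution for slope a1 and intercept a0.
    a1 = (sw * swxp - swx * swp) / (sw * swxx - swx ** 2)
    a0 = (swp - a1 * swx) / sw
    return a0  # the fitted value at x = x0

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
linear_data = [2.0 + 3.0 * x for x in xs]
curved_data = [x * x for x in xs]

est_linear = local_linear(2.5, xs, linear_data, k=4)
est_curved = local_linear(2.5, xs, curved_data, k=4)
print(est_linear, est_curved)
```

On the exactly linear data the estimate at 2.5 reproduces the line (9.5). On the curved data the estimate follows the nearby points rather than a single global line. Note that with this choice of window the k-th neighbor itself gets zero weight, so k must be large enough to leave at least two distinct points with positive weight.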

The optimal model fit depends on choosing a proper neighbor size around each estimation point.

Using the generalized cross validation (GCV) statistic, the best neighbor size is determined based on the

number of predictor variables and the estimation error:

$$GCV = \frac{\dfrac{1}{n}\sum_{i=1}^{n} e_i^{2}}{\left(1 - \dfrac{m}{n}\right)^{2}} \qquad (11)$$

where $e_i$ is the error, $n$ is the number of data points, and $m$ is the number of parameters.
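Equation (11) is simple enough to evaluate by hand, and a small function makes the penalty for extra parameters explicit. The function name and the residuals below are hypothetical.

```python
def gcv(errors, m):
    """Generalized cross validation score of equation (11).

    errors : residuals e_i of the fitted model
    m      : number of parameters in the fit (must be smaller than n)
    """
    n = len(errors)
    mean_squared_error = sum(e * e for e in errors) / n
    return mean_squared_error / (1.0 - m / n) ** 2

# Hypothetical residuals from a fit with two parameters.
errors = [1.0, -1.0, 2.0, -2.0]
score = gcv(errors, m=2)
print(score)
```

Here the mean squared error is 2.5 and the denominator is (1 - 2/4)^2 = 0.25, so the score is 10.0. Candidate neighbor sizes can be compared by fitting each one, collecting its residuals, and keeping the size with the smallest score.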

In this research, the statistical package LOCFIT [Loader, 1992] was used to fit locally

weighted polynomial models and estimate the variability of climate variables. During model

fitting, the ratio of the neighbor size to the total number of observations should be greater than a certain value, which

depends on the number of predictor variables and the number of available data points [Loader,

1992]:

$$\alpha_{min} \ge \frac{2 \times m + 1}{n} \qquad (12)$$

where $\alpha$ is (neighbor size)/(number of observations).

For the cross validation analysis in this research, the lower bound of $\alpha$ was set to 13/n.

With three predictor variables, the theoretical minimum neighbor size is 7 (= 2×3+1), but this is still very close to the limit of the

model capacity. In this research, a minimum of 13 observations was therefore selected to get

smoother, more stable estimates.

3.8 Extensions to daily precipitation interpolation

In practice, most watershed models require precipitation estimates at a daily time step.

Jeffrey et al. [2001] suggest two different approaches to get daily interpolated climate variables;

(1) direct interpolation from daily record and (2) generation of the daily values from monthly

interpolated values. They adopted the second approach in their study. Either of the methods

could be problematic if sufficient daily observations are not available. Spatial interpolation

methods, such as those described in the previous section can be used on a daily time scale.

However, due to the intermittent nature of daily rainfall, interpolating several zero values tends

to produce unrealistic rainfall fields at the daily time scale, not to mention the potential for

generating negative values.

We use two methods for spatial interpolation at the daily time scale: (1) the interpolation

methods described above are applied to the daily data and negative estimates are replaced with

zero and (2) logistic regression is used to estimate rainfall occurrence and the interpolation

methods described above are applied at locations where rainfall occurrence is generated. The

schematic of applying logistic regression to precipitation occurrence in a basin is shown in

Figure 3.

The station data sets are first transformed into a time series of occurrence (1 = wet days and 0

= dry days). Then, the precipitation amounts are interpolated only on estimated wet days.

Similar to the standard least squares regression model (6), the logistic model consists of a

regression constant $\beta_0$ and slopes $\beta_k$ for each predictor variable $x_k$ [Clark et al., 2004]:

$$p = 1 - \frac{1}{1 + \exp\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k\right)} \qquad (13)$$

where, as in ordinary least squares, the beta values are the regression constants, p is the

probability of precipitation occurrence.

The climatological rainfall probability for each dropped station is also calculated using the

rainfall probability of the surrounding stations from historical data. Logistic regression is used so that

the estimated probabilities are neither negative nor greater than unity. If the estimated probability of

rainfall from the logistic regression is less (larger) than the climatological rainfall probability

(threshold), the estimating station is set to dry (wet). The climatological probability of

precipitation is calculated as,

$$\text{Climatological rainfall probability} = \frac{\sum_{\text{all rain days}} \left(\#\text{ of rainfall stations}\right)}{\sum_{\text{all rain days}} \left(\#\text{ of available stations}\right)} \qquad (14)$$

The main purpose of the two-step process is to obtain realistic rainfall occurrence using logistic

regression before applying the interpolation methods to estimate the precipitation amounts. Due

to data limitations we use three interpolation schemes (IDW, MLR, and LWP) in conjunction

with the logistic regression.
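The occurrence step can be sketched with equations (13) and (14) as follows. The regression constants, the single predictor, and the station counts are hypothetical, and the fitting of the beta values is not shown. The sketch also assumes that a probability exactly equal to the threshold is treated as dry, a case the text does not specify.

```python
import math

def occurrence_probability(betas, xs):
    """Equation (13): betas[0] is the constant, betas[1:] pair with xs."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return 1.0 - 1.0 / (1.0 + math.exp(eta))

def climatological_probability(wet_counts, available_counts):
    """Equation (14): wet station counts over available station counts."""
    return sum(wet_counts) / sum(available_counts)

def is_wet(probability, threshold):
    """Wet if the estimated probability exceeds the climatological one."""
    return probability > threshold

# Hypothetical fitted constants; the predictor is a 0/1 indicator of
# rain at a neighboring station.
betas = [-2.0, 3.0]
p_neighbor_wet = occurrence_probability(betas, [1.0])
p_neighbor_dry = occurrence_probability(betas, [0.0])

# Hypothetical history: wet and available station counts on four days.
threshold = climatological_probability([3, 0, 5, 2], [10, 10, 10, 10])

print(p_neighbor_wet, p_neighbor_dry, threshold)
print(is_wet(p_neighbor_wet, threshold), is_wet(p_neighbor_dry, threshold))
```

With these numbers the threshold is 0.25, the estimated probability is about 0.73 when the neighbor is wet and about 0.12 when it is dry, so the station is set to wet in the first case and dry in the second. Precipitation amounts would then be interpolated only for the wet case.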

4 Experimental Design

The models described above were applied to monthly and daily precipitation and temperature

observations from two basins. The flowchart for the estimation process for monthly time scale is

shown in Figure 4. Three groups of analysis were arranged according to the available

interpolation schemes shown in Table 2. The flowchart of the two-step method is shown in

Figure 5. The models are fitted to the observations and a suite of performance measures is

estimated in a cross-validated mode for comparison. The performance metrics are described

below.
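The cross-validated mode can be sketched as a drop-one-station loop: each station is withheld in turn, the scheme is fitted to the remaining stations, and the withheld location is estimated. This sketch assumes exactly one station is dropped at a time. The function names and values are hypothetical, and the simple average stands in for any of the interpolation schemes.

```python
def leave_one_out(stations, values, estimator):
    """Estimate each station from all the others.

    estimator(train_stations, train_values, target) -> estimate
    """
    estimates = []
    for i, target in enumerate(stations):
        train_stations = stations[:i] + stations[i + 1:]
        train_values = values[:i] + values[i + 1:]
        estimates.append(estimator(train_stations, train_values, target))
    return estimates

def simple_average(train_stations, train_values, target):
    """Simple Average scheme: ignores location entirely."""
    return sum(train_values) / len(train_values)

# Hypothetical stations (coordinates unused by the simple average).
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [10.0, 20.0, 30.0]
cv_estimates = leave_one_out(stations, values, simple_average)
print(cv_estimates)
```

Each estimate is the mean of the other two stations, giving 25.0, 20.0, and 15.0. The pairs of observed and cross-validated values produced this way are what the performance measures operate on.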

4.1 Measures for performance comparison

Various measures of performance are used in this research for comparison. Depending on

the measure, choice of basin and the interpolation method, the ranking between the schemes can

change considerably [Bussieres and Hogg, 1989]. In addition to this arbitrariness, a method that

is more elaborate than warranted by the quality of the data could give numerically superior

results, but those results could be misleading given the uncertainty of the record itself. In order to

avoid overstating the skill of any interpolation scheme, only minimal effort was

spent on model tuning and each model was kept as simple as possible.

The following measures were chosen for comparison:

a) Bias of the mean of climate variables on each station through time step:

$$M_i = \frac{\sum_{t=1}^{N} p_{ie}(t)}{N} - \frac{\sum_{t=1}^{N} p_{io}(t)}{N} \qquad (15)$$

where $N$ is the number of time steps, $p_{ie}$ is the estimated climate variable at station $i$, and $p_{io}$ is

the observed climate variable at station $i$.

b) Bias of the variance of the climate variables on each station through time step:

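$$V_i = \frac{1}{N-1}\left[\sum_{t=1}^{N}\left(p_{ie}(t) - \bar{p}_{ie}\right)^{2} - \sum_{t=1}^{N}\left(p_{io}(t) - \bar{p}_{io}\right)^{2}\right] \qquad (16)$$

where $\bar{p}$ is the mean of the climate variable at station $i$.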

c) Spearman rank correlation coefficient between the observed values and cross-validated

estimates at each station:

$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)} \qquad (17)$$

where $d_i$ is the difference between the ranks assigned to the two variables and $n$ is the number

of pairs of data.

For the monthly (daily) time scale, the monthly (daily) precipitation and temperatures are

estimated (in a cross-validated mode) for each year at each station. The Spearman rank

correlation is then computed between the observed and estimated series, yielding as many correlation coefficients as

there are stations.

Some advantages of the rank correlation coefficient over the traditional

correlation coefficient have been reported. First, the rank correlation coefficient does not assume that the

two variables are linearly related. Second, the rank correlation coefficient does not require

normality [Walpole et al., 1998]. Because systematic bias may lead to misleading correlations, this

measure was used together with RMSE for better performance assessment.

d) Root mean square error of the estimated values:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N}\left(p_{ie}(t) - p_{io}(t)\right)^{2}}{N}} \qquad (18)$$

e) Inter-station correlation between the observed values and estimation at each time step. In

the case of the monthly time step, for each year the cross-validated estimates and the

observed values at all the stations are correlated, thus obtaining as many correlation

coefficients as the number of years.
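Measures (a) to (d) can be computed for one station with the short sketch below. The function names and the data are hypothetical, and the rank correlation uses equation (17) directly, which assumes there are no tied values in either series.

```python
import math

def mean(values):
    return sum(values) / len(values)

def bias_of_mean(est, obs):
    """Equation (15)."""
    return mean(est) - mean(obs)

def bias_of_variance(est, obs):
    """Equation (16)."""
    n = len(est)
    m_est, m_obs = mean(est), mean(obs)
    ss_est = sum((e - m_est) ** 2 for e in est)
    ss_obs = sum((o - m_obs) ** 2 for o in obs)
    return (ss_est - ss_obs) / (n - 1)

def ranks(values):
    """Ranks starting at 1 for the smallest value (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(est, obs):
    """Equation (17)."""
    n = len(est)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(est), ranks(obs)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def rmse(est, obs):
    """Equation (18)."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(est))

# Hypothetical observed and cross-validated series at one station.
obs = [1.0, 2.0, 3.0, 4.0]
est = [1.5, 1.0, 3.5, 5.0]
print(bias_of_mean(est, obs), bias_of_variance(est, obs),
      spearman(est, obs), rmse(est, obs))
```

For these values the bias of the mean is 0.25, the bias of the variance is 1.75, the rank correlation is 0.8, and the RMSE is about 0.79. The example shows why the measures are used together: the ranks agree well even though the estimates are biased and too variable.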

5 Data

As mentioned earlier, the two basins chosen for this study (Figure 1), Animas and Alapaha, have

different climatic characteristics. The Animas basin is snowmelt driven with occasional

rain-on-snow events during winter, and has large relief with elevation ranging from 680 m to

3700 m. The Alapaha basin is at a lower elevation and is dominated by rainfall events. Among the

various characteristics of watersheds, topography has a significant impact on the estimation of the

spatial variability of climate variables.

Daily minimum and maximum temperature and precipitation recorded at all the

observational locations were compiled from National Weather Service (NWS) and snow

telemetry (SNOTEL) databases [Hay et al., 2002] – shown in Table 2. To be consistent across

the basins, the data from 1979 onwards were used in the study.

5.1 Data quality control

Measurement and recording errors are common in hydroclimate data sets and can impact

the interpolation schemes [Dingman, 1994]. Reek et al. [1992] noted that there is a significant

number of erroneous daily values in climate records. The errors include data entry,

recording, and reformatting errors. Several of the data quality control methods examined

in their research, namely (1) extreme outlier detection, (2) diurnal change limit, (3) inconsistency, (4)

spike check, (5) Z-score test, and (6) same value repeat check, were applied to the data sets chosen

for this study. These methods are described in the work of Reek et al. [1992].

The minimum number of observations required to obtain stable estimates differs for each

method (Table 3). For cross validation purposes, IDW requires at least two available

stations, because the weight can be set to one if only one station is available besides

the estimation point. The MLR method needs at least five available points, because there are four

unknown coefficients to be determined (coefficients for three coordinates and an intercept) in addition to

the estimation point that is held out during the evaluation process.

Data availability varies considerably between months and basins, as can be seen in Figure 6,

which plots the total number of available days against the number of available stations. Each of the four rows shows

the graphs for one of the basins used in this research. The first two columns are for January and the rest are

for July. For example, the first graph (row 1, column 2) shows that most of the stations (15~17)

have records for all time steps in the January data series of the Animas basin. The number of available

days then drops rapidly beyond 17 stations. However, all time steps are available at fewer than

half of the stations in the January series of the Alapaha basin (2nd row, 1st column). As a result, a

combination of interpolation methods has to be used to address the data availability issue.

This is addressed in the following section.

6 Results

A comparison of the performance on monthly and daily time scales following the flowcharts

in Figures 4 and 5 was performed, incorporating the data requirements for the different methods

(Table 3). The methods are applied for all months in each basin. Results for January and July are

shown as representative of the wet and dry seasons in the basins. But first, a short comparison on

synthetic data is presented. Data is generated to mimic a ground water surface due to four

pumping wells (see Figure 2). It can be seen that the surface is highly nonlinear with sharp

gradients and curvature at the well points. As mentioned earlier, the “best” fitted variogram

(Figure 2) does not capture the spatial co-variability of the data very well (shown as points on the

graphs). Consequently, the estimates are also poor. We applied MLR and LWP to this data set.

As expected, MLR fits a global linear function, which is clearly not suitable for this data set

and completely smooths out the nonlinearity, while LWP, being a local estimation method,

captures the variability much better. This is demonstrated by the median biases of

the estimates at the data points, 0.002 and -0.02, and the RMSE values, 0.28 and 0.5,

for LWP and MLR, respectively. The improvement in the LWP estimates is apparent.

Extensive comparisons on a variety of synthetic data sets [Owosina, 1992] also bear out this

conclusion.
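The contrast between a global linear fit and a local fit can be illustrated with a minimal sketch. It does not use the data set of this study: the surface (four Gaussian depressions at hypothetical well locations), the noise level, the neighborhood size `k`, and the tricube weights are all assumptions made for illustration, and the local estimator is a plain locally weighted linear regression rather than the LOCFIT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical well locations and drawdown shape (not the data of this study).
wells = np.array([[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]])

def surface(xy):
    """Sum of four Gaussian depressions, one per assumed well."""
    d = np.linalg.norm(xy[:, None, :] - wells[None, :, :], axis=2)
    return -np.sum(np.exp(-(d / 0.15) ** 2), axis=1)

xy = rng.uniform(0.0, 1.0, size=(200, 2))
z = surface(xy) + rng.normal(0.0, 0.02, size=200)

def design(points):
    """Design matrix with an intercept column followed by the coordinates."""
    return np.column_stack([np.ones(len(points)), points])

def mlr_predict(xy_train, z_train, x0):
    """Global linear regression (one plane fitted to all training points)."""
    beta, *_ = np.linalg.lstsq(design(xy_train), z_train, rcond=None)
    return float(np.array([1.0, *x0]) @ beta)

def local_linear_predict(xy_train, z_train, x0, k=15):
    """Weighted linear fit on the k nearest neighbors (tricube weights)."""
    d = np.linalg.norm(xy_train - x0, axis=1)
    idx = np.argsort(d)[:k]
    h = d[idx].max() * 1.0001          # bandwidth just beyond the k-th neighbor
    sw = np.sqrt((1.0 - (d[idx] / h) ** 3) ** 3)
    beta, *_ = np.linalg.lstsq(design(xy_train[idx]) * sw[:, None],
                               z_train[idx] * sw, rcond=None)
    return float(np.array([1.0, *x0]) @ beta)

def loo_rmse(predict_one):
    """Leave-one-out cross-validated RMSE over all synthetic points."""
    n = len(z)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        errors.append(predict_one(xy[keep], z[keep], xy[i]) - z[i])
    return float(np.sqrt(np.mean(np.square(errors))))

rmse_mlr = loo_rmse(mlr_predict)
rmse_lwp = loo_rmse(local_linear_predict)
print(f"RMSE MLR: {rmse_mlr:.3f}, RMSE local fit: {rmse_lwp:.3f}")
```

On a surface of this kind the single plane cannot follow the depressions, so its cross-validated error is expected to be close to the spread of the data, while the local fit adapts to each neighborhood.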

6.1 Monthly precipitation and temperature

Cross-validated estimates are used to calculate the performance measures. Figures 7 and 8

show boxplots of the performance measures for January and July monthly total precipitation and

maximum temperature, respectively, in the two basins, Animas and Alapaha (all of the graphs use

millimeters for precipitation and degrees Celsius for temperature). The length of the boxes

indicates the interquartile range of the measures from cross-validated estimates at each location

in the basin and the whiskers show the 5th and 95th percentile range; the horizontal line in the box

is the median of the estimates. A longer box indicates increased variability in the estimates.

In each of the plots, the median value of the best performer is shown as a gray solid line. While

the skill of the different techniques is surprisingly similar, several observations are

apparent:

(i) The SA method performs poorly on almost all the measures.

(ii) Methods using elevation perform well. In particular, the LWP method seems to show

relatively good performance (smaller box widths and median value close to zero).

(iii) For precipitation, all methods displayed a negative bias in the variance—this is expected

as the noisy character of individual station time series is smoothed.

To assess the performance of each method in reproducing observed spatial variability,

Spearman rank correlations are computed between the cross-validated estimates and the

observed values on the spatial map. The resulting set of spatial correlations is shown as

boxplots in Figure 9. It can be seen that the correlations are higher for January precipitation

than for July. The correlations from the straight average method are -1. This is because, in

cross-validation, the observation of a given year is dropped and is predicted using the rest of the

data. As a result, if a large value is dropped, the average of the rest of the data will be

small, and vice versa, thus producing a correlation of -1. Note also the correlations on the spatial

map for CMLR, which occur because the CMLR method uses the coefficients obtained from

climatology.
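The correlation of -1 for the straight average follows directly from the leave-one-out arithmetic: the estimate for a dropped value x_i is (S - x_i)/(n - 1), where S is the sum of all values, which is a strictly decreasing function of x_i. A minimal sketch with hypothetical values (the numbers below are made up, and the rank routine assumes no ties):

```python
import numpy as np

def rank(values):
    """Ranks 0..n-1 of the values (assumes there are no ties)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

# Hypothetical observations (not data from this study).
observed = np.array([12.0, 30.5, 7.2, 18.9, 25.1, 9.8])
n = len(observed)

# Leave-one-out straight average: each value is estimated from all the others.
loo_average = (observed.sum() - observed) / (n - 1)

rho = spearman(observed, loo_average)
print(rho)
```

Because the ranks of the estimates are exactly the reverse of the ranks of the observations, the rank correlation equals -1 regardless of the values chosen.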

The CMLR method, which is based on linear regression theory like MLR, showed good

performance, especially in the mountainous basin. However, CMLR’s competitive but lower

performance in the Alapaha basin can be seen from the higher bias of the mean compared to that of MLR

(Figures 7 and 8). The main idea of the CMLR method is to obtain stable estimates using the

long-term climatological trend through the basin for each time step. Keeping the climatological trend

in the coefficients for the predictor variables, the response variables can be estimated with an optimal

intercept at each time step. However, if a basin has large year-to-year (day-to-day) variation in

the climate variables, the climatologically estimated model cannot produce a good estimate for the

monthly (daily) time step. In other words, if the slope of the linear fit is already out of phase, the

intercept is not sufficient to capture the variation even though it is the ‘optimal value’.
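The idea of fixed climatological slopes with a per-time-step intercept can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the predictors are taken to be three station coordinates, and "optimal" is interpreted in the least-squares sense, for which the best intercept with fixed slopes is the mean residual.

```python
import numpy as np

def fit_climatological_slopes(X, y_clim):
    """Slopes of a least-squares fit to long-term station means (intercept discarded)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y_clim, rcond=None)
    return coef[1:]

def cmlr_estimate(X, y_t, slopes, x_new):
    """Keep the slopes fixed; refit only the intercept for the current time step."""
    intercept = np.mean(y_t - X @ slopes)   # least-squares intercept for fixed slopes
    return float(intercept + x_new @ slopes)

# Hypothetical example: 8 stations with predictors (x, y, elevation).
rng = np.random.default_rng(1)
X = rng.uniform(size=(8, 3))
true_slopes = np.array([1.0, -2.0, 3.0])
y_clim = 2.0 + X @ true_slopes              # long-term climatology at the stations
y_t = y_clim + 5.0                          # one time step, uniformly wetter than normal

slopes = fit_climatological_slopes(X, y_clim)
x_new = np.array([0.5, 0.5, 0.5])
estimate = cmlr_estimate(X, y_t, slopes, x_new)
print(estimate)
```

In this example the time step differs from climatology only by a uniform shift, which the intercept absorbs completely. If the spatial gradient itself changed at that time step, the fixed slopes could not follow it, which is the limitation described above.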

To illustrate this point, Figure 10 shows the boxplots of the regression coefficients and the

correlation between the observed and estimated values of the MLR fits at each station for January and

July precipitation in the Alapaha and Animas basins. For each station and each year, an MLR fit is

developed in a cross-validated mode, which is then used to estimate the dropped value. Thus, at

each station we obtain an MLR fit for each year. The correlation and the regression coefficients

from each fit are shown as boxplots for each station. Figure 10 allows us to assess (a) if the

regression coefficients change substantially from year to year; and (b) if these changes are

predictable (e.g., as assessed through the correlation plot). We see that both conditions (a) and

(b) are satisfied, meaning that the use of climatological regression relations, as in CMLR, will be

unable to capture the year-to-year variability, and likely result in lower skill.

The methods were applied to interpolate January precipitation on a fine DEM grid (roughly

0.9 km x 0.9 km) in the Animas basin. The resulting surface maps are shown in Figure 11. Strong

heterogeneity in the rainfall, consistent with topography, can be seen, with more precipitation at

the higher elevations and less at lower elevations. The straight average washes out the strong

spatial heterogeneity and, consequently, the interpolated values are close to 0. IDW and OK tend

to over-smooth, while the MLR and LWP maps capture the spatial features very well.

Comparing the performances between the distance-based and regression-based schemes

(Figures 7, 8, and 9), the importance of incorporating elevation information in the interpolation

methods is apparent.

6.2 Model performances at daily time scale

The first approach for interpolation of precipitation at daily time scales is to apply the

methods on the daily data and replace any estimated negative values with zero. For temperature

this is not an issue as it can take negative and positive values. The boxplots of the performance

measures from this approach are shown in Figures 12 and 13. The observations made from

Figures 7 and 8 are generally valid here as well: the regression-based methods using elevation

seem to perform better. Performance of the interpolation methods at the daily time scale was

similar to that seen at the monthly time scale. A spatial map of daily precipitation for Jan 25,

1979 from a selection of methods is shown in Figure 14. Notice the reduction in the magnitude

of the interpolated values relative to the observed values and also the over-smoothing by the

interpolation methods, which misses the spatial features seen in the observations. This is largely due

to the fact that the rainfall occurrence is not well generated, which we hope to correct using the

two-step approach.

Logistic regression is first applied to estimate the precipitation occurrence and the

interpolation schemes are then applied to estimate the precipitation at locations with precipitation

occurrence generated by the logistic regression. For the two-step process, the options available

based on the data limitations are shown in Table 4; Case 1 (Case 2) is where IDW (MLR) is

used to interpolate precipitation amounts. As mentioned before, if the sample size on any day is

small, then the IDW method is used for interpolation. Otherwise the regression based methods

are used. Figure 15 shows the bar chart of the ratio of correct hits of the precipitation occurrence

for the Animas and Alapaha River basins. We sum all the days on which the logistic regression

estimated the precipitation occurrence correctly. Notice that using logistic regression greatly

increases the hit rate (from 50~75% to 80~91%) relative to using the MLR and setting the

negative values to zero, for both the basins. The hit rate for Case 2 is slightly lower than for

Case 1 because MLR’s estimate of precipitation amounts can be less than zero.
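The two-step process (Case 1) can be sketched as follows. This is only an illustration under several assumptions: the logistic regression is fitted by plain gradient ascent on standardized coordinates (x, y, elevation), occurrence is declared when the fitted probability exceeds a climatological probability `p_clim` (set to a hypothetical 0.5 here), amounts are interpolated by IDW with an assumed power of 2 using only the wet stations, and the station values are made up.

```python
import numpy as np

def fit_logistic(F, r, lr=0.5, n_iter=2000):
    """Logistic regression by gradient ascent; returns (bias, weights)."""
    b, w = 0.0, np.zeros(F.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(b + F @ w)))
        b += lr * np.mean(r - p)
        w += lr * (F.T @ (r - p)) / len(r)
    return b, w

def idw(xy, values, xy0, power=2.0):
    """Inverse distance weighted estimate at xy0."""
    d = np.linalg.norm(xy - xy0, axis=1)
    if np.any(d == 0):
        return float(values[d == 0][0])
    w = 1.0 / d ** power
    return float(np.sum(w * values) / np.sum(w))

def two_step_estimate(coords, amounts, target, p_clim):
    """Step 1: occurrence by logistic regression. Step 2: amount by IDW."""
    wet = (amounts > 0).astype(float)
    mu, sd = coords.mean(axis=0), coords.std(axis=0)  # assumes no constant column
    b, w = fit_logistic((coords - mu) / sd, wet)
    p_log = 1.0 / (1.0 + np.exp(-(b + ((target - mu) / sd) @ w)))
    is_wet = wet == 1
    if p_log <= p_clim or not is_wet.any():
        return 0.0                                    # estimated dry day
    return idw(coords[is_wet, :2], amounts[is_wet], target[:2])

# Hypothetical day: the four high-elevation stations are wet, the low ones dry.
coords = np.array([[0, 0, 100], [1, 0, 200], [0, 1, 300], [1, 1, 400],
                   [0, 0, 2000], [1, 0, 2100], [0, 1, 2200], [1, 1, 2300]], float)
amounts = np.array([0, 0, 0, 0, 5, 6, 7, 8], float)

high = two_step_estimate(coords, amounts, np.array([0.5, 0.5, 2500.0]), p_clim=0.5)
low = two_step_estimate(coords, amounts, np.array([0.5, 0.5, 150.0]), p_clim=0.5)
print(high, low)
```

Because the amount is interpolated only where occurrence has been estimated, a dry target receives exactly zero instead of a small smoothed amount, and no negative estimates need to be truncated.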

Rainfall occurrence estimation bias was also analyzed using a contingency table and Kuiper’s

skill score (KSS) [Wilks, 1995]. The contingency table consists of four elements showing the

number of days included in each category, as shown in Table 5. KSS is calculated as

$$\mathrm{KSS} = \frac{ad - bc}{(a + c)(b + d)} \qquad (19)$$

A value of 1.0 represents perfect estimation skill for this skill measure. Figure 16 shows the

great improvement in rainfall occurrence estimation skill for both basins and seasons. However,

the summer skill for the rainfall-dominated Alapaha basin was lower than the winter skill,

while no difference was observed in the snowmelt-dominated Animas basin. In the Animas

basin, the strong elevation effect is believed to provide similar estimation skill for both seasons.

Figure 17 shows the boxplots of performance measures of interpolation of daily precipitation.

Notice that the magnitudes of the biases are significantly smaller (and closer to zero) compared

to that from the single step (i.e. not using the logistic regression for precipitation occurrence

estimation) method in Figure 12. However, a slight negative bias is apparent. This is largely due

to the threshold established in the logistic regression for generating the number of rainy days.
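For reference, the Kuiper's skill score of equation (19) can be computed from the four counts of the contingency table (Table 5) as in the short sketch below. The function name and the example counts are hypothetical; a, b, c, and d follow the layout of Table 5 (a: estimated wet and observed wet, b: estimated wet and observed dry, c: estimated dry and observed wet, d: estimated dry and observed dry).

```python
def kuipers_skill_score(a, b, c, d):
    """KSS = (ad - bc) / ((a + c)(b + d)) from a 2x2 contingency table."""
    denominator = (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("KSS is undefined when no wet or no dry days were observed")
    return (a * d - b * c) / denominator

# Hypothetical counts of days.
perfect = kuipers_skill_score(a=40, b=0, c=0, d=60)     # every day estimated correctly
no_skill = kuipers_skill_score(a=20, b=30, c=20, d=30)  # estimates unrelated to observations
print(perfect, no_skill)
```

The score equals the hit rate a/(a + c) minus the false alarm rate b/(b + d), so it is 1 for perfect estimates and 0 when the estimates carry no information about the observed occurrence.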

7 Summary and Discussion

We compared a suite of spatial interpolation schemes for precipitation and temperature at

the monthly and daily time scales. For interpolation of daily precipitation a two-step approach

was used. In this, the spatial occurrence of precipitation is first generated using logistic

regression and the precipitation amounts are estimated using the interpolation schemes. Based on

the performance measures, regression based methods that used elevation information performed

quite well in estimating the observations in a cross-validated mode. However, there are some

issues around proper application of various spatial interpolation schemes.

The difficulty of fitting a good variogram to the data led to the poor performance of

ordinary kriging. There are variations of the ordinary kriging method (indicator kriging, cokriging,

universal kriging, etc.) that improve the estimation skill by considering topography and the

intermittency of rainfall events [Prudhomme and Reed, 1999; Barancourt et al., 1992].

However, even with this added complexity, one is not assured of finding a variogram that

fits the data well.

Among the regression-based methods, the main drawback of the PRISM-based method is its dependence on the

accuracy of the PRISM climatology, which has its own uncertainty. Also, the use of ratio

corrections for precipitation does not preserve the observed probability density function of daily

precipitation amounts (i.e., it alters the skewness). The CMLR method, which gives a more stable

regression model using climatology, was applied for the point estimation in this research and

showed good performance in the mountainous basin. In addition, there are efforts to improve

the performance of regression-based methods by using additional predictor variables such as

slope, wind, and the distance from the shoreline [e.g., Ninyerola et al., 2000; Marquinez et al., 2003].

The logistic regression significantly improved the estimation of rainfall occurrence and this

coupled with the regression-based methods seems to be able to capture the high degree of spatial

variability of precipitation on a daily time scale. Due to hydrologic variability in the watersheds

it is often difficult to prescribe a single method for spatial interpolation [Dingman, 1994;

Kruizinga and Yperlaan, 1978]. However, overall, our comparisons seem to suggest MLR and

LWP as competitive methods for spatial interpolation of precipitation and temperature. In the

application presented in this study, topography is a dominant factor and the spatial variations of

precipitation and temperature are, by and large, linear in topography. Hence, MLR and

LWP performed comparably. Given the flexibility and data-driven nature of LWP, it has the

ability to capture any nonlinearity, making it more attractive. The IDW method is also a

nonparametric method similar to LWP and hence the results from the two are comparable.

However, LWP is theoretically a significant improvement over IDW.

All of the methods used in this research were also applied to two other basins (East Fork of

the Carson River near Gardnerville, Nevada (Carson) and Cle Elum River near

Roslyn, Washington (Cle Elum)). The results were consistent across these two basins.

A proper rainfall regeneration method should be considered together with the estimation of rainfall

intermittency. In this research, the logistic regression showed no overall bias in the wet-day

regeneration above the threshold, but the resulting rainfall estimates underestimated

the rainfall amount. This underestimation might be corrected by choosing a proper

threshold and probability cutoff value. Further research on this is needed.

There are issues beyond the analysis done in this study that affect the practical use of climate

variable estimation. These include the method of data handling and the sensitivity of

runoff models to the interpolation method. The effect of the number of available data points in a basin and of the

observation density should be checked, because the density and quality of the available data set are

critical for most interpolation methods. This may also include the issue of the search

radius (related to the number of available data points). To test this, the density of the observations could be

adjusted by randomly selecting observation stations in a certain ratio to the total number of

observations, with a varying search radius. The resulting relation between the network density and

the search radius (number of observations) could be used as guidance for interpolation scheme

selection according to the network density. In addition, the impact of possible outliers also

needs to be checked by performing a series of interpolation analyses with stations dropped one

by one.
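A network density experiment of this kind could be organized along the lines of the following sketch. It is not an analysis carried out in this study: the station network and values are synthetic, IDW with an assumed power of 2 stands in for any of the interpolation schemes, the search radius is not varied, and the function names and retention ratios are hypothetical.

```python
import numpy as np

def idw(xy, values, xy0, power=2.0):
    """Inverse distance weighted estimate at xy0."""
    d = np.linalg.norm(xy - xy0, axis=1)
    if np.any(d == 0):
        return float(values[d == 0][0])
    w = 1.0 / d ** power
    return float(np.sum(w * values) / np.sum(w))

def loo_rmse(xy, values):
    """Leave-one-out RMSE: each station is dropped and estimated from the rest."""
    n = len(values)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        errors.append(idw(xy[keep], values[keep], xy[i]) - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def density_experiment(xy, values, ratios, n_repeats=20, seed=0):
    """Mean leave-one-out RMSE for random sub-networks of different sizes."""
    rng = np.random.default_rng(seed)
    n = len(values)
    result = {}
    for ratio in ratios:
        m = max(3, int(round(ratio * n)))   # number of stations retained
        scores = []
        for _ in range(n_repeats):
            idx = rng.choice(n, size=m, replace=False)
            scores.append(loo_rmse(xy[idx], values[idx]))
        result[ratio] = float(np.mean(scores))
    return result

# Synthetic network of 30 stations with a smooth hypothetical field.
rng = np.random.default_rng(42)
xy = rng.uniform(size=(30, 2))
values = 10.0 + 5.0 * np.sin(3.0 * xy[:, 0]) + 2.0 * xy[:, 1]

result = density_experiment(xy, values, ratios=[0.25, 0.5, 1.0])
print(result)
```

Repeating the random selection several times for each ratio separates the effect of network density from the effect of which particular stations happen to be retained.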

Because climate variables are among the basic inputs to runoff models, the sensitivity

of runoff models to the spatially interpolated product should be investigated. However, through the complicated

parameter calibration and modeling procedures, subtle differences between the spatial

interpolation methods (as shown in this study) may be unrecognizable in practical runoff analysis.

In addition, there is no guarantee that the best point estimation method gives the best

runoff estimation through runoff models. Among the interpolation methods used in this paper,

the CMLR method was developed to produce better rainfall amounts, because the

accuracy of some hydrologic models such as PRMS depends more on the accurate estimation of

rainfall quantity than on spatial variability. This is one of the reasons that the performance of the CMLR

method was not good in this research. Hay et al. [2002] also showed that the prediction of

temperature is important in snowmelt-dominated basins.

The results shown in this paper will help in selecting proper spatial interpolation schemes not

only for watershed modeling but also for agricultural studies, climate impact studies using GCMs, etc.

However, the differences in performance between the interpolation schemes were often not

clear. It is possible that the different estimates of the spatial variability will result in larger

differences in simulated runoff through a hydrologic model, so that the small differences between the

spatial estimates become a critical issue, or vice versa. The next step of our research is to

understand the sensitivity of the streamflow simulation to the different spatial estimates. This

work is currently underway.

Acknowledgments

Partial support of this work by NOAA GAPP program (Award NA16GP2806) and the

NOAA RISA Program (Award NA17RJ1229) is gratefully acknowledged.

References

Barancourt, C., J.D. Creutin, and J. Rivoirard (1992), A Method for Delineating and Estimating

Rainfall Fields, Water Resour. Res., 28(4), 1133-1144.

Beek, E.G., A. Stein, and L.L.F. Janssen (1992), Spatial variability and interpolation of daily

precipitation amount, Stochastic Hydrol. Hydraul., 6, 209-221.

Bishop, G.D., M.R. Church, and C. Daly (1998), Effects of improved precipitation estimates on

automated runoff mapping: Eastern United States, J. Am. Water Resour. As., 34 (1), 159-166.

Borga, M. and A. Vizzaccaro (1997), On the interpolation of hydrologic variables; formal

equivalence of multiquadratic surface fitting and kriging, J. Hydrol., 195, 160-171.

Borga, M., S. Fattorelli, and P. Valentini (1994), Precipitation estimation for flood forecasting in

a mountainous basin, in Advances in Water Resources Technology and Management;

proceedings of The Second European conference on Advances in Water Resources

Technology and Management, Lisbon, Portugal, 14-18 June, edited by G. Tsakiris and M.A.

Santos, A.A. Balkema, 403-410.

Bussieres, N. and W. Hogg (1989), The Objective Analysis of Daily Rainfall by Distance

Weighting Schemes on a Mesoscale Grid, Atmosphere-Ocean, 27(3), 521-541.

Chua, S.H. and R. Bras (1982), Optimal estimators of mean areal precipitation in regions of

orographic influences, J. Hydrol., 57, 23-48.

Church, M.R., G.D. Bishop, and D.L. Cassell (1995), Maps of regional evapotranspiration and

runoff precipitation ratios in the northeast United States, J. Hydrol., 168(1-4), 283-298.

Clark, M, S. Gangopadhyay, L. Hay, B. Rajagopalan, and R. Wilby (2004), The Schaake shuffle:

A method for reconstructing space-time variability in forecasted precipitation and temperature

fields, J. Hydrometeorol., 5(1), 243-262.

Creutin, J.D. and C. Obled (1982), Objective Analysis and Mapping Techniques for Rainfall

Fields: An Objective Comparison, Water Resour. Res., 18(2), 413-431.

Daly, C., R.P. Neilson, and D.L. Phillips (1994), A Statistical-Topographic Model for Mapping

Climatological Precipitation over Mountainous Terrain, J. Appl. Meteor., 33, 140-158.

Dingman, S.L. (1994), Physical Hydrology, 1st ed., Prentice Hall, New Jersey.

Dirks, K.N., J.E. Hay, C.E. Stow, and D. Harris (1998), High-resolution studies of rainfall on

Norfolk Island Part II: Interpolation of rainfall data, J. Hydrol., 208, 187–193.

Erxleben, J., K. Elder, and R. Davis (2002), Comparison of spatial interpolation methods for

estimating snow distribution in the Colorado Rocky Mountains, Hydrol. Process., 16, 3627–

3649.

Fassnacht, S.R., K.S. Dressler, and R.C. Bales (2003), Snow water equivalent interpolation for

the Colorado River Basin from snow telemetry (SNOTEL) data, Water Resour. Res., 39(8),

1208, doi:10.1029/2002WR001512.

Franke, R. and G. Nielson (1980), Smooth interpolation of large sets of scattered data.

International Journal of Numerical Methods in Engineering, 15, 1691-1704.

Goodale, C.L., J.D. Aber., and S.V. Ollinger (1998), Mapping monthly precipitation,

temperature, and solar radiation for Ireland with polynomial regression and a digital

elevation model, Climate Res., 10, 35-49.

Goovaerts, P. (2000), Geostatistical approaches for incorporating elevation into the spatial

interpolation of rainfall, J. Hydrol., 228, 113-129.

Goovaerts, P. (2002), Geostatistical incorporation of spatial coordinates into supervised

classification of hyperspectral data, J. Geograph. Syst., 4, 99–111.

Gupta V.K., and E.C. Waymire (1993), A statistical-analysis of mesoscale rainfall as a random

cascade, J. Appl. Meteorol., 32(2): 251-267.

Hartkamp, A.D., K. De Beurs, A. Stein, and J.W. White (1999), Interpolation Techniques for

Climate Variables, NRG-GIS Series 99-01, Mexico, D.F, CIMMYT.

Hay, L.E. and G.J. McCabe (2002), Spatial variability in water-balance model performance in

the conterminous United States, J. Am. Water Resour. As., 38(3), 847-860.

Hay, L.E., M.P. Clark, R.L. Wilby, W.J. Gutowski, JR., G.H. Leavesley, Z. Pan, R.W. Arritt, and

E.S. Takle (2002), Use of regional climate model output for hydrologic simulations, J.

Hydrometeorol., 3(5), 571-590.

Helsel, D.R., and R.M. Hirsch, (1992), Statistical Methods in Water Resources, Elsevier,

Amsterdam.

Henry, A.J. (1919), Increase of Precipitation With Altitude, Mon. Weather Rev., 47, 33-41.

Hevesi, J.A., J.D. Istok, and A.L. Flint (1992), Precipitation estimation in mountainous terrain

using multivariate geostatistics. Part I: Structural analysis, J Appl Meteorol, 31(7), 661-676.

Jeffrey, S.J., J.O. Carter, K.B. Moodie, and A.R. Beswick (2001), Using spatial interpolation to

construct a comprehensive archive of Australian climate data, Environmental Modeling &

Software, 16, 309-330.

Jothityangkoon, C., M. Sivapalan, and N.R. Viney (2000), Tests of a space-time model of daily

rainfall in southwestern Australia based on nonhomogeneous random cascades, Water

Resour. Res., 36(1): 267-284.

Journel, A. and C. Huijbregts (1978), Mining Geostatistics, Academic Press, London.

Kastelec, D. and K. Košmelj (2002), Spatial Interpolation of Mean Yearly Precipitation using

Universal Kriging, Developments in Statistics, 17, 149-162.

Kruizinga, S. and G.J. Yperlaan (1978), Spatial interpolation of daily totals of rainfall, J. Hydrol.,

36, 65-73.

Kurtzman, D. and R. Kadmon (1999), Mapping of temperature variables in Israel: a comparison

of different interpolation methods, Climate Res., 13, 33-43.

Kyriakidis, P.G., J. Kim, and N.R. Miller (2001), Geostatistical mapping of precipitation from

rain gauge data using atmospheric and terrain characteristics, J. Appl. Meteorol., 40, 1855-

1877.

Lanza, L.G., J.A. Ramirez, and E. Todini (2001), Stochastic rainfall interpolation and

downscaling, Hydrology & Earth System Sciences, 5(2), 139-143.

Leavesley, G.H., P.J. Restrepo, S.L. Markstrom, M. Dixon, and L.G. Stannard (1996), The

modular modeling system - MMS: User's manual, Open File Report 96-151, U.S. Geological

Survey.

Loader, C. (1999), Local Regression & Likelihood, Springer-Verlag, New York.

Loader, C. (1997), LOCFIT: An introduction, Statistical computation and graphics newsletter,

8(1), 11-17.

Mackay, N.G., R.E. Chandler, C. Onof, and H.S. Wheater (2001), Disaggregation of spatial

rainfall fields for hydrological modelling, Hydrology & Earth System Sciences, 5(2), 165-173.

Marquinez, J., J. Lastra, and P. Garcia (2003), Estimation models for precipitation in

mountainous regions: the use of GIS and multivariate analysis, J. Hydrol., 270, 1-11.

Myers, D.E. (1994), Spatial interpolation: an overview, Geoderma, 62, 17-28.

Ninyerola, M., X. Pons, and J.M. Roure (2000), A methodological approach of climatological

modelling of air temperature and precipitation through GIS techniques, Int. J. Climatol., 20,

1823–1841.

Ollinger, S.V., J.D. Aber, G.M. Lovett, S.E. Millham, R.G. Lathrop, and J.M. Ellis (1993), A

spatial model of atmospheric deposition for the northeastern US, Ecological Applications,

3(3), 459-472.

Owosina, A. (1992), Methods for Assessing the Space and Time Variability of Ground Water

Data. M.S. Thesis, Utah: Utah State University.

Parajka, J. (2000), Estimation of Average Basin Precipitation for Mountain Basins in

Western Tatra Mountains, ERB200-Monitoring and modelling catchment water

quantity and quality, September 27-29, Ghent, Belgium.

Pardo-Iguzquiza, E. (1998), Comparison of geostatistical methods for estimating the areal

average climatological rainfall mean using data on precipitation and topography, Int.

J. Climatol., 18, 1031-1047.

Prudhomme, C. and D.W. Reed (1999), Mapping extreme rainfall in a mountainous

region using geostatistical techniques: A case study in Scotland, Int. J. Climatol., 19,

1337-1356.

Rajagopalan, B. and U. Lall (1998), Locally Weighted Polynomial Estimation of Spatial

Precipitation, Journal of Geographic Information and Decision Analysis, 2(2), 44-51.

Reek, T., S.R. Doty, and T.W. Owen (1992), A deterministic approach to the validation

of historical daily temperature and precipitation data from the cooperative network. B.

Am. Meteorol. Soc., 73(6), 753-762.

Seo, D.J. (1996), Nonlinear estimation of spatial distribution of rainfall – An Indicator

cokriging approach, Stochastic Hydrology and Hydraulics, 10, 127-150.

Sevruk, B., K. Matokova-Sadlonova, and L. Toskano (1998), Topography effects on

small-scale precipitation variability in the Swiss pre-Alps, Hydrology, Water

Resources and Ecology in Headwaters (Proceedings of the HeadWater'98 Conference,

Meran/Merano, Italy), 51-58.

Sun, H., P.S. Cornish, and T.M. Daniell (2002), Spatial Variability in Hydrologic

Modeling Using Rainfall-Runoff Model and Digital Elevation Model, J. Hydrol. Eng.,

404-412.

Syed, K.H., D.C. Goodrich, D.E. Myers, and S. Sorooshian (2003), Spatial characteristics

of thunderstorm rainfall fields and their relation to runoff, J. Hydrol., 271, 1–21.

Tabios, G.Q.III and J.D. Salas (1985), A comparative analysis of techniques for spatial

interpolation of precipitation, Water Resour. Bull., 21(3), 356-379.

Thiessen, A.H. (1911), Precipitation averages for large areas. A subsection in the

Climatology data for July 1911, District No. 10, Great Basin, Mon. Weather Rev.,

1082–1084.

Thornton, P.E., S.W. Running, and M.A. White (1997), Generating surfaces of daily

meteorological variables over large regions of complex terrain, J. Hydrol., 190(3-4),

214-251.

Todini, E., F. Pellegrini, and C. Mazzetti (2001), Influence of parameter estimation

uncertainty in Kriging. Part 2 – Test and case study applications, Hydrol. Earth.

System Sci., 5, 225–232.

Walpole, R.E., R.H. Myers, and S.L. Myers (1998), Probability and statistics for

engineers and scientists, 6th ed., Prentice Hall.

Wilby, R.L., L.E. Hay, and G.H. Leavesley (1999), A comparison of downscaled and raw

GCM output: implications for climate change scenarios in the San Juan River basin,

Colorado, J. Hydrol., 225(1-2), 67-91.

Wilks, D.S. (1995), Statistical Methods in the Atmospheric Sciences: An Introduction,

Academic Press.

Young, K.C. (1992), A three-way model for interpolating for monthly precipitation values,

Mon. Weather Rev., 120, 2561-2569.

Figure captions

Figure 1: Map showing the study regions

Figure 2: (a) Spatial map of the well draw down surface due to 4 pumping wells (b)

Fitted Variogram (line) and observed estimates (dots) (c) Fitted variogram and observed

estimates of January mean precipitation in the Animas basin.

Figure 3: Schematic of application of logistic regression to the precipitation occurrence

process

Figure 4: Flowchart for spatial interpolation of monthly precipitation and temperature

Figure 5: Flowchart for spatial interpolation of daily precipitation using the two-step

process

Figure 6: Number of observations available in each basin

Figure 7: Boxplots of estimation bias, error, and correlation for January and July

monthly total precipitation (over 12 available stations; the letters on the x-axis

[S/I/K/M/O/P/L] stand for [Straight average/IDW/Kriging/MLR/CMLR/PRISM/LWP])

Figure 8: Same as Figure 7 but for January and July monthly maximum temperature.

Figure 9: Boxplots of correlations on spatial map for January and July monthly total

precipitation

Figure 10: Boxplot of MLR coefficients and correlation for estimation of January and

July monthly total precipitation in Animas and Alapaha basin

Figure 11: Spatial map of estimates from selected methods for January mean monthly

total precipitation in the Animas basin (observed rainfall values are shown as white numbers

on the topography plot in the upper left panel)

Figure 12: Same as Figure 7 but for interpolation of daily precipitation

Figure 13: Same as Figure 8 but for interpolation of daily temperatures

Figure 14: Same as Figure 11 but for daily precipitation for Jan 25, 1979

Figure 15: Bar charts of rainfall occurrence hit ratio (which is the ratio of correct

estimation of rainy and dry days) simulated from different methods

Figure 16: Skill of rainfall occurrence realization

Figure 17: Same as Figure 7 but from daily interpolation of precipitation using the two-

step process.

Tables

Table 1. References on spatial interpolation of rainfall data

| Category | Methods | References |
| --- | --- | --- |
| Simple methods | Nearest neighbor, arithmetic mean, Thiessen polygon | Thiessen (1911) |
| Distance based methods | Inverse distance weighting | Kruizinga and Yperlaan (1978), Franke and Nielson (1980), Bussieres and Hogg (1989), Dirks et al. (1998), Hartkamp et al. (1999) |
| Distance based methods | Ordinary kriging | Creutin and Obled (1982), Tabios and Salas (1985), Barancourt et al. (1992), Borga et al. (1994), Borga and Vizzaccaro (1997), Sun et al. (2002), Syed et al. (2003) |
| Topography involved methods | Cokriging, detrend kriging, indicator kriging, two-step process | Chua and Bras (1982), Barancourt et al. (1992), Beek et al. (1992), Hevesi et al. (1992), Young (1992), Seo (1996), Pardo-Iguzquiza (1998), Kurtzman and Kadmon (1999), Prudhomme and Reed (1999), Goovaerts (2000), Kyriakidis et al. (2001), Mackay et al. (2001), Todini et al. (2001), Jeffrey et al. (2001), Goovaerts (2000), Erxleben et al. (2002), Kastelec and Košmelj (2002) |
| Topography involved methods | Multi-linear regression, PRISM, CMLR, locally weighted polynomial | Ollinger et al. (1993), Daly et al. (1994), Loader (1997), Goodale et al. (1998), Rajagopalan and Lall (1998), Loader (1999), Ninyerola et al. (2000), Hay et al. (2002), Fassnacht et al. (2003), Marquinez et al. (2003) |

Note that papers presenting multiple methods are listed under the method that showed the best performance in the paper.

Table 2 Data availability in each basin

| Basin | # of stations | Data period |
| --- | --- | --- |
| Animas | 37 | 1948.10~1997.4 |
| Alapaha | 28 | 1970.1~1997.9 |

| Number of available observations | Methods used |
| --- | --- |
| >12 | 1. Straight average 2. IDW 3. PRISM 4. MLR 5. CMLR 6. Kriging 7. LWP |
| >4 | 1. Straight average 2. IDW 3. PRISM 4. MLR 5. CMLR 6. Kriging |
| >2 | 1. Straight average 2. IDW 3. PRISM |

Table 3 Data requirements for each interpolation scheme

| Schemes | Min. # of samples* | Explanation |
| --- | --- | --- |
| Straight average | 2 | Need min. of 1 excluding the estimation point itself |
| IDW | 2 | Same as above |
| Kriging | 4 | Depends on the data structure |
| MLR | 5 | # of parameters + 1 (intercept) + 1 (cross validation) |
| CMLR | 5 | Same as above |
| PRISM | 2 | Same as IDW (PRISM data set is needed for application) |
| LWP | 10 | (2*p+1)+3 |

* All for cross validation analysis

Table 4 Options for daily interpolation of precipitation in the two-step process

[for all time steps] If (Plog > Pclim) then,§

Case 1: Use IDW method for all days

Case 2: Use IDW for # of rainfall stations < 5; use MLR for all other days

Case 3: Use IDW for # of rainfall stations < 5; use MLR for 5 < # of rainfall stations < 12; use locally weighted polynomial for # of rainfall stations > 12

The cases are ordered by degree of sophistication.

§ Plog: Rainfall probability of the estimating station calculated by logistic regression; Pclim: Rainfall probability from the climatology of the basin
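The decision rule of Table 4 can be sketched as a small selection function. This is only an illustration under stated assumptions: the function names are hypothetical, and because the table leaves station counts of exactly 5 and exactly 12 unassigned in Case 3, the sketch assigns both to MLR.

```python
def is_wet(p_log, p_clim):
    """Occurrence step: the day is treated as wet when Plog > Pclim."""
    return p_log > p_clim


def select_scheme(case, n_rain_stations):
    """Amount step: pick the interpolation scheme for Case 1, 2 or 3.

    Assumption: counts of exactly 5 and exactly 12 go to MLR, since
    Table 4 does not assign them explicitly.
    """
    if case not in (1, 2, 3):
        raise ValueError("case must be 1, 2 or 3")
    if case == 1 or n_rain_stations < 5:
        return "IDW"
    if case == 2 or n_rain_stations <= 12:
        return "MLR"
    return "LWP"


def daily_scheme(p_log, p_clim, case, n_rain_stations):
    """Return the scheme name, or None when the day is treated as dry."""
    if not is_wet(p_log, p_clim):
        return None
    return select_scheme(case, n_rain_stations)


# Hypothetical probabilities and station counts, for illustration only.
print(daily_scheme(0.7, 0.4, 3, 20))
print(daily_scheme(0.2, 0.4, 3, 20))
```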

Table 5. 2X2 contingency table

                  Observed
                  Wet   Dry
Estimated   Wet   a     b
            Dry   c     d
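As a minimal sketch of how the entries of Table 5 translate into occurrence scores, the hit ratio below follows the description in the caption of Figure 15 (the fraction of correctly estimated rainy and dry days). The second score is the Hanssen-Kuipers skill score, included on the assumption that it is the skill measure used for rainfall occurrence in this paper. The function names and the example counts are hypothetical.

```python
def hit_ratio(a, b, c, d):
    """Fraction of days estimated correctly: (a + d) / (a + b + c + d)."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("contingency table is empty")
    return (a + d) / total


def kuipers_skill_score(a, b, c, d):
    """Hanssen-Kuipers score: hit rate minus false alarm rate.

    With rows = estimated and columns = observed (Table 5 layout),
    observed wet days are a + c and observed dry days are b + d.
    """
    observed_wet = a + c
    observed_dry = b + d
    if observed_wet == 0 or observed_dry == 0:
        raise ValueError("need at least one observed wet and one observed dry day")
    return a / observed_wet - b / observed_dry


# Hypothetical counts, for illustration only.
a, b, c, d = 30, 10, 5, 55
print(hit_ratio(a, b, c, d), kuipers_skill_score(a, b, c, d))
```

A score of 1 corresponds to a perfect estimate of occurrence, and a score of 0 to no skill relative to random estimates.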

Figures

Figure 1. Map showing the study regions

Figure 2 (a) Spatial map of the well drawdown surface due to 4 pumping wells; (b) fitted variogram (line) and observed estimates (dots); (c) fitted variogram and observed estimates of July mean precipitation in the Animas basin.

Figure 3 Schematic of application of logistic regression to the precipitation occurrence process

[Schematic: station tables with columns x, y, z, r (r is 1 or 0), linked by the steps "Rainfall occurrence interpolation by logit regression", "Get interpolation on rainy stations", and "Apply interpolation schemes", leading to the "Rainfall field" over the "Basin".]
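The occurrence step in this schematic can be illustrated with a minimal logistic regression sketch. It assumes that the predictors are the station coordinates and elevation (x, y, z) and that r is the 0/1 rainfall occurrence, as the column labels suggest. The fitting routine (plain gradient ascent on standardized predictors), the function names, and the station values are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def standardize(rows):
    """Scale each predictor column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
    return scaled, means, stds


def linear(w, x):
    """Intercept w[0] plus the weighted sum of the predictors."""
    return w[0] + sum(wj * v for wj, v in zip(w[1:], x))


def log_likelihood(w, X, r):
    """Bernoulli log-likelihood, clipped to avoid log(0)."""
    ll = 0.0
    for xi, ri in zip(X, r):
        p = min(max(sigmoid(linear(w, xi)), 1e-12), 1.0 - 1e-12)
        ll += math.log(p) if ri else math.log(1.0 - p)
    return ll


def fit_logistic(X, r, lr=0.5, n_iter=2000):
    """Gradient ascent on the mean log-likelihood, starting from zero weights."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, ri in zip(X, r):
            err = ri - sigmoid(linear(w, xi))
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def rain_probability(w, point, means, stds):
    """Plog at an estimation point given as (x, y, z)."""
    scaled = [(v - m) / s for v, m, s in zip(point, means, stds)]
    return sigmoid(linear(w, scaled))


# Hypothetical stations: (x, y, z, r), with r = 1 for rain and 0 for no rain.
stations = [
    (0.0, 0.0, 2000.0, 0), (10.0, 5.0, 2300.0, 0),
    (20.0, 10.0, 2600.0, 1), (25.0, 20.0, 2900.0, 0),
    (30.0, 30.0, 3200.0, 1), (15.0, 25.0, 3000.0, 1),
    (5.0, 20.0, 2500.0, 0), (35.0, 15.0, 2700.0, 1),
]
X, means, stds = standardize([s[:3] for s in stations])
r = [s[3] for s in stations]
w = fit_logistic(X, r)
p_log = rain_probability(w, (20.0, 20.0, 2800.0), means, stds)
print(f"Plog at the estimation point: {p_log:.3f}")
```

The resulting probability would then be compared with the climatological probability to decide whether the estimation point is treated as wet before an interpolation scheme is applied to the rainy stations.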

Figure 4 Flowchart for spatial interpolation of monthly precipitation and temperature

[Flowchart labels: Data Correction; For all climate variables; # of stations (#>12, 2<#<4, #>2); over 80%, over 30%, 30~100%; method sets (Straight avg., IDW, PRISM), (Straight avg., IDW, Kriging, MLR, CMLR, PRISM), and (Straight avg., IDW, Kriging, MLR, CMLR, PRISM, LWP); Performance comparison.]

Figure 5 Flowchart for spatial interpolation of daily precipitation using the two-step process

[Flowchart labels: Daily Precipitation Data; All stations dry (Yes: Set P=0; No: continue); Logistic regression (Plogit > Pclim: rain; Plogit < Pclim: no rain); # of stations > 4 (No: Use IDW; Yes: 3 cases based on # of rain stations on the given time step); Case 1: Use IDW; Case 2: #<4 IDW, #>4 MLR; Case 3: #<4 IDW, 4<#<12 MLR, #>12 LWP; Performance comparison.]

Figure 6 Number of observations available in each basin

Figure 7 Boxplots of estimation bias, error, and correlation for January and July monthly total precipitation (over 12 available stations; the letters on the x-axis [S/I/K/M/O/P/L] stand for [Straight average/IDW/Kriging/MLR/CMLR/PRISM/LWP])

Figure 8 Same as Figure 7 but for January and July monthly maximum temperature

Figure 9 Boxplots of correlations on spatial map for January and July monthly total precipitation

Figure 10 Boxplot of MLR coefficients and correlation for estimation of January and July monthly total precipitation in the Animas and Alapaha basins (panels: Animas Basin, Alapaha Basin)

Figure 11 Spatial map of estimates from selected methods for January mean monthly total precipitation in the Animas basin

(Observed rainfall values are shown as white numbers on the topography plot in the upper left panel)

Figure 12 Same as Figure 7 but for interpolation of daily precipitation

Figure 13 Same as Figure 8 but for interpolation of daily temperatures

Figure 14 Same as Figure 11 but for daily precipitation on Jan 25, 1979

Figure 15 Bar charts of rainfall occurrence hit ratio (the ratio of correctly estimated rainy and dry days) simulated from different methods

[Bar chart panels: Animas Basin and Alapaha Basin; x-axis: IDW, MLR, CASE1, CASE2; y-axis: KSS, 0 to 1; series: January, July]

Figure 16. Skill of rainfall occurrence realization

Figure 17. Same as Figure 7 but from daily interpolation of precipitation using the two-step process
