GRASP: generalized regression analysis and spatial prediction

Anthony Lehmann *, Jacob McC. Overton, John R. Leathwick

Manaaki Whenua, Landcare Research, Private Bag 3127, Hamilton, New Zealand

Abstract

We present generalized regression analysis and spatial prediction (GRASP) conceptually as a method for producing

spatial predictions using statistical models, and introduce and demonstrate a specific implementation in Splus that

facilitates the process. We put forward GRASP as a new name encapsulating an existing concept that aims at making

spatial predictions using generalized regression analysis. Regression modeling is used to establish relationships between

a response variable and a set of spatial predictors. The regression relationships are then used to make spatial predictions

of the response. The GRASP process requires point measurements of the response, as well as regional coverages of

predictor variables that are statistically (and preferably causally) important in determining the patterns of the response.

This approach to spatial prediction is becoming more commonplace, and it is useful to define it as a general concept.

For instance, GRASP could use a survey of the abundance of a species (the response), and existing spatial coverages of

environmental (e.g. climate, landform) variables (the predictors) for a region. A multiple regression can be used to

establish the statistical relationship between the species abundance and the environmental variables. These regression

relationships can then be used to predict the species abundance from the environmental surfaces. This process defines

relationships in environmental space and uses these relationships to predict in geographic space. We introduce GRASP

(the implementation) as an interface and collection of functions in Splus designed to facilitate modern regression

analysis and the use of these regressions for making spatial predictions. GRASP standardizes the modeling process and

makes it more reproducible and less subjective, while preserving analysis flexibility. The set of functions provides a

toolbox that allows quick and easy data checking, model building and evaluation, and calculation of predictions. The

current version uses generalized additive models (GAMs), a modern non-parametric regression technique the

advantages of which are discussed. We demonstrate the use of the GRASP implementation to model and predict the

natural distributions of two components of New Zealand fern biodiversity: (1) the natural distribution of an icon

species, silver fern (Cyathea dealbata); and (2) the natural pattern of total fern species richness. Key steps are

demonstrated, including data preparation, options setting, data exploration, model building, model validation and

interpretation, and spatial prediction.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Biodiversity; Sustainable management; Spatial prediction; Generalized additive models; Geographic information systems;

New Zealand; Ferns

Abbreviations: CART, classification and regression trees; GAM, generalized additive models; GIS, geographic information system;

GLM, generalized linear models; Logistic R, logistic regression; LSR, least square regression; ROC, receiver operating characteristics.

* Corresponding author. Present address: Swiss Centre for Faunal Cartography (CSCF), Terreaux 14, CH-2000 Neuchatel,

Switzerland. Fax: +41-32-717-7969

E-mail address: [email protected] (A. Lehmann).

Ecological Modelling 157 (2002) 189-207

www.elsevier.com/locate/ecolmodel

0304-3800/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

PII: S0304-3800(02)00195-3

1. Introduction

Conservation management requires information

on a diverse range of environmental attributes,

including species distributions, biodiversity values,

natural ecosystem potential, human uses, threats

and risks. Often this information takes the form of

spatial predictions (e.g. maps) that present the best

estimates of these attributes across the landscape.

Ideally, these spatial predictions are produced

using rigorous methods for integrating and gen-

eralizing underlying information.

A diverse range of ecosystem characteristics

have been measured and predicted from their

main environmental drivers. These include species

distribution (Bio et al., 1998; Franklin, 1998;

Leathwick, 1998; Lehmann, 1998), richness hot-

spots (Lehmann et al., 2002), soil characteristics

(McKenzie and Ryan, 1999), forest and agricul-

tural productivity, and tourism attraction. The

resulting spatial predictions of ecosystem charac-

teristics can be integrated further. Species richness

and natural community composition can, for

instance, be estimated by summing several species

predictions (Guisan and Theurillat, 2000; Leath-

wick, 2002; Lehmann et al., 2002). Alternatively,

biotic domains can be defined from the classifica-

tion of predicted species distribution (Peters and

Thackway, 1998; Overton et al., 2000; Leathwick,

2001). This approach of vegetation ecology has

proved to be very valuable to biodiversity research

(Austin, 1999a) with good examples of applica-

tions from both Australia (e.g. Ferrier et al.,

2002a,b) and New Zealand (e.g. Overton et al.,

2002).

Methods used to make spatial predictions

should meet several criteria:

1) They should be general enough to deal with

the wide variety of attributes that need to be predicted.

2) They should be rigorous and data-defined, to

make predictions in an objective and defen-

sible manner.

3) They should be standardized to produce uni-

form results, and streamlined to facilitate the

required analyses.

The general concept of generalized regression analysis and spatial prediction (GRASP), presented in the following section, fulfils the first two criteria, whereas its implementation, also presented in this paper, contributes to the final criterion by allowing quick and easy predictions of several responses at once. We intentionally define GRASP as a duality: a general method and a specific implementation.

2. Generalized regression analysis and spatial prediction

GRASP plays two important roles in quantita-

tive ecology and ecosystem management. First, it

develops statistical relationships between a vari-

able of interest (e.g. species distribution or rich-

ness) and environmental variables providing

descriptions of biodiversity patterns (e.g. climate, soil characteristics). Second, it provides a method

for making strictly data-defined spatial predictions

using point measurements of the attributes. The

concept of GRASP has been used in numerous

ecological applications (Table 1), including recent

ones that have used the GRASP implementation

demonstrated here (e.g. Cawsey et al., 2002; Lehmann et al., 2002; Overton et al., 2002; Ray et al., 2002; Zaniewski et al., 2002).

2.1. GRASP compared with other approaches

GRASP is one of a number of possible methods

for making spatial predictions from point mea-

surements (e.g. Guisan and Zimmermann, 2000).

Different approaches to spatial prediction have

different situations in which they are most suita-

ble. Some GRASP characteristics that set it apart

from other methods of spatial prediction are:

1) GRASP fits response surfaces as a function of

predictors in environmental space, and then uses the spatial pattern of the predictor

surfaces to predict the response in geographic

space.

2) GRASP uses statistical models, rather than

mechanistic models, to determine the relation-

ship between the response and the predictors.

3) GRASP uses regression models to determine the statistical relationship between the response and predictors.

The first of these differences distinguishes

GRASP from spatial prediction techniques such

as surface-fitting algorithms or from geostatistical

approaches such as kriging. These techniques

estimate surfaces directly in geographic space,

rather than in predictor space as is done in

GRASP. Surface fitting and geostatistical ap-

proaches are vital for the estimation of funda-

mental surfaces, such as elevation, or climate

surfaces, but have distinct limitations for other

purposes. In the example demonstrated here, we

predict the natural distribution of fern biodiversity

by estimating the relationships between fern dis-

tributions and environment using sites largely free

from human disturbance. These environmental

relationships are then used to predict for all areas

of New Zealand as if they were undisturbed by

humans. This provides an estimate of natural (in

the absence of human influence) fern biodiversity

of New Zealand. A similar approach could be used

to predict the likely effects of climate change on

natural distributions of species. Neither of these

applications would be possible without first estimating a surface in environmental space.
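The two-step logic described above (fit in environmental space, then predict in geographic space) can be sketched in a few lines. This is only an illustration under simplifying assumptions: a single predictor, an ordinary least-squares line instead of a GAM, and invented plot and grid values; the function and variable names are hypothetical and are not part of the GRASP implementation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical survey plots: temperature (predictor) and abundance (response).
plot_temperature = [6.0, 8.0, 10.0, 12.0, 14.0]
plot_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]

# Step 1: estimate the relationship in environmental space.
intercept, slope = fit_line(plot_temperature, plot_abundance)

# Step 2: predict in geographic space from a predictor surface
# (a hypothetical 2 x 3 grid holding one temperature value per pixel).
temperature_grid = [[7.0, 9.0, 11.0],
                    [13.0, 10.0, 8.0]]
predicted_grid = [[intercept + slope * t for t in row]
                  for row in temperature_grid]
```

Because the fitted relationship depends only on the predictor values, the same model could be applied to a different predictor surface (for example a climate-change scenario) without refitting.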

The use of statistical models to determine

relationships between the responses and the pre-

dictors distinguishes GRASP from more process-

based approaches to spatial prediction. For many

purposes, mechanistic models may provide more

robust predictions, but are much more difficult to

produce than statistical models, requiring a huge

amount of work to be carefully data-defined.

While this approach is sometimes used for ecosys-

tem processes like carbon sequestration or net

primary productivity, it is virtually impossible for

the distribution of species, something for which

GRASP is ideally suited. Process-based models

that will be used for spatial prediction require

spatial estimates of the input parameters (for

which GRASP could be used), and assumptions

that processes are spatially coherent and consis-

tent. Process-based models can have huge numbers

of parameters, with many parameters not rigor-

ously fitted to data, or fitted to data in a small

region. In this case, extrapolating outside the data with such a model is as dangerous as it is with a statistical model. Furthermore, making such models data-defined still requires careful fitting of parameters, which is ultimately a statistical exercise. In this light, the distinction between statistical models and mechanistic models for prediction becomes less clear. Statistical and mechanistic models should be viewed as complementary tools in coordinated programs of observational, experimental and theoretical research.

Finally, the use of regression models to estimate environmental relationships distinguishes the GRASP approach from other approaches such as CCA (e.g. Guisan et al., 1998) or hyperdimensional niche modeling (e.g. Hirzel et al., 2002; Zaniewski et al., 2002). These methods define the species distributions in environmental space in a multivariate sense. They do not differ much from the GRASP general concept, but differences exist in application. Neural network models are closer in their applications to the GRASP methodology described here (e.g. Brosse and Lek, 2000).

Table 1
Regression analyses used to make spatial predictions

Method | Ys distribution | Response curve characteristics | Combination with GIS | Statistical references | Recent examples
--- | --- | --- | --- | --- | ---
LSR | Gaussian | Parametric, quadratic (X, X^2, X^3, ...) | Substitution of Xs by maps in GIS | (Draper and Smith, 1981) | (Lehmann et al., 1997; Wolgemuth, 1998)
Logistic R | Binomial | Parametric, quadratic (X, X^2, X^3, ...) | Substitution of Xs by maps in GIS | (Hosmer and Lemeshow, 1989) | (Narumalani et al., 1997)
GLM | Gaussian, binomial, Poisson, ... | Parametric, quadratic (X, X^2, X^3, ...) | Substitution of Xs by maps in GIS | (McCullagh and Nelder, 1997) | (Guisan et al., 1998)
GAM | Gaussian, binomial, Poisson, ... | Non-parametric, smoothed, any shape | Prediction in statistical package, export to GIS (direct or lookup table) | (Hastie and Tibshirani, 1990) | (Bio et al., 1998; Lehmann, 1998)
CART | Gaussian | Non-linear, non-additive | Prediction in statistical package | (Breiman et al., 1984) | (Franklin, 1998; Iverson and Prasad, 1998)

2.2. GAMs in ecology

Among possible multiple regression methods (Table 1), we concentrate on GAMs for several reasons. GAMs are non-parametric extensions of GLMs, which are themselves a generalization of classical least square regression (LSR). While GLMs extend the application of classical regression to other statistical distributions (binomial, Poisson, gamma, negative binomial), GAMs estimate response curves with a non-parametric smoothing function instead of parametric terms (e.g. ax + bx²). This allows exploration of the shapes of species response curves to environmental gradients, and allows the fitting of statistical models in better agreement with ecological theory (Austin, 1999b, 2002). GAMs are described in detail by Hastie and Tibshirani (1990) and are fitted here within S-Plus (Chambers and Hastie, 1993). Other analysis methods such as CART and ANN (see Table 1) are also capable of fitting robust models to ecological data and may be better suited for modeling interactions. However, at this stage, these methods lose in ecological interpretability by not allowing observation of response curve shapes, and in statistical inference by not selecting significant variables based on appropriate statistical distributions.
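The contrast between GLMs and GAMs drawn above can be written compactly. The notation below is an assumption introduced for illustration (a link function g, an intercept \alpha, p predictors x_j, and smooth functions f_j); only the parametric terms a x + b x^2 come from the text.

```latex
% GLM: parametric terms in each predictor x_j (here up to quadratic)
g\bigl(\mathrm{E}[Y]\bigr) = \alpha + \sum_{j=1}^{p} \bigl(a_j x_j + b_j x_j^{2}\bigr)

% GAM: each parametric term is replaced by a smooth function f_j
g\bigl(\mathrm{E}[Y]\bigr) = \alpha + \sum_{j=1}^{p} f_j(x_j)
```

In both cases the response distribution (e.g. binomial or Poisson) determines the usual choice of g; the difference lies only in how the shape of each response curve is obtained.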

GAMs have been used in numerous applications to predict species distribution as a function of their environment: by Yee and Mitchell (1991); in comparison with GLMs (e.g. Austin and Meyers, 1996; Bio et al., 1998; Franklin, 1998); in comparison with ANNs (e.g. Brosse and Lek, 2000); to assess the effect of environmental changes (e.g. Leathwick et al., 1996; Lehmann, 1998); to assess the effect of interspecific competition (e.g. Leathwick, 1998; Leathwick and Austin, 2001; Leathwick, 2002); to predict species richness (e.g. Leathwick et al., 1998; Lehmann et al., 2002); and to predict community composition (e.g. Cawsey et al., 2002).

2.3. Purpose of the GRASP implementation

The GRASP implementation was primarily designed to facilitate the production of spatial predictions from sparsely distributed data from point surveys. It provides a tool and a user-friendly interface to promote the use of spatial predictions in environmental management (e.g. Lehmann et al., 2002). It has been developed as a graphical interface with underlying functions (Lehmann et al., 1999), using S-PLUS v.4.5 and above (Mathsoft Inc., Seattle, WA) under the MS Windows environment. Up-to-date information on GRASP can be found on the following website: http://www.cscf.ch/grasp.

We present here the GRASP implementation in

more detail with two spatial prediction examples:

(i) a species distribution, and (ii) total species

richness. Both examples use data on New Zealand

fern distribution (Lehmann et al., 2002).

3. Demonstration of GRASP

3.1. Fern distribution (response variables)

A total of 19 875 RECCE plots (Allen, 1992) were selected from the New Zealand National Indigenous Vegetation Survey (NIVS, Wiser et al., 2001) and data describing the presence or absence of fern species were extracted. These plots were collected between the 1950s and 1980s within indigenous forests that were largely free from anthropogenic disturbances. From the pool of observed taxa, we selected 43 fern species (Lehmann et al., 2002) on the basis of their percentage of occurrence and taxonomic criteria as potential indicators of indigenous forest species diversity.



3.2. Environmental information (spatial

predictors)

Climate surfaces predicting mean monthly pre-

cipitation, temperature, solar radiation and hu-

midity produced from climate station data

(Leathwick and Stephens, 1998) using thin-plate

splines (Hutchinson and Gessler, 1994), were used

to estimate climate parameters for each vegetation

plot. Derived predictors more directly related to

forest plant physiology and ecology were then

calculated as described in Leathwick and Whitehead (2001) and Leathwick and Austin (2001). Mean annual solar radiation (Sa) determines potential site productivity, while mean annual temperature (Ta), soil water deficit (W), vapor pressure deficit (VPD) and the lowest monthly ratio of rainfall to potential evapotranspiration (R/E) determine actual productivity. Seasonality of both temperature

and solar radiation (Tw and Sw) indicate the

departure of winter values from the annual mean

and can be interpreted as opposites to continen-

tality gradients (Table 2).

The influence of landform was analyzed using estimates of slope (S), drainage (D) and lithology (L) (Table 2). Relevant data were extracted from the New Zealand Land Resource Inventory (NZLRI) (Newsome, 1992). Additional work was needed to convert descriptions of soil parent material into units more relevant to the prediction of plant distributions, resulting in a variable describing the distributions of 15 classes of soil parent materials.

In the following sections, spatial predictions of

Cyathea dealbata and fern species richness are

made using the predictors presented in Table 2.

The GRASP method used in both cases is described here in five steps, and in more detail in Lehmann et al. (1999).

3.3. Data exploration

Exploratory analyses were performed before

starting the process of model selection in order

to investigate possible data problems such as the presence of outliers and correlation between predictors.

First, the environmental envelopes of species were calculated as the hyper-rectangle in environmental space bounded by the minimum and maximum of the species distribution along each environmental variable. The environmental envelope of C. dealbata excluded 8456 from the calibration data set and 93 939 from the prediction data set. This step was useful to avoid the influence of large numbers of absences on each side of environmental gradients (Austin and Meyers, 1996), and should result in more accurate estimation of the additive model, and substantially speed up the modeling process.

Table 2
Environmental predictors used to model fern distribution of New Zealand (improved from Leathwick, 1998)

Abbreviation | Variable name | Mean (range)
--- | --- | ---
Ta (°C) | Mean annual temperature | 9.3 (1.5 to 15.9)
Tw (°C) | Temperature seasonality | 0.4 (-3.2 to 4.2)
Sa (MJ m^-2 per day) | Mean annual solar radiation | 13.6 (11.7 to 15.3)
Sw (MJ m^-2 per day) | Solar radiation seasonality | 0.2 (-0.7 to 1.1)
VPD (%) | October vapor pressure deficit | 0.25 (0 to 0.57)
W (MPa days) | Soil water deficit | 0.59 (0.11 to 2.13)
R/E | Minimum rainfall to potential evapotranspiration | 2.5 (0.5 to 8.5)
S | Slope | 28.6 (1.5 to 40)
D | Drainage: V = very good; G = good; I = intermediate; P = poor; VP = very poor |
L | Lithology. Metamorphic: MG = Gneiss, Granite; MS = Schist. Plutonic: PD = Diorite; PG = Gabbro; PU = Ultramafic. Quaternary: QA = Alluvium; QL = Loess; QO = Organic; QS = Sand. Sedimentary: SL = Limestone; SS = Strong; SW = Weak. Volcanic: VA = Andesite; VB = Basalt; VR = Rhyolite |
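The environmental envelope described in this section amounts to a per-variable minimum and maximum over the plots where the species is present, followed by a filter on all plots. The sketch below illustrates that idea with invented plot values and hypothetical names (`environmental_envelope`, `inside`, `R_E`); it is not the GRASP code itself.

```python
def environmental_envelope(presence_rows):
    """Return {variable: (minimum, maximum)} over plots where the species occurs."""
    variables = presence_rows[0].keys()
    return {
        v: (min(row[v] for row in presence_rows),
            max(row[v] for row in presence_rows))
        for v in variables
    }


def inside(row, envelope):
    """True if the plot lies within the hyper-rectangle on every variable."""
    return all(low <= row[v] <= high for v, (low, high) in envelope.items())


# Hypothetical plots with two predictors and a presence/absence response.
plots = [
    {"Ta": 12.0, "R_E": 2.0, "present": 1},
    {"Ta": 14.5, "R_E": 3.5, "present": 1},
    {"Ta": 13.0, "R_E": 3.0, "present": 0},  # absence inside the envelope
    {"Ta": 5.0, "R_E": 2.5, "present": 0},   # colder than any presence
    {"Ta": 13.0, "R_E": 7.0, "present": 0},  # R/E above any presence
]
predictors = ["Ta", "R_E"]

presences = [{v: p[v] for v in predictors} for p in plots if p["present"] == 1]
envelope = environmental_envelope(presences)

# Keep only plots inside the envelope for model calibration.
kept = [p for p in plots if inside(p, envelope)]
```

Absences that fall inside the envelope are retained, so the filter removes only plots that lie beyond the observed range of the species on at least one gradient.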

We mapped the presences and absences of C. dealbata to check their spatial distribution and to

identify geographic outliers in the data. Such

outliers could have been erroneous data, such as

field misidentifications, or typographical errors.

The map of C. dealbata showed its distribution in

the North Island and in the northern tip of the

South Island (Fig. 1). A few dubious observation

points occurring along the West Coast of the

South Island were kept in the analysis because

the effect of these possible outliers was compen-

sated by a large number of absences. The clustered

distribution of all vegetation plots reflected the

underlying unevenness in the sampling of plots

contained within the National Indigenous Vegeta-

tion Survey database. This distribution corre-

sponded closely to the distribution of remnant

forests in New Zealand covering approximately

26% of the total land area.

We then described the environmental space

occupied by species with specially designed histo-

grams (Fig. 2) and scatter plots of response versus

predictors (Fig. 3). C. dealbata was found on

warmer sites, with high solar radiation, rather dry

air and soil conditions, low ratio of rainfall to

evapotranspiration, average slope, and well-

drained soils formed on sedimentary and volcanic

parent materials.

Finally, correlations between the chosen pre-

dictors were calculated to allow removal of

correlated predictors. This was useful because

correlated predictor variables can cause trouble

in estimating additive surfaces. The highest correlation (r = 0.793) between predictors was observed

between VPD and W (Fig. 4). However, this

correlation was not high enough to justify remov-

ing one variable or another from the modeling

process. Note that seasonality measures (Tw and

Sw) were not correlated at all with their original

variables (Ta and Sa). These measures of season-

ality allowed taking into account extreme events

such as low winter temperature without adding a

variable like minimum winter temperature that

would have been highly correlated with Ta.
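The correlation screening described above can be sketched as a pairwise Pearson correlation over the predictor columns, flagging pairs above a cut-off. The cut-off of 0.9 and the predictor values are assumptions chosen for illustration; the paper does not state the threshold that would have justified removing a variable.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(table, threshold):
    """List (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictor values at five plots.
predictor_table = {
    "VPD": [0.1, 0.2, 0.3, 0.4, 0.5],
    "W": [0.2, 0.4, 0.6, 0.8, 1.0],   # proportional to VPD in this toy data
    "S": [3.0, 1.0, 4.0, 1.0, 5.0],
}

flagged = correlated_pairs(predictor_table, threshold=0.9)  # assumed cut-off
```

With a cut-off of 0.9, a correlation such as the r = 0.793 reported between VPD and W would not be flagged, which matches the decision to keep both variables.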

3.4. Model selection

For C. dealbata presence-absence data, a quasi-binomial model was chosen, whereas for richness, a quasi-Poisson model was most appropriate. In both cases, a stepwise procedure was used to select significant predictors. A starting model including all continuous predictors smoothed with 4 degrees of freedom was first fitted. The significance of either dropping smooth terms or converting them to a linear form was then tested using an analysis of variance (ANOVA; F-test) for quasi models. At each step, the least significant change was kept and served as the starting point for the next step.

The stepwise selection of statistically significant environmental predictors for C. dealbata distribution (1) and fern richness (2) selected the following models:

1) C. dealbata probability: s(Ta,4) + s(Tw,4) + s(Sa,4) + s(Sw,4) + s(VPD,1) + s(W,4) + s(R.E,4) + s(S,4) + L.

2) Species richness: s(Ta,4) + s(Tw,4) + s(Sa,4) + s(Sw,4) + s(VPD,4) + s(W,4) + s(R.E,4) + s(S,1) + L + D.

where s = spline smoother, and 4 and 1 are the degrees of freedom of the spline smoother.

Most variables were kept in both models. Only D was removed from the Cyathea model, while VPD was reduced to its linear form. Only S was reduced to its linear form in the richness model. The estimated dispersion parameter was 1.005 for the quasi-binomial model and 1.576 for the quasi-Poisson model. In this case, the choice of using quasi models, which estimate the dispersion parameter instead of using a default value of 1, was only justified for the Poisson model, and did not change the analysis outcome for the binomial model.
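To make the dispersion parameter concrete, the sketch below uses a common estimator for quasi-likelihood models: the Pearson chi-square statistic divided by the residual degrees of freedom. Whether the S-Plus fit used exactly this estimator is an assumption here, and the counts, fitted values and parameter count are invented.

```python
def pearson_dispersion(observed, fitted, variance, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi2 = sum((y - mu) ** 2 / variance(mu) for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


def poisson_variance(mu):
    """Variance function of the Poisson family: V(mu) = mu."""
    return mu


def binomial_variance(mu):
    """Variance function of the binomial family for 0/1 data: V(mu) = mu (1 - mu)."""
    return mu * (1.0 - mu)


# Hypothetical richness counts and fitted means from a model with 2 parameters.
counts = [2, 7, 4, 9, 3, 8]
fitted_counts = [4.0, 5.0, 4.0, 6.0, 5.0, 6.0]

phi = pearson_dispersion(counts, fitted_counts, poisson_variance, n_params=2)
```

A value of phi close to 1 indicates that the nominal variance function is adequate, whereas a value clearly above 1 (as for the richness model above) indicates overdispersion and motivates the quasi family.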

3.5. Model validation

The selected models were evaluated by two

methods. The first method was a plot of the

observed response values against the values pre-

dicted by the model. The second method was a

cross-validation of the model. Correlation between the actual and predicted values was then calculated to assess the goodness of fit of the Poisson model, whereas the ROC test (Fielding and Bell, 1997) was used for the binomial model.

Fig. 1. Spatial distribution of (A) all 19 875 vegetation plots and (B) presences of C. dealbata from the New Zealand National Indigenous Vegetation Survey.

Both the cross-validation and the simple validation of the Cyathea model presented high ROC values (ROC = 0.94) with no difference between them, indicating good model stability (Fig. 5). The correlation between observed and predicted species richness was relatively high (r = 0.74 and 0.75, respectively), corresponding to more than half (r² = 0.56) of the variation in species richness being explained by the model (Fig. 5).
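The two validation statistics used above can be computed directly from observed and predicted values. The sketch below assumes that the ROC value is the area under the ROC curve, computed here as the proportion of presence/absence pairs in which the presence receives the higher prediction; the data are invented and the function names are hypothetical.

```python
from math import sqrt


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation between observed and predicted values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical presence/absence observations and predicted probabilities.
observed_presence = [1, 1, 1, 0, 0, 0]
predicted_probability = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(observed_presence, predicted_probability)

# Hypothetical observed and predicted species richness.
observed_richness = [3.0, 5.0, 2.0, 8.0]
predicted_richness = [2.5, 5.5, 3.0, 7.0]
r = pearson(observed_richness, predicted_richness)
r_squared = r ** 2
```

For cross-validation, the same statistics would be computed on predictions for plots that were held out when the model was fitted, and then compared with the values from the simple validation.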

3.6. Model interpretations

Three analyses were used to help interpret and

understand the regression models, including: (1)

plots of the regression model, (2) plots of response

curves that result from the model, and (3) graphs

of the overall contribution of the variables to the

model. The first two allowed the visualization of

how the response variable varies as a function of

the predictor variables, while the third allowed the

assessment of the relative importance of the

predictor variables in explaining the variation in

the response variable.

Inspection of graphs showing the contribution

of each predictor to the Cyathea model confirmed

the dominant role of Ta. When dropping each

predictor from the final model, Ta was the only

variable whose contribution could not be compensated for by a combination of other variables. The contribution of each variable on its own showed its potential to explain Cyathea distribution. Here, the picture was quite different, with the principal contributions ranked in the following order: Ta, Sa, R/E, L, VPD, W (Fig. 6).

Fig. 2. Environmental distribution of C. dealbata in the New Zealand National Indigenous Vegetation Survey. Entire histogram bars represent the distribution of all plots, where darker areas represent the number of plots occupied by the species in each bar. This number is also written on top of each bar. The plain line is the ratio between occupied and not occupied plots and the dashed lines correspond to the overall mean proportion of occupied plots.

Fig. 3. Plots of responses vs. predictors for (A) C. dealbata presence-absences, and (B) fern richness. A smooth line is added to the graph.

One of the central parts of the interpretation of GAM models was the description of the predictors' partial response curves. Here, C. dealbata is characterized by a strong positive response to Ta and Sa, by rather warm winter temperatures (high Tw), by negative responses to two gradients of water deficit (VPD, W), and by higher rainfalls (R/E) (Fig. 7). Species richness was predicted to be higher in productive environments characterized by high Ta and low Sa, extreme temperature seasonality, low solar radiation, high solar radiation seasonality, and high humidity gradients (VPD, W, R/E; Fig. 7).

The plot of combined response curves for C. dealbata and another modeled species (Dicksonia squarrosa) was characterized by a small difference in response to Ta and an opposite response to Sa and R/E (Fig. 8). Other gradients had relatively small effects when all other variables were set to their optimum values. These combined response plots proved to have great potential for studying competitive interactions between species (e.g. Leathwick, 2002).

Fig. 4. Predictors correlation matrix.


3.7. Spatial predictions

We implemented two methods for making

predictions in GRASP, depending on the number

of points for which the predictions have to be

made. In the present examples, we are predicting

on 250 000 km² for all of New Zealand. Since it is a

manageable number of pixels, the predictions are

made within GRASP in Splus. With larger num-

bers of points, regression models are exported to

lookup tables and predictions are made in Arcview

as first proposed by Ferrier (pers. communica-

tion).
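The lookup-table route mentioned above can be illustrated as follows. The sketch assumes that each predictor's partial contribution to the linear predictor is tabulated over value bins, that a pixel's prediction is the intercept plus the sum of the looked-up contributions, and that a logit link converts the sum to a probability. The table values, intercept and names are invented; the actual export format used with Arcview is not described in the text.

```python
from bisect import bisect_right
from math import exp


def lookup(table, value):
    """Return the contribution of the bin containing `value`.

    `table` is a list of (lower_bound, contribution) sorted by lower_bound;
    values outside the tabulated range fall into the first or last bin.
    """
    bounds = [lower for lower, _ in table]
    index = bisect_right(bounds, value) - 1
    index = max(0, min(index, len(table) - 1))
    return table[index][1]


def predict_probability(pixel, tables, intercept):
    """Sum the looked-up partial contributions and apply the inverse logit."""
    eta = intercept + sum(lookup(tables[name], pixel[name]) for name in tables)
    return 1.0 / (1.0 + exp(-eta))


# Hypothetical lookup tables exported from a fitted presence-absence model.
tables = {
    "Ta": [(0.0, -2.0), (8.0, 0.0), (12.0, 1.5)],
    "R_E": [(0.0, 0.5), (3.0, -0.5)],
}
intercept = -1.0

warm_pixel = {"Ta": 13.0, "R_E": 2.0}
cold_pixel = {"Ta": 5.0, "R_E": 4.0}
p_warm = predict_probability(warm_pixel, tables, intercept)
p_cold = predict_probability(cold_pixel, tables, intercept)
```

Because each pixel needs only table lookups and a sum, this kind of prediction can be carried out inside a GIS without access to the statistical package that fitted the model.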

The predicted probability of natural occurrence of C. dealbata (Fig. 9a) was higher in the North Island on low, warm and humid lands, and lower in its higher and colder center. In the South Island, this species was predicted to occur only on the northern tip. Species richness prediction (Fig. 9b) indicated that the highest richness should occur in central and southern parts of the North Island, and in western and southern South Island.

4. Discussion

GRASP fits a response surface as a function of

environmental space, and then predicts in geo-

graphic space by using the spatial patterns of the

environmental surfaces. This process has two main

purposes and a number of advantages illustrated

by our example of predicting the natural fern

species richness (see also Lehmann et al., 2002).

First, the large-scale relationships between biotic characteristics and the environment are described, such as species richness in relation to climate and soil characteristics (Fig. 7b). This is a powerful tool in gaining ecological insight or guiding management activities. The observed relationships may suggest hypotheses on the ecological drivers of fern biodiversity, or more appropriate ways to manage fern biodiversity hotspots. As such, these statistical relationships play an important role in a coordinated research program for observational, experimental and theoretical research. Biotic interactions such as competition can also be integrated and studied using these relationships (e.g. Leathwick, 2002).

Fig. 5. Cross validation of (A) C. dealbata model, (B) fern richness model.

Second, spatial predictions can be made by

using the modeled relationships to interpolate

within the environmental space defined by the

available data (Fig. 9a) and, with caution, to

extrapolate in time or space out of this environ-

mental envelope. Spatial predictions of species

distributions can however be used in many differ-

ent ways as presented by Overton et al. (2002) in

their information pyramids for informed biodiversity conservation, for instance to define biodiversity hotspots (e.g. Lehmann et al., 2002), to plan

natural reserve networks (e.g. Ferrier et al.,

2002a), or to classify habitats as described in

Ferrier et al. (2002b).

While some limitations on the GRASP process are difficult to avoid (data availability and quality, correlative instead of causal relationships, assumption of a system in equilibrium), several improvements can be expected along the different steps of the modeling process.

Fig. 6. Contributions of selected predictors in modeling C. dealbata distribution, (A) when dropped from the final model, (B) on their own.

Fig. 7. Partial response curves of (A) C. dealbata and (B) species richness models.

Fig. 8. Combined response curves of two tree ferns of New Zealand, C. dealbata and D. squarrosa.

4.1. Data requirements and limitations

The availability of adequate biological and environmental data for GAM analysis and spatial predictions is constrained by several factors (Bio, 2000):

• heterogeneous data distribution in geographic and environmental space;

• undefined data representation of the modeled process (e.g. direct vs. indirect gradients; Austin and Graywood, 1994; Austin, 1999b);

• inadequate geographic scale, as patterns and relationships vary with scales (Franklin, 1995);

• inaccurate measurements of responses and/or predictor variables;

• inaccurate measurements of geographic position;

• spatial and temporal data autocorrelation.

According to these criteria, adequate data for

plant or animal distribution modeling are rare in

most countries. Available data often come from

herbaria or collections and do not meet the above

listed criteria (Hirzel et al., 2002; Zaniewski et al.,

2002). When data seem available in good quantity,

their quality is often dubious due to identification

problems, lack of survey consistency and inap-

propriate sampling strategy.

In most studies, the effect of data autocorrelation on sample independence is neglected (Bio, 2000) due to the lack of understanding of its impact on parameter estimation, residual distribution, model selection and evaluation, and spatial predictions.

Environmental predictor data quality is also an important concern even though methods and availability seem more satisfactory in this area. While most countries have climatic stations and geological maps from which estimates of environmental predictors can be obtained, the difficulty here is in interpolation or class boundary accuracy. Improvements in environmental information at finer scales are expected in most countries from digital elevation models at low-resolution (20 m) and from remote sensing information.

Fig. 9. Spatial predictions of (A) C. dealbata distribution and (B) fern species richness of New Zealand.

In summary, data requirements for GRASP are principally precise response and predictor measurements, a good spatial cover, and precise geographic coordinates based on a defined sampling strategy. Even though GRASP analysis cannot improve the existing data, it can help identify data outliers and needs for new data collection based on adequate sampling strategies. GRASP presently addresses these problems only through descriptive exploratory methods, which allow users to get a better understanding of their data before modeling. In practice, most of the modeling efforts mentioned in this paper have revealed that the information contained in imperfect available data was still capable of generating sensible and interpretable results.

4.2. Evolution of GRASP

While the GRASP implementation relies on well-tested statistical methods currently used in ecological studies, it has been designed to be continuously updated and improved to meet user requirements and ecological interpretation needs. The next major improvements can be expected with:

. analyses of spatial residuals (Bio, 2000);

. methods for presence-only data (Zaniewski et al., 2002);

. methods for species competition analyses (Leathwick, 2002);

. selection among all possible models (Ray et al., 2002);

. choice of optimal smoother degrees of freedom (Wood, 2000);

. integration of predictor interactions;

. improved cross-validation and variable selection;

. more accurate error estimation.

Even though GAM analyses seem to meet most species modeling requirements, we wish to integrate generalised linear models (GLM) and artificial neural networks (ANN) as comparative modeling methods in GRASP. GLMs present the advantage of parametrically defined methods with a robust theoretical background associated with a larger number of analytical tools. ANNs are non-linear methods that could be preferred when quick and accurate predictions are more important than statistical and/or ecological interpretations. Furthermore, ANNs might be more appropriate to handle interactions between predictors, and can therefore be used to assess the amount of information that was possibly not captured with GAMs.

More sophisticated evolutions, such as parallel modeling of two or more species at the same time, are being developed and are already available in dedicated software (Yee and Mackenzie, 2002).

4.3. Required validation

Validation is one of the crucial parts of the modeling process. It can be achieved by comparing observed and predicted responses on an independent data set (Guisan et al., 1998) or by cross-validation when a data set can be split into several subsets (Lehmann et al., 2002). It is not clear whether independent datasets are really preferable, as is generally claimed (Guisan and Zimmermann, 2000). We think that cross-validation is generally more practical because it creates relatively independent random subsets and allows the use of all available data in the modeling process. By using entirely independent datasets, we risk comparing different sampling strategies instead of evaluating a model.
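The cross-validation idea described above can be illustrated with a minimal sketch. It is not the GRASP implementation (which is written in Splus): the function names `k_fold_indices` and `cross_validate` are hypothetical, and a simple straight-line fit stands in for the GAM purely to keep the example self-contained.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition the indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return out-of-fold predictions: every observation is predicted by a
    model that was fitted without the fold containing that observation."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = predict(model, xs[i])
    return preds

# Placeholder model (an assumption standing in for a GAM): least-squares line.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return (my - slope * mx, slope)  # (intercept, slope)

def predict_line(model, x):
    return model[0] + model[1] * x

# Synthetic, noise-free data used only to exercise the functions.
xs = [float(i) for i in range(20)]
ys = [2.0 + 0.5 * x for x in xs]
cv_preds = cross_validate(xs, ys, fit_line, predict_line, k=5)
```

All observations contribute to model fitting in some fold, and each is evaluated only by a model that never saw it, which is the practical advantage argued for in the text.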

Methods to assess model quality are still under development and depend on the distribution of the data. For binomial models, several methods are available (Fielding and Bell, 1997) but three methods are generally used: Kappa statistics (e.g. Lehmann, 1998); the Hosmer and Lemeshow test (e.g. Bio et al., 1998); and ROC techniques (e.g. Lehmann et al., 2002). Our experience with these methods favors the use of ROC statistics because, unlike Kappa, the ROC method avoids the problem of choosing a threshold value and, unlike the Hosmer and Lemeshow test, it avoids the problem of choosing a number of groups or a sorting method. With Poisson or Gaussian models, simple correlation measures between observed and predicted values are very efficient.
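The threshold argument can be made concrete with a small sketch. The functions `roc_auc` and `cohen_kappa` below are illustrative names, not GRASP functions, and the presence/absence data are invented for the example. The area under the ROC curve is computed here through its rank (Mann-Whitney) interpretation, which is one standard way to obtain it.

```python
def roc_auc(observed, predicted):
    """Area under the ROC curve: the probability that a randomly chosen
    presence receives a higher prediction than a randomly chosen absence
    (ties count one half). No threshold is involved."""
    pos = [p for o, p in zip(observed, predicted) if o == 1]
    neg = [p for o, p in zip(observed, predicted) if o == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def cohen_kappa(observed, predicted, threshold):
    """Cohen's kappa after converting predictions to presence/absence at the
    given threshold. Undefined (division by zero) if chance agreement is 1."""
    n = len(observed)
    called = [1 if p >= threshold else 0 for p in predicted]
    agree = sum(1 for o, c in zip(observed, called) if o == c) / n
    p_obs = sum(observed) / n
    p_call = sum(called) / n
    expected = p_obs * p_call + (1 - p_obs) * (1 - p_call)
    return (agree - expected) / (1 - expected)

# Invented example: 4 absences and 4 presences with predicted probabilities.
observed = [0, 0, 0, 1, 0, 1, 1, 1]
predicted = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]

auc = roc_auc(observed, predicted)
kappa_a = cohen_kappa(observed, predicted, threshold=0.5)
kappa_b = cohen_kappa(observed, predicted, threshold=0.33)
```

On these data the two thresholds give different Kappa values for the same predictions, whereas the ROC area is a single number that does not depend on any cut-off.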


Model validation is also commonly performed on the spatial predictions themselves. Plotting point observations on top of predicted values to see how geographical patterns are reproduced is a good empirical method to judge model quality. Additional tests of spatial autocorrelation of residuals (e.g. Bio, 2000) can also be used to assess whether the model failed to capture spatial patterns that are independent of the environmental predictors.
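One common statistic for such a test is Moran's I. The text does not specify which test is intended, so the sketch below is an assumption: it uses binary neighbour weights within a distance `max_dist`, and the function name `morans_i` and the transect data are hypothetical.

```python
import math

def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: two sites are neighbours (weight 1)
    when their distance is at most max_dist, otherwise the weight is 0.
    Values near +1 indicate clustered residuals, values near -1 indicate
    alternating residuals, values near 0 indicate no spatial structure."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over ordered pairs
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(coords[i][0] - coords[j][0],
                              coords[i][1] - coords[j][1])
            if dist <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    total = sum(d * d for d in dev)
    return (n / w_sum) * (cross / total)

# Six sites along a transect, one unit apart (invented data).
sites = [(float(i), 0.0) for i in range(6)]
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # residuals grouped in space
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # residuals flip at each site

i_clustered = morans_i(sites, clustered, max_dist=1.0)
i_alternating = morans_i(sites, alternating, max_dist=1.0)
```

A clearly positive value for model residuals would suggest that some spatial pattern remains unexplained by the environmental predictors.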

4.4. Possible interpretations

Regression models rely on correlation and not causal relationships. Nevertheless, the interpretation of species models brings valuable information on species distribution along direct environmental gradients, and helps formulate new hypotheses on the respective roles of abiotic and biotic factors (dispersal, competition). To diminish the risks of misinterpretation, we carefully chose direct environmental gradients (Austin and Smith, 1989) that must have some physiological meaning for species growth or distribution. One method of examining the effect of dispersal would be to model the contribution of the environment, and then attempt to explain the residuals by Easting, Northing and/or their interaction. Similarly, competition issues can be addressed by including in the environmental data set information on the distribution of a competing species (Leathwick, 1998; Leathwick and Austin, 2001).
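The residual analysis suggested above can be sketched as an ordinary least-squares fit of the residuals on Easting, Northing and their product. This is only one possible reading of the text (a smoother could be used instead of linear terms); the names `solve` and `fit_residual_trend` and the synthetic residuals are assumptions made for the illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (a is a square list of lists)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_residual_trend(easting, northing, residuals):
    """Least-squares fit of residual = b0 + b1*E + b2*N + b3*E*N.
    Returns the coefficients and the share of residual variance explained."""
    rows = [[1.0, e, n, e * n] for e, n in zip(easting, northing)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, residuals)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean = sum(residuals) / len(residuals)
    ss_tot = sum((y - mean) ** 2 for y in residuals)
    ss_res = sum((y - f) ** 2 for y, f in zip(residuals, fitted))
    return beta, 1.0 - ss_res / ss_tot

# Synthetic 4 x 4 grid of sites with a purely spatial residual pattern.
easting = [float(e) for e in range(4) for _ in range(4)]
northing = [float(n) for _ in range(4) for n in range(4)]
residuals = [0.5 + 0.2 * e - 0.1 * n + 0.05 * e * n
             for e, n in zip(easting, northing)]

beta, r2 = fit_residual_trend(easting, northing, residuals)
```

A large explained share would point to a geographic pattern, such as limited dispersal, that the environmental model did not capture. With real map coordinates in metres, centring and scaling Easting and Northing first would keep the normal equations well conditioned.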

Among available methods for predictions (e.g. GLM, GAM, ANN), GAM seemed preferable to us because of the ecological interpretability of its non-parametric response curves. It also has the advantage of being statistically well defined, allowing for good inference, while being flexible enough to fit the data closely. It appears to be a good compromise between GLM on one side and ANN on the other. Improvements are needed, however, to reach a statistical method that would agree fully with our understanding of ecological complexity (Austin, 2002), taking into account species interactions, spatial autocorrelation, and predictor interactions.

4.5. Improved predictions

GRASP first models a species' realized environmental niche, and then it predicts its natural spatial distribution in geographical space. This explains why care must be taken when using these models to predict outside the present environmental envelope of a species, either in space or time. This type of statistical model also relies on the acceptance of an equilibrium state that does not exist in nature, but that can be accepted at a given time and at a given scale. Once admitted, this equilibrium greatly simplifies the modeling approach and allows the production of the quick and accurate predictions needed for management purposes.

A drawback of GAM, and of other non-parametric methods such as ANN, is the difficulty of exporting a model out of the software package in which it was fitted. This is particularly a problem when the aim is to build a spatial prediction in a GIS. Large prediction data sets will become increasingly common, especially with the need for environmental managers to use predictions at a fine resolution. Making spatial predictions at a 20 m resolution, for instance, can be achieved at a local scale on a limited area but is presently difficult for larger regions. Going from 1000 m to 20 m resolution requires 2500 times more memory, since each 1000 m cell is replaced by (1000/20)^2 = 2500 cells of 20 m. Two solutions exist for building spatial predictions with non-parametric methods: (i) with smaller datasets, calculate the predictions directly within the statistical package using a prediction environmental data set, and export them to the GIS; (ii) with larger datasets, use a lookup table to describe the response curves for each variable along a reduced number of values, and use this lookup table in a GIS to reclassify the predictors according to their contribution to the model.

Further improvements in building large spatial predictions from GAMs should include the possibility of using bivariate smoothed predictors such as geographical coordinates, possible interactions between predictors and spatial autocorrelation between responses. None of these can be included in univariate lookup tables at the moment.
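Solution (ii) above can be illustrated with a minimal sketch for a binomial model with a logit link. Everything specific in it is assumed for the example: the predictor names, the tabulated contributions, the intercept and the function names `interp` and `predict_cell` are hypothetical, and linear interpolation between tabulated values is one possible way of reading a lookup table.

```python
import math

def interp(x, grid, contributions):
    """Read a univariate lookup table by linear interpolation.
    Values outside the tabulated range are clamped to the end points,
    which avoids extrapolating beyond the sampled environmental envelope."""
    if x <= grid[0]:
        return contributions[0]
    if x >= grid[-1]:
        return contributions[-1]
    for i in range(1, len(grid)):
        if x <= grid[i]:
            t = (x - grid[i - 1]) / (grid[i] - grid[i - 1])
            return contributions[i - 1] + t * (contributions[i] - contributions[i - 1])

def predict_cell(cell, intercept, tables):
    """Additive prediction for one grid cell: sum the tabulated partial
    contributions on the logit scale, then back-transform to a probability."""
    eta = intercept + sum(interp(cell[name], grid, contrib)
                          for name, (grid, contrib) in tables.items())
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical lookup tables: predictor values and their partial contributions.
tables = {
    "temperature": ([5.0, 10.0, 15.0], [-2.0, 1.0, -1.0]),
    "rainfall": ([500.0, 1500.0, 2500.0], [-1.0, 0.5, 0.0]),
}
intercept = -0.5

p_optimum = predict_cell({"temperature": 10.0, "rainfall": 1500.0}, intercept, tables)
p_marginal = predict_cell({"temperature": 7.5, "rainfall": 500.0}, intercept, tables)

# Number of 20 m cells that replace one 1000 m cell.
cells_per_coarse_cell = (1000 // 20) ** 2
```

Because each table depends on a single predictor, the same reclassification can be applied layer by layer in a GIS; this is also why bivariate smooths and interactions do not fit into the scheme.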


5. Conclusions

The ability of generalised additive models to model data following different types of distribution (normal, binomial, Poisson, ...), the possibility of combining continuous and factor explanatory variables, as well as the data-driven shape of the response curves are all important characteristics in ecological modeling. These reasons motivated us to develop GRASP functions to automate the process of modeling and predicting. The main advantage of this automation is the quick and easy update of models and predictions with new data or new model specifications. This also allows the modeling process to be standardized for several responses at once and, therefore, makes it more reproducible and less subjective. Allowing functions or models to be edited and adapted keeps flexibility for any special case and at any stage. Furthermore, the user-friendly interface of GRASP has proven to bring more scientists into the practice of making spatial predictions, which should help improve this field of ecological modeling for better applications in environmental management.

Acknowledgements

The data used in this study were drawn from the National Vegetation Survey database, and were mainly collected by staff of the former New Zealand Forest Service whose efforts are gratefully acknowledged. This work benefited greatly from the comments and help of Mike Austin, Simon Barry, Antoine Guisan, Margaret Cawsey, Emmanuel Castella, Thomas Yee and Geoff Pegler. We acknowledge the support of the Foundation for Research, Science and Technology under contract number CO9642: a methodology for the selection of biodiversity indicators. We also acknowledge the contribution of the Swiss National Science Foundation to Anthony Lehmann's postdoctoral project.

References

Allen, R.B., 1992. RECCE an Inventory Method for Describing New Zealand Vegetation. Forest Research Institute, Christchurch, p. 25.

Austin, M.P., 1999a. The potential contribution of vegetation ecology to biodiversity research. Ecography 22, 465–484.

Austin, M.P., 1999b. A silent clash of paradigms: some inconsistencies in community ecology. Oikos 86, 170–178.

Austin, M.P., 2002. Spatial prediction of species distribution: an interface between ecological theory and statistical modeling. Ecol. Model. 157 (2–3), 101–118.

Austin, M.P., Graywood, M.J., 1994. Current problems of environmental gradients and species response curves in relation to continuum theory. J. Veg. Sci. 5, 473–482.

Austin, M.P., Meyers, J.A., 1996. Current approaches to modeling the environmental niche of eucalypts: implication for management of forest biodiversity. Forest Ecol. Manage. 85, 95–106.

Austin, M.P., Smith, T.M., 1989. A new model for the continuum concept. Vegetatio 83, 35–47.

Bio, A.M.F., 2000. Does vegetation suit our models? Data and model assumptions and the assessment of species distribution in space. Ph.D. thesis, Utrecht University.

Bio, A.M.F., Alkemade, R., Barendregt, A., 1998. Determining alternative models for vegetation response analysis: a non parametric approach. J. Veg. Sci. 9, 5–16.

Breiman, L.J., Friedman, J., Olshen, R., Stone, C., 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.

Brosse, S., Lek, S., 2000. Modelling roach (Rutilus rutilus) microhabitat using linear and nonlinear techniques. Freshwater Biol. 44, 441–452.

Cawsey, E.M., Austin, M.P., Baker, B.L., 2002. Regional vegetation mapping in Australia: a case study in the practical use of statistical modeling. Biodivers. Conserv. 11, 2239–2274.

Chambers, J.M., Hastie, T.J., 1993. Statistical Models. Chapman & Hall, London, p. 608.

Draper, N.R., Smith, H., 1981. Applied Regression Analysis. Wiley, New York, p. 709.

Ferrier, S., Drielsma, M., Manion, G., Watson, G., 2002a. Extended statistical approaches to modeling spatial pattern in biodiversity: the north-east New South Wales experience. II. Community-level modeling. Biodivers. Conserv. 11, 2309–2338.

Ferrier, S., Watson, G., Pearce, J., Drielsma, M., 2002b. Extended statistical approaches to modeling spatial pattern in biodiversity: the north-east New South Wales experience. I. Species-level modeling. Biodivers. Conserv. 11, 2275–2307.

Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.

Franklin, J., 1995. Predictive vegetation mapping: geographical modeling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geogr. 19, 474–499.


Franklin, J., 1998. Predicting the distribution of shrub species in southern California from climate and terrain-derived variables. J. Veg. Sci. 9, 733–748.

Guisan, A., Theurillat, J.-P., 2000. Equilibrium modeling of alpine plant distribution and climate change: how far can we go? Phytocoenologia 30, 353–384.

Guisan, A., Zimmermann, N.E., 2000. Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–186.

Guisan, A., Theurillat, J.-P., Kienast, F., 1998. Predicting the potential distribution of plant species in an alpine environment. J. Veg. Sci. 9, 65–74.

Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models. Chapman & Hall, London, p. 335.

Hirzel, A.H., Hausser, J., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology, in press.

Hosmer, D.W.J., Lemeshow, S., 1989. Assessing the fit of the model. In: Applied Logistic Regression. Wiley, New York.

Hutchinson, M.F., Gessler, P.E., 1994. Splines – more than just a smooth interpolator. Geoderma 62, 45–67.

Iverson, L.R., Prasad, A., 1998. Estimating regional plant biodiversity with GIS modeling. Divers. Distrib. 4, 49–61.

Leathwick, J.R., 1998. Are New Zealand's Nothofagus species in equilibrium with their environment. J. Veg. Sci. 9, 719–732.

Leathwick, J.R., 2001. New Zealand's potential forest pattern as predicted from current species-environment relationships. New Zealand J. Botany 39, 447–464.

Leathwick, J.R., Austin, M.P., 2001. Competitive interactions between tree species in New Zealand's old growth indigenous forests. Ecology 82, 2560–2573.

Leathwick, J.R., Stephens, R.T.T., 1998. Climate surfaces for New Zealand. Contract Report LC9798/126, Landcare Research, Hamilton.

Leathwick, J.R., Whitehead, D., 2001. Soil and atmospheric water deficits, and the distributions of New Zealand's indigenous tree species. Funct. Ecol. 15, 233–242.

Leathwick, J.R., Burns, B.R., Clarkson, B.D., 1998. Environmental correlates of tree alpha-diversity in New Zealand primary forests. Ecography 21, 235–246.

Leathwick, J.R., 2002. Incorporating the effects of inter-specific competition when modeling species distributions at landscape scales. Biodivers. Conserv., in press.

Leathwick, J.R., Whitehead, D., McLeod, M., 1996. Predicting changes in the composition of New Zealand's indigenous forests in response to global warming: a modeling approach. Environ. Software 11, 81–90.

Lehmann, A., 1998. GIS modeling of submerged macrophyte distribution using generalized additive models. Plant Ecol. 139, 113–124.

Lehmann, A., Jaquet, J.-M., Lachavanne, J.-B., 1997. A GIS approach of aquatic plant spatial heterogeneity in relation to sediment and depth gradient, Lake Geneva, Switzerland. Aquat. Bot. 347–361.

Lehmann, A., Overton, J.McC., Leathwick, J.R., 1999. GRASP: Generalized Regression Analysis and Spatial Predictions, User's manual. Landcare Research, Hamilton, New Zealand.

Lehmann, A., Leathwick, J.R., Overton, J.McC., 2002. Assessing New Zealand fern diversity from spatial predictions of species assemblages. Biodivers. Conserv. 11, 2217–2238.

McCullagh, P., Nelder, J.A., 1997. Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman & Hall, London, p. 511.

McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89, 67–94.

Narumalani, S., Jensen, J.R., Althausen, J.D., Burkhalter, S.G., Mackey, H., 1997. Aquatic macrophyte modeling using GIS and logistic multiple regression. Photogramm. Eng. Remote Sens. 63, 41–49.

Newsome, P.F.J., 1992. New Zealand Resource Inventory. ARC/INFO data manual, DSIR.

Overton, J.McC., Leathwick, J.R., Lehmann, A., 2000. Predict first, classify later – a new paradigm of spatial classification for environmental management: a revolution in the mapping of vegetation, soil, land cover, and other environmental information. In: Fourth International Conference on Integrating GIS and Environmental Modeling (GIS/EM4), Banff, Alta., Canada.

Overton, J.McC., Leathwick, J.R., Stephens, R.T.T., Lehmann, A., 2002. Information pyramids for informed ecosystem management. Biodivers. Conserv. 11, 2093–2116.

Peters, D., Thackway, R., 1998. A New Biogeographic Regionalisation for Tasmania. Tasmanian Parks and Wildlife Service GIS Section.

Ray, N., Lehmann, A., Joly, P., 2002. Modelling spatial distribution of amphibian populations: a GIS approach based on habitat matrix permeability. Biodivers. Conserv. 11, 2243–2265.

Wiser, S.K., Bellingham, P.J., Burrows, L.E., 2001. Managing biodiversity information: development of New Zealand's National Vegetation Survey databank. New Zealand J. Ecol. 25, 1–17.

Wohlgemuth, T., 1998. Modelling floristic richness on a regional scale: a case study in Switzerland. Biodivers. Conserv. 7, 159–177.

Wood, S.N., 2000. Modelling and smoothing parameter estimation with multiple quadratic penalties. J. R. Stat. Soc. 62, 413–428.

Yee, T.W., Mitchell, N.D., 1991. Generalized additive models in plant ecology. J. Veg. Sci. 2, 587–602.

Yee, T.W., Mackenzie, M., 2002. Vector generalized additive models in plant ecology. Ecol. Model. 157 (2–3), 141–156.

Zaniewski, A.E., Lehmann, A., Overton, J.McC., 2002. Predicting species distribution using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157 (2–3), 259–278.

