GRASP: generalized regression analysis and spatial prediction
Anthony Lehmann *, Jacob McC. Overton, John R. Leathwick
Manaaki Whenua, Landcare Research, Private Bag 3127, Hamilton, New Zealand
Abstract
We present generalized regression analysis and spatial prediction (GRASP) conceptually as a method for producing
spatial predictions using statistical models, and introduce and demonstrate a specific implementation in Splus that
facilitates the process. We put forward GRASP as a new name encapsulating an existing concept that aims at making
spatial predictions using generalized regression analysis. Regression modeling is used to establish relationships between
a response variable and a set of spatial predictors. The regression relationships are then used to make spatial predictions
of the response. The GRASP process requires point measurements of the response, as well as regional coverages of
predictor variables that are statistically (and preferably causally) important in determining the patterns of the response.
This approach to spatial prediction is becoming more commonplace, and it is useful to define it as a general concept.
For instance, GRASP could use a survey of the abundance of a species (the response), and existing spatial coverages of
environmental (e.g. climate, landform) variables (the predictors) for a region. A multiple regression can be used to
establish the statistical relationship between the species abundance and the environmental variables. These regression
relationships can then be used to predict the species abundance from the environmental surfaces. This process defines
relationships in environmental space and uses these relationships to predict in geographic space. We introduce GRASP
(the implementation) as an interface and collection of functions in Splus designed to facilitate modern regression
analysis and the use of these regressions for making spatial predictions. GRASP standardizes the modeling process and
makes it more reproducible and less subjective, while preserving analysis flexibility. The set of functions provides a
toolbox that allows quick and easy data checking, model building and evaluation, and calculation of predictions. The
current version uses generalized additive models (GAMs), a modern non-parametric regression technique the
advantages of which are discussed. We demonstrate the use of the GRASP implementation to model and predict the
natural distributions of two components of New Zealand fern biodiversity: (1) the natural distribution of an icon
species, silver fern (Cyathea dealbata ); and (2) the natural pattern of total fern species richness. Key steps are
demonstrated, including data preparation, options setting, data exploration, model building, model validation and
interpretation, and spatial prediction.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Biodiversity; Sustainable management; Spatial prediction; Generalized additive models; Geographic information systems;
New Zealand; Ferns
Abbreviations: CART, classification and regression trees; GAM, generalized additive models; GIS, geographic information system;
GLM, generalized linear models; Logistic R, logistic regression; LSR, least square regression; ROC, receiver operating characteristics.
* Corresponding author. Present address: Swiss Centre for Faunal Cartography (CSCF), Terreaux 14, CH-2000 Neuchatel,
Switzerland. Fax: +41-32-717-7969
E-mail address: [email protected] (A. Lehmann).
Ecological Modelling 157 (2002) 189-207
www.elsevier.com/locate/ecolmodel
0304-3800/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-3800(02)00195-3
1. Introduction
Conservation management requires information
on a diverse range of environmental attributes,
including species distributions, biodiversity values,
natural ecosystem potential, human uses, threats
and risks. Often this information takes the form of
spatial predictions (e.g. maps) that present the best
estimates of these attributes across the landscape.
Ideally, these spatial predictions are produced
using rigorous methods for integrating and gen-
eralizing underlying information.
A diverse range of ecosystem characteristics
have been measured and predicted from their
main environmental drivers. These include species
distribution (Bio et al., 1998; Franklin, 1998;
Leathwick, 1998; Lehmann, 1998), richness hot-
spots (Lehmann et al., 2002), soil characteristics
(McKenzie and Ryan, 1999), forest and agricul-
tural productivity, and tourism attraction. The
resulting spatial predictions of ecosystem charac-
teristics can be integrated further. Species richness
and natural community composition can, for
instance, be estimated by summing several species
predictions (Guisan and Theurillat, 2000; Leath-
wick, 2002; Lehmann et al., 2002). Alternatively,
biotic domains can be defined from the classifica-
tion of predicted species distribution (Peters and
Thackway, 1998; Overton et al., 2000; Leathwick,
2001). This approach of vegetation ecology has
proved to be very valuable to biodiversity research
(Austin, 1999a) with good examples of applica-
tions from both Australia (e.g. Ferrier et al.,
2002a,b) and New Zealand (e.g. Overton et al.,
2002).
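The summation of species predictions mentioned above can be illustrated with a minimal Python sketch. The species names, probabilities and the function name `predicted_richness` are hypothetical and serve only to show the arithmetic; they are not taken from the GRASP implementation.

```python
def predicted_richness(species_probabilities):
    """Sum per-species predicted probabilities of occurrence, site by site."""
    n_sites = len(next(iter(species_probabilities.values())))
    richness = [0.0] * n_sites
    for name, probabilities in species_probabilities.items():
        if len(probabilities) != n_sites:
            raise ValueError("species %s has a different number of sites" % name)
        for i, p in enumerate(probabilities):
            richness[i] += p
    return richness


# Hypothetical predicted probabilities for three species at four sites.
example = {
    "species_a": [0.9, 0.5, 0.1, 0.0],
    "species_b": [0.8, 0.25, 0.1, 0.0],
    "species_c": [0.3, 0.25, 0.3, 0.5],
}
richness = predicted_richness(example)
```

Each entry of `richness` is the expected number of species at that site under the assumption that the individual predictions can simply be added.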
Methods used to make spatial predictions
should meet several criteria:
1) They should be general enough to deal with
the wide variety of attributes that need to be predicted.
2) They should be rigorous and data-defined, to
make predictions in an objective and defen-
sible manner.
3) They should be standardized to produce uni-
form results, and streamlined to facilitate the
required analyses.
The general concept of generalized regression analysis
and spatial prediction (GRASP), presented in the
following section, fulfils the first two criteria,
whereas its implementation, also presented in this
paper, contributes to the final criterion by allowing
quick and easy predictions of several responses at
once. We intentionally define GRASP as a duality: a
general method and a specific implementation.
2. Generalized regression analysis and spatial prediction
GRASP plays two important roles in quantitative
ecology and ecosystem management. First, it
develops statistical relationships between a variable
of interest (e.g. species distribution or richness) and
environmental variables (e.g. climate, soil
characteristics), providing descriptions of biodiversity
patterns. Second, it provides a method for making
strictly data-defined spatial predictions using point
measurements of the attributes. The concept of GRASP
has been used in numerous ecological applications
(Table 1), including recent ones that have used the
GRASP implementation demonstrated here (e.g. Cawsey
et al., 2002; Lehmann et al., 2002; Overton et al.,
2002; Ray et al., 2002; Zaniewski et al., 2002).
2.1. GRASP compared with other approaches
GRASP is one of a number of possible methods
for making spatial predictions from point mea-
surements (e.g. Guisan and Zimmermann, 2000).
Different approaches to spatial prediction have
different situations in which they are most suita-
ble. Some GRASP characteristics that set it apart
from other methods of spatial prediction are:
1) GRASP fits response surfaces as a function of
predictors in environmental space, and then uses
the spatial pattern of the predictor surfaces to
predict the response in geographic space.
2) GRASP uses statistical models, rather than
mechanistic models, to determine the relation-
ship between the response and the predictors.
3) GRASP uses regression models to determine the
statistical relationship between the response and
predictors.
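The first characteristic in the list above (fit in environmental space, then predict in geographic space) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single predictor, an ordinary least-squares line instead of a GAM, and invented plot and grid values.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Step 1: environmental space. Hypothetical survey plots with the
# temperature at each plot and the abundance observed there.
plot_temperature = [6.0, 8.0, 10.0, 12.0, 14.0]
plot_abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
intercept, slope = fit_line(plot_temperature, plot_abundance)

# Step 2: geographic space. A hypothetical temperature surface
# (rows x columns of grid cells) is converted cell by cell into a
# predicted abundance surface using the fitted relationship.
temperature_grid = [[7.0, 9.0], [11.0, 13.0]]
prediction_grid = [
    [intercept + slope * t for t in row] for row in temperature_grid
]
```

The geographic coordinates of the plots never enter the model; only the predictor surface carries the spatial pattern into the prediction.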
The first of these differences distinguishes
GRASP from spatial prediction techniques such
as surface-fitting algorithms or from geostatistical
approaches such as kriging. These techniques
estimate surfaces directly in geographic space,
rather than in predictor space as is done in
GRASP. Surface fitting and geostatistical ap-
proaches are vital for the estimation of funda-
mental surfaces, such as elevation, or climate
surfaces, but have distinct limitations for other
purposes. In the example demonstrated here, we
predict the natural distribution of fern biodiversity
by estimating the relationships between fern dis-
tributions and environment using sites largely free
from human disturbance. These environmental
relationships are then used to predict for all areas
of New Zealand as if they were undisturbed by
humans. This provides an estimate of natural (in
the absence of human influence) fern biodiversity
of New Zealand. A similar approach could be used
to predict the likely effects of climate change on
natural distributions of species. Neither of these
applications would be possible without first estimating
a response surface in environmental space.
The use of statistical models to determine
relationships between the responses and the pre-
dictors distinguishes GRASP from more process-
based approaches to spatial prediction. For many
purposes, mechanistic models may provide more
robust predictions, but are much more difficult to
produce than statistical models, requiring a huge
amount of work to be carefully data-defined.
While this approach is sometimes used for ecosys-
tem processes like carbon sequestration or net
primary productivity, it is virtually impossible for
the distribution of species, something for which
GRASP is ideally suited. Process-based models
that will be used for spatial prediction require
spatial estimates of the input parameters (for
which GRASP could be used), and assumptions
that processes are spatially coherent and consis-
tent. Process-based models can have huge numbers
of parameters, with many parameters not rigor-
ously fitted to data, or fitted to data in a small
region. In this case, extrapolation beyond the data
with such a model is as dangerous as it is with a
statistical model. Furthermore, for the models to
be data-defined still requires careful fitting of
parameters, which is ultimately a statistical
exercise. In this light, the distinction between
statistical models and mechanistic models for prediction
becomes less clear. Statistical and mechanistic
models should be viewed as complementary tools
in coordinated programs of observational,
experimental and theoretical research.
Finally, the use of regression models to estimate
environmental relationships distinguishes the
GRASP approach from other approaches such
as CCA (e.g. Guisan et al., 1998) or hyperdimensional
niche modeling (e.g. Hirzel et al., 2002; Zaniewski
et al., 2002). These methods define the species
distributions in environmental space in a multivariate
sense. They do not differ much from the GRASP general
concept, but differences exist in application. Neural
network models are closer in their applications to the
GRASP methodology described here (e.g. Brosse and Lek,
2000).

Table 1
Regression analyses used to make spatial predictions
| Method | Ys distribution | Response curve characteristics | Combination with GIS | Statistical references | Recent examples |
|---|---|---|---|---|---|
| LSR | Gaussian | Parametric, quadratic (X, X^2, X^3...) | Substitution of Xs by maps in GIS | (Draper and Smith, 1981) | (Lehmann et al., 1997; Wolgemuth, 1998) |
| Logistic R | Binomial | Parametric, quadratic (X, X^2, X^3...) | Substitution of Xs by maps in GIS | (Hosmer and Lemeshow, 1989) | (Narumalani et al., 1997) |
| GLM | Gaussian, binomial, Poisson... | Parametric, quadratic (X, X^2, X^3...) | Substitution of Xs by maps in GIS | (McCullagh and Nelder, 1997) | (Guisan et al., 1998) |
| GAM | Gaussian, binomial, Poisson... | Non-parametric, smoothed, any shape | Prediction in statistical package, export to GIS (direct or lookup table) | (Hastie and Tibshirani, 1990) | (Bio et al., 1998; Lehmann, 1998) |
| CART | Gaussian | Non-linear, non-additive | Prediction in statistical package | (Breiman et al., 1984) | (Franklin, 1998; Iverson and Prasad, 1998) |
2.2. GAMs in ecology
Among possible multiple regression methods
(Table 1), we concentrate on GAMs for several
reasons. GAMs are a non-parametric extension of
GLMs, which are themselves a generalization of
classical least square regression (LSR). While
GLMs extend the application of classical regression
to other statistical distributions (binomial,
Poisson, gamma, negative binomial), GAMs estimate
response curves with a non-parametric
smoothing function instead of parametric terms
(e.g. ax + bx^2). This allows exploration of the shapes
of species response curves to environmental gradients,
and allows the fitting of statistical models in
better agreement with ecological theory (Austin,
1999b, 2002). GAMs are described in detail in
Hastie and Tibshirani (1990) and are fitted here
within S-Plus (Chambers and Hastie, 1993). Other
analysis methods such as CART and ANN (see
Table 1) are also capable of fitting robust models
to ecological data and may be better suited for
modeling interactions. However, at this stage,
these methods lose in ecological interpretability
by not allowing observation of response curve
shapes, and in statistical inference by not selecting
significant variables based on appropriate
statistical distributions.
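To make the idea of a non-parametric response curve concrete, the following Python sketch smooths presence-absence data along one gradient with a Gaussian kernel smoother. This is a deliberately simple stand-in: GAMs as fitted in S-Plus use spline or loess smoothers within a backfitting procedure, which is not reproduced here. The data and the bandwidth are invented.

```python
import math


def kernel_smooth(x, y, grid, bandwidth):
    """Gaussian kernel (Nadaraya-Watson) smoother evaluated on grid."""
    smoothed = []
    for g in grid:
        weights = [math.exp(-0.5 * ((xi - g) / bandwidth) ** 2) for xi in x]
        total = sum(weights)
        smoothed.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return smoothed


# Hypothetical species that is present only at intermediate temperatures.
temperature = [2, 4, 6, 8, 10, 12, 14]
presence = [0, 0, 1, 1, 1, 0, 0]
curve = kernel_smooth(temperature, presence, grid=[2, 8, 14], bandwidth=2.0)
```

The smoothed curve peaks in the middle of the gradient without any quadratic term having been specified, which is the property that makes smoothers useful for exploring response shapes.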
GAMs have been used in numerous applications
to predict species distribution as a function of their
environment: by Yee and Mitchell (1991); in
comparison with GLMs (e.g. Austin and Meyers,
1996; Bio et al., 1998; Franklin, 1998); in comparison
with ANNs (e.g. Brosse and Lek, 2000); to
assess the effect of environmental changes (e.g.
Leathwick et al., 1996; Lehmann, 1998); to assess
the effect of interspecific competition (e.g.
Leathwick, 1998; Leathwick and Austin, 2001;
Leathwick, 2002); to predict species richness (e.g.
Leathwick et al., 1998; Lehmann et al., 2002);
and to predict community composition (e.g.
Cawsey et al., 2002).
2.3. Purpose of the GRASP implementation
The GRASP implementation was primarily
designed to facilitate the production of spatial
predictions from sparsely distributed data from
point surveys. It provides a tool and a user-friendly
interface to promote the use of spatial
predictions in environmental management (e.g.
Lehmann et al., 2002). It has been developed as
a graphical interface with underlying functions
(Lehmann et al., 1999), using S-PLUS v.4.5 and
above (Mathsoft Inc., Seattle, WA) under the MS
Windows environment. Up-to-date information
on GRASP can be found on the following website:
http://www.cscf.ch/grasp.
We present here the GRASP implementation in
more detail with two spatial prediction examples:
(i) a species distribution, and (ii) total species
richness. Both examples use data on New Zealand
fern distribution (Lehmann et al., 2002).
3. Demonstration of GRASP
3.1. Fern distribution (response variables)
A total of 19 875 RECCE plots (Allen, 1992)
were selected from the New Zealand National
Indigenous Vegetation Survey (NIVS, Wiser et al.,
2001) and data describing the presence or absence
of fern species were extracted. These plots were
collected between the 1950s and 1980s within
indigenous forests that were largely free from
anthropogenic disturbances. From the pool of
observed taxa, we selected 43 fern species
(Lehmann et al., 2002) on the basis of their percentage
of occurrence and taxonomic criteria as potential
indicators of indigenous forest species diversity.
3.2. Environmental information (spatial predictors)
Climate surfaces predicting mean monthly pre-
cipitation, temperature, solar radiation and hu-
midity produced from climate station data
(Leathwick and Stephens, 1998) using thin-plate
splines (Hutchinson and Gessler, 1994), were used
to estimate climate parameters for each vegetation
plot. Derived predictors more directly related to
forest plant physiology and ecology were then
calculated as described in Leathwick and Whitehead
(2001) and Leathwick and Austin (2001). Mean
annual solar radiation (Sa) determines potential
site productivity, while mean annual temperature
(Ta), soil water deficit (W), vapor pressure deficit
(VPD) and the lowest monthly ratio of rainfall to
potential evapotranspiration (R/E) determine
actual productivity. Seasonality of both temperature
and solar radiation (Tw and Sw) indicates the
departure of winter values from the annual mean
and can be interpreted as the opposite of
continentality gradients (Table 2).
The influence of landform was analyzed using
estimates of slope (S), drainage (D) and lithology
(L) (Table 2). Relevant data were extracted from
the New Zealand Land Resource Inventory
(NZLRI) (Newsome, 1992). Additional work was
needed to convert descriptions of soil parent
material into units more relevant to the prediction
of plant distributions, resulting in a variable
describing the distributions of 15 classes of soil
parent materials.
In the following sections, spatial predictions of
Cyathea dealbata and fern species richness are
made using the predictors presented in Table 2.
The GRASP method used in both cases is
described in five steps, and in more detail in
Lehmann et al. (1999).
3.3. Data exploration
Exploratory analyses were performed before
starting the process of model selection in order
to investigate possible data problems such as
the presence of outliers and correlation between
predictors.
First, the environmental envelope of each species
was calculated as the hyper-rectangle in environmental
space bounded by the minimum and maximum
of the species distribution along each environmental
variable. The environmental envelope of C.
dealbata excluded 8456 points from the calibration
data set and 93 939 from the prediction data set. This
step was useful to avoid the influence of large
numbers of absences at each end of the environmental
gradients (Austin and Meyers, 1996), and
should result in more accurate estimation of the
additive model and substantially speed up the
modeling process.

Table 2
Environmental predictors used to model fern distribution in New Zealand (improved from Leathwick, 1998)
| Abbreviation | Variable name | Mean (range) |
|---|---|---|
| Ta (°C) | Mean annual temperature | 9.3 (1.5 to 15.9) |
| Tw (°C) | Temperature seasonality | 0.4 (-3.2 to 4.2) |
| Sa (MJ m^-2 per day) | Mean annual solar radiation | 13.6 (11.7 to 15.3) |
| Sw (MJ m^-2 per day) | Solar radiation seasonality | 0.2 (-0.7 to 1.1) |
| VPD (%) | October vapor pressure deficit | 0.25 (0 to 0.57) |
| W (MPa days) | Soil water deficit | 0.59 (0.11 to 2.13) |
| R/E | Minimum rainfall to potential evapotranspiration | 2.5 (0.5 to 8.5) |
| S | Slope | 28.6 (1.5 to 40) |
| D | Drainage: V = very good; G = good; I = intermediate; P = poor; VP = very poor | |
| L | Lithology. Metamorphic: MG = Gneiss, Granite; MS = Schist. Plutonic: PD = Diorite; PG = Gabbro; PU = Ultramafic. Quaternary: QA = Alluvium; QL = Loess; QO = Organic; QS = Sand. Sedimentary: SL = Limestone; SS = Strong; SW = Weak. Volcanic: VA = Andesite; VB = Basalt; VR = Rhyolite | |
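The environmental envelope step described in this section can be sketched in Python as follows. The function names `envelope` and `inside`, the two predictors and the plot values are hypothetical; the sketch only shows the hyper-rectangle logic, not the GRASP code itself.

```python
def envelope(presence_sites):
    """Minimum and maximum of each predictor over the presence sites."""
    names = list(presence_sites[0].keys())
    return {
        name: (
            min(site[name] for site in presence_sites),
            max(site[name] for site in presence_sites),
        )
        for name in names
    }


def inside(site, env):
    """True if the site lies within the hyper-rectangle on every predictor."""
    return all(low <= site[name] <= high for name, (low, high) in env.items())


# Hypothetical plots: (predictor values, species present?)
plots = [
    ({"Ta": 12.0, "Sa": 14.5}, True),
    ({"Ta": 14.0, "Sa": 15.0}, True),
    ({"Ta": 13.0, "Sa": 14.8}, False),  # absence inside the envelope: kept
    ({"Ta": 5.0, "Sa": 12.0}, False),   # absence outside the envelope: excluded
]
env = envelope([site for site, present in plots if present])
kept = [(site, present) for site, present in plots if inside(site, env)]
```

The same `inside` test could be applied to the prediction points, so that sites outside the envelope are not passed to the model.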
We mapped the presences and absences of C.
dealbata to check their spatial distribution and to
identify geographic outliers in the data. Such
outliers could have been erroneous data, such as
field misidentifications or typographical errors.
The map of C. dealbata showed its distribution in
the North Island and in the northern tip of the
South Island (Fig. 1). A few dubious observation
points occurring along the West Coast of the
South Island were kept in the analysis because
the effect of these possible outliers was compen-
sated by a large number of absences. The clustered
distribution of all vegetation plots reflected the
underlying unevenness in the sampling of plots
contained within the National Indigenous Vegeta-
tion Survey database. This distribution corre-
sponded closely to the distribution of remnant
forests in New Zealand covering approximately
26% of the total land area.
We then described the environmental space
occupied by species with specially designed histo-
grams (Fig. 2) and scatter plots of response versus
predictors (Fig. 3). C. dealbata was found on
warmer sites, with high solar radiation, rather dry
air and soil conditions, low ratio of rainfall to
evapotranspiration, average slope, and well-
drained soils formed on sedimentary and volcanic
parent materials.
Finally, correlations between the chosen pre-
dictors were calculated to allow removal of
correlated predictors. This was useful because
correlated predictor variables can cause trouble
in estimating additive surfaces. The highest
correlation (r = 0.793) was observed
between VPD and W (Fig. 4). However, this
correlation was not high enough to justify remov-
ing one variable or another from the modeling
process. Note that seasonality measures (Tw and
Sw) were not correlated at all with their original
variables (Ta and Sa). These measures of season-
ality allowed taking into account extreme events
such as low winter temperature without adding a
variable like minimum winter temperature that
would have been highly correlated with Ta.
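A correlation screen of this kind can be sketched in Python as below. The threshold of 0.9 and the predictor values are assumptions made for illustration; the paper does not state the cut-off it applied, only that r = 0.793 was judged acceptable.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(predictors, threshold):
    """Return (name_a, name_b, r) for pairs with |r| at or above threshold."""
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictor values at five plots.
predictors = {
    "VPD": [0.1, 0.2, 0.3, 0.4, 0.5],
    "W": [0.2, 0.4, 0.6, 0.8, 1.0],     # rises in step with VPD
    "Tw": [1.0, -1.0, 0.0, -1.0, 1.0],  # unrelated to VPD and W
}
flagged = correlated_pairs(predictors, threshold=0.9)
```

Pairs returned in `flagged` are candidates for dropping one of the two predictors before model fitting.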
3.4. Model selection
For C. dealbata presence-absence data, a
quasi-binomial model was chosen, whereas for richness,
a quasi-Poisson model was most appropriate. In
both cases, a stepwise procedure was used to select
significant predictors. A starting model including
all continuous predictors smoothed with 4 degrees
of freedom was first fitted. The significance of
either dropping smooth terms or converting them
to a linear form was then tested using an analysis
of variance (ANOVA; F-test) for quasi models. At
each step, the least significant change was kept and
served as the starting point for the next step.
The stepwise selection of statistically significant
environmental predictors for C. dealbata distribution
(1) and fern richness (2) selected the following
models:
1) C. dealbata probability: s(Ta,4) + s(Tw,4) +
s(Sa,4) + s(Sw,4) + s(VPD,1) + s(W,4) +
s(R.E,4) + s(S,4) + L.
2) Species richness: s(Ta,4) + s(Tw,4) + s(Sa,4) +
s(Sw,4) + s(VPD,4) + s(W,4) + s(R.E,4) +
s(S,1) + L + D.
where s = spline smoother, and 4 and 1 are the degrees
of freedom for the spline smoother.
Most variables were kept in both models. Only
D was removed from the Cyathea model, while
VPD was reduced to its linear form. Only S was
reduced to its linear form in the richness model.
The estimated dispersion parameter was 1.005 for
the quasi-binomial model and 1.576 for the
quasi-Poisson model. In this case, the choice of using
quasi models, which estimate the dispersion parameter
instead of using a default value of 1, was only
justified for the Poisson model, and did not change
the analysis outcome for the binomial model.
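As an illustration of what a dispersion parameter measures, the sketch below computes the usual Pearson estimate, the sum of squared Pearson residuals divided by the residual degrees of freedom. It is an assumption that this matches the estimator used by the quasi families in S-Plus; the observed and fitted values are invented, and for a GAM the number of parameters would be replaced by the effective degrees of freedom of the smooth terms.

```python
def pearson_dispersion(observed, fitted, variance, n_parameters):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    n = len(observed)
    chi_square = sum(
        (y - mu) ** 2 / variance(mu) for y, mu in zip(observed, fitted)
    )
    return chi_square / (n - n_parameters)


def poisson_variance(mu):
    """Variance function of the Poisson family: V(mu) = mu."""
    return mu


def binomial_variance(mu):
    """Variance function of the binomial family: V(mu) = mu * (1 - mu)."""
    return mu * (1.0 - mu)


# Hypothetical richness counts and fitted means from a 2-parameter model.
observed = [2, 9, 4, 12, 3]
fitted = [4.0, 6.0, 4.0, 9.0, 5.0]
phi = pearson_dispersion(observed, fitted, poisson_variance, n_parameters=2)
```

A value of `phi` close to 1 indicates that the nominal Poisson or binomial variance is adequate; values well above 1 indicate overdispersion, which is the situation reported for the richness model.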
3.5. Model validation
The selected models were evaluated by two
methods. The first method was a plot of the
observed response values against the values
predicted by the model. The second method was a
cross-validation of the model. Correlation between
the actual and predicted values was then calculated
to assess the goodness of fit of the Poisson model,
whereas the ROC test (Fielding and Bell, 1997)
was used for the binomial model.

Fig. 1. Spatial distribution of (A) all 19 875 vegetation plots and (B) presences of C. dealbata from the New Zealand National Indigenous Vegetation Survey.
Both the cross-validation and the simple validation
of the Cyathea model presented high ROC
values (ROC = 0.94) with no difference between
them, indicating good model stability (Fig. 5). The
correlation between observed and predicted species
richness was relatively high (r = 0.74 and 0.75,
respectively), corresponding to more than half
(r^2 = 0.56) of the variation in species richness being
explained by the model (Fig. 5).
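The ROC value reported for the presence-absence model corresponds to the area under the ROC curve. A minimal Python sketch of that statistic is given below, using the equivalent pairwise formulation (the proportion of presence-absence pairs in which the presence receives the higher predicted value). The labels and scores are invented, and this is not the routine used in GRASP.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical observations (1 = present) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

In a cross-validation setting, the scores would be the predictions made for each plot by a model fitted without that plot's group, so that similar values for the simple and cross-validated statistics indicate a stable model.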
3.6. Model interpretations
Three analyses were used to help interpret and
understand the regression models, including: (1)
plots of the regression model, (2) plots of response
curves that result from the model, and (3) graphs
of the overall contribution of the variables to the
model. The first two allowed the visualization of
how the response variable varies as a function of
the predictor variables, while the third allowed the
assessment of the relative importance of the
predictor variables in explaining the variation in
the response variable.
Inspection of graphs showing the contribution
of each predictor to the Cyathea model confirmed
the dominant role of Ta. When dropping each
predictor from the final model, Ta was the only
variable whose contribution could not be compensated
for by a combination of other variables. The
contribution of each variable on its own showed its
potential to explain Cyathea distribution. Here, the
picture was quite different, with the principal
contributions ranked in the following order: Ta,
Sa, R/E, L, VPD, W (Fig. 6).

Fig. 2. Environmental distribution of C. dealbata in the New Zealand National Indigenous Vegetation Survey. Entire histogram bars
represent the distribution of all plots, whereas darker areas represent the number of plots occupied by the species in each bar. This
number is also written on top of each bar. The solid line is the ratio between occupied and unoccupied plots and the dashed lines
correspond to the overall mean proportion of occupied plots.
Fig. 3. Plots of responses vs. predictors for (A) C. dealbata presence-absence, and (B) fern richness. A smooth line is added to the
graph.
One of the central parts of the interpretation of
GAM models was the description of the predictors'
partial response curves. Here, C. dealbata is
characterized by a strong positive response to Ta
and Sa, by rather warm winter temperatures (high
Tw), by negative responses to two gradients of
water deficit (VPD, W), and by higher rainfall
(R/E) (Fig. 7). Species richness was predicted to be
higher in productive environments characterized
by high Ta, extreme temperature seasonality, low
solar radiation (Sa), high solar radiation
seasonality, and high humidity gradients
(VPD, W, R/E; Fig. 7).
The plot of combined response curves for C.
dealbata and another modeled species (Dicksonia
squarrosa) was characterized by a small difference
in response to Ta and an opposite response to Sa
and R/E (Fig. 8). Other gradients had relatively
small effects when all other variables were set to
their optimum values. These combined response
plots proved to have great potential for studying
competitive interactions between species (e.g.
Leathwick, 2002).

Fig. 4. Predictor correlation matrix.
3.7. Spatial predictions
We implemented two methods for making
predictions in GRASP, depending on the number
of points for which the predictions have to be
made. In the present examples, we are predicting
over 250 000 km^2 for the whole of New Zealand. Since
this is a manageable number of pixels, the predictions are
made within GRASP in Splus. With larger num-
bers of points, regression models are exported to
lookup tables and predictions are made in Arcview
as first proposed by Ferrier (pers. communica-
tion).
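The lookup-table route can be sketched in Python under stated assumptions: each smooth term of an additive model is exported as a table of breakpoints and contributions on the link scale, and a prediction is the inverse link of the intercept plus the looked-up contributions. The table layout, the values and the function names are hypothetical; the actual format exchanged between GRASP and Arcview is not described in this section.

```python
import bisect
import math


def lookup(table, value):
    """Contribution of the bin whose lower breakpoint is at or below value.

    table is a list of (lower_breakpoint, contribution) pairs sorted by
    breakpoint; values below the first breakpoint use the first bin.
    """
    breakpoints = [b for b, _ in table]
    index = bisect.bisect_right(breakpoints, value) - 1
    index = max(0, min(index, len(table) - 1))
    return table[index][1]


def predict_probability(intercept, tables, pixel):
    """Sum the partial contributions and apply the inverse logit link."""
    eta = intercept + sum(
        lookup(tables[name], pixel[name]) for name in tables
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical lookup tables for two predictors (logit-scale contributions).
tables = {
    "Ta": [(0.0, -2.0), (8.0, 0.0), (12.0, 2.0)],
    "Sa": [(11.0, -1.0), (13.0, 0.0), (15.0, 1.0)],
}
p_mid = predict_probability(0.0, tables, {"Ta": 9.5, "Sa": 13.2})
p_warm = predict_probability(0.0, tables, {"Ta": 13.0, "Sa": 15.5})
```

Because the model is additive, the tables can be built once from the fitted curves and applied to any number of pixels without the statistical package, at the cost of the resolution lost in binning.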
C. dealbata's natural distribution (Fig. 9a) was
predicted to be higher in the North Island on low,
warm and humid lands, and lower on its higher
and colder center. This species was predicted to
occur only on the northern tip of the South Island.
Species richness prediction (Fig. 9b) indicated that
the highest richness should occur in central and
southern parts of the North Island, and in western
and southern South Island.
4. Discussion
GRASP fits a response surface as a function of
environmental space, and then predicts in geo-
graphic space by using the spatial patterns of the
environmental surfaces. This process has two main
purposes and a number of advantages illustrated
by our example of predicting the natural fern
species richness (see also Lehmann et al., 2002).
First, the large-scale relationships between biotic
characteristics and the environment are described,
such as species richness in relation to climate and
soil characteristics (Fig. 7b). This is a powerful
tool in gaining ecological insight or guiding
management activities. The observed relationships
may suggest hypotheses on the ecological drivers
of fern biodiversity, or more appropriate ways to
manage fern biodiversity hotspots. As such, these
statistical relationships play an important role in a
coordinated research program of observational,
experimental and theoretical research. Biotic
interactions such as competition can also be integrated
and studied using these relationships (e.g.
Leathwick, 2002).

Fig. 5. Cross-validation of (A) C. dealbata model, (B) fern richness model.
Second, spatial predictions can be made by
using the modeled relationships to interpolate
within the environmental space defined by the
available data (Fig. 9a) and, with caution, to
extrapolate in time or space out of this environmental
envelope. Spatial predictions of species
distributions can, however, be used in many different
ways, as presented by Overton et al. (2002) in
their information pyramids for informed biodiversity
conservation, for instance to define biodiversity
hotspots (e.g. Lehmann et al., 2002), to plan
natural reserve networks (e.g. Ferrier et al.,
2002a), or to classify habitats as described in
Ferrier et al. (2002b).
While some limitations of the GRASP process
are difficult to avoid (data availability and quality,
correlative instead of causal relationships, assumption
of a system in equilibrium), several improvements
can be expected at the different steps of
the modeling process.
Fig. 6. Contributions of selected predictors in modeling C. dealbata distribution, (A) when dropped from the final model, (B) on their
own.
Fig. 7. Partial response curves of (A) C. dealbata and (B) species richness models.
Fig. 8. Combined response curves of two tree ferns of New Zealand, C. dealbata and D. squarrosa.

4.1. Data requirements and limitations
The availability of adequate biological and
environmental data for GAM analysis and spatial
predictions is constrained by several factors (Bio,
2000):
• heterogeneous data distribution in geographic
and environmental space;
• undefined data representation of the modeled
process (e.g. direct vs. indirect gradients; Austin
and Graywood, 1994; Austin, 1999b);
• inadequate geographic scale, as patterns and
relationships vary with scales (Franklin, 1995);
• inaccurate measurements of responses and/or
predictor variables;
• inaccurate measurements of geographic position;
• spatial and temporal data autocorrelation.
According to these criteria, adequate data for
plant or animal distribution modeling are rare in
most countries. Available data often come from
herbaria or collections and do not meet the above
listed criteria (Hirzel et al., 2002; Zaniewski et al.,
2002). When data seem available in good quantity,
their quality is often dubious due to identification
problems, lack of survey consistency and inap-
propriate sampling strategy.
In most studies, the effect of data autocorrelation
on sample independence is neglected (Bio,
2000) due to the lack of understanding of its
impact on parameter estimation, residual
distribution, model selection and evaluation, and
spatial predictions.
Environmental predictor data quality is also an
important concern even though methods and
availability seem more satisfactory in this area.
While most countries have climatic stations and
geological maps from which estimates of environ-
mental predictors can be obtained, the difficulty
here is in interpolation or class boundary accuracy.
Improvements in environmental information at
finer scales are expected in most countries from
digital elevation models at fine resolution (20 m)
and from remote sensing information.
Fig. 9. Spatial predictions of (A) C. dealbata distribution and (B) fern species richness of New Zealand.
In summary, data requirements for GRASP are
principally precise response and predictor measurements,
a good spatial cover, and precise
geographic coordinates based on a defined sampling
strategy. Even though GRASP analysis
cannot improve the existing data, it can help
identify data outliers and needs for new data
collection based on adequate sampling strategies.
GRASP presently addresses these problems
only through descriptive exploratory methods, which
allow users to get a better understanding of their
data before modeling them. In practice, most of the
modeling efforts mentioned in this paper have
revealed that the information contained in imperfect
available data was still capable of generating
sensible and interpretable results.
4.2. Evolution of GRASP
While the GRASP implementation relies on
well-tested statistical methods currently used in
ecological studies, it has been designed to be
continuously updated and improved to meet user
requirements and ecological interpretation needs.
The next major improvements can be expected
with:
• analyses of spatial residuals (Bio, 2000);
• methods for presence-only data (Zaniewski et
al., 2002);
• methods for species competition analyses
(Leathwick, 2002);
• all possible model selection (Ray et al., 2002);
• choice of optimal smoother degrees of freedom
(Wood, 2000);
• integration of predictor interactions;
• improved cross-validation and variable selection;
• more accurate error estimation.
Even though GAM analyses seem to meet most
species’ modeling requirements, we wish to integrate
generalised linear models (GLM) and artificial neural
networks (ANN) as comparative modeling methods
in GRASP. GLMs offer the advantage of
parametrically defined methods with a robust
theoretical background and a larger number
of analytical tools. ANNs are non-linear
methods that could be preferred when quick and
accurate predictions are more important than
statistical and/or ecological interpretations.
Furthermore, ANNs might be more appropriate
for handling interactions between predictors, and can
therefore be used to assess the amount of information
that was possibly not captured with GAMs.
More sophisticated evolutions, such as parallel
modeling of two or more species at the same time,
are being developed and are already available in
dedicated software (Yee and Mackenzie, 2002).
4.3. Required validation
Validation is one of the crucial parts of the
modeling process. It can be achieved by comparing
observed and predicted responses on an independent
data set (Guisan et al., 1998) or by cross-validation
when a data set can be split into several
subsets (Lehmann et al., 2002). It is not clear
whether independent datasets are really preferable,
as is generally claimed (Guisan and Zimmermann,
2000). We think that cross-validation is generally
more practical because it creates relatively
independent random subsets and allows the use of all
available data in the modeling process. By using
entirely independent datasets, we risk comparing
different sampling strategies instead of evaluating
a model.
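The cross-validation idea described above can be sketched as follows. This is a minimal illustration in Python, not the GRASP implementation (which is written in Splus); the function names, the number of folds and the stand-in "model" that simply predicts the training mean are all assumptions made for the example.

```python
import random

def k_fold_indices(n_obs, k, seed=0):
    """Randomly partition observation indices 0..n_obs-1 into k folds."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_validate(x, y, k, fit, predict, seed=0):
    """Return one out-of-fold prediction per observation."""
    folds = k_fold_indices(len(y), k, seed)
    predicted = [None] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Fit on the remaining folds only, then predict the held-out fold.
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predicted[i] = predict(model, x[i])
    return predicted

# Hypothetical stand-in model: it ignores x and predicts the training mean.
def fit_mean(x_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x_new):
    return model

y = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]   # made-up presence/absence records
x = list(range(len(y)))              # made-up predictor values
cv_pred = cross_validate(x, y, k=5, fit=fit_mean, predict=predict_mean)
```

Every observation is used for fitting in k - 1 folds and for evaluation in exactly one, which is the sense in which all available data contribute to the modeling process.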
Methods to assess model quality are still under
development and depend on the distribution of the
data. For binomial models, several methods are
available (Fielding and Bell, 1997), but three
are generally used: Kappa statistics (e.g.
Lehmann, 1998); the Hosmer and Lemeshow test (e.g.
Bio et al., 1998); and ROC techniques (e.g. Lehmann et
al., 2002). Our experience with these methods
favors the use of ROC statistics because, unlike
Kappa, the ROC method avoids the problem of
choosing a threshold value and, unlike the Hosmer
and Lemeshow test, it avoids the problem of choosing
a number of groups or a sorting method. With Poisson or
Gaussian models, simple correlation measures
between observed and predicted values are very
efficient.
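The contrast between the threshold-free ROC statistic and the threshold-dependent Kappa can be illustrated with a short sketch. The code below is an assumption-laden example in Python, not part of GRASP: the area under the ROC curve is computed through its rank (Mann-Whitney) formulation, Kappa is Cohen's kappa on thresholded predictions, and the observed and predicted values are invented.

```python
def roc_auc(observed, predicted):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    This is the proportion of presence/absence pairs in which the presence
    receives the higher predicted value; ties count one half.
    """
    pos = [p for o, p in zip(observed, predicted) if o == 1]
    neg = [p for o, p in zip(observed, predicted) if o == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def cohen_kappa(observed, predicted, threshold):
    """Cohen's kappa after converting probabilities to presence/absence.

    Undefined (division by zero) if chance agreement equals 1; that edge
    case is not handled in this sketch.
    """
    n = len(observed)
    calls = [1 if p >= threshold else 0 for p in predicted]
    agree = sum(1 for o, c in zip(observed, calls) if o == c) / n
    obs_pos = sum(observed) / n
    call_pos = sum(calls) / n
    chance = obs_pos * call_pos + (1 - obs_pos) * (1 - call_pos)
    return (agree - chance) / (1 - chance)

# Invented data: four absences followed by four presences.
observed = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0.1, 0.2, 0.4, 0.6, 0.3, 0.7, 0.8, 0.9]

auc = roc_auc(observed, predicted)                                # 0.875
kappa_050 = cohen_kappa(observed, predicted, threshold=0.50)      # 0.5
kappa_065 = cohen_kappa(observed, predicted, threshold=0.65)      # 0.75
```

For the same set of predictions the ROC area is a single number, whereas Kappa changes from 0.5 to 0.75 when the threshold moves from 0.50 to 0.65.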
Model validation is also commonly made on
spatial predictions themselves. The plotting of
point observations on top of predicted values to
see how geographical patterns are reproduced is a
good empirical method to judge model quality.
Additional tests of spatial autocorrelation of
residuals (e.g. Bio, 2000) are also possible to assess
whether the model was unable to capture spatial
patterns independent of environmental predictors.
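One possible way to test residuals for spatial autocorrelation is sketched below. The text does not name a statistic; Moran's I is used here as an assumption because it is a common choice, with binary neighbor weights inside a distance band. The coordinates, residuals and band width are invented for the example.

```python
import math

def morans_i(coords, residuals, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < d_ij <= max_dist, else 0."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    cross = 0.0   # sum of dev_i * dev_j over neighboring (ordered) pairs
    w_sum = 0.0   # total weight, i.e. number of neighboring ordered pairs
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            if math.hypot(dx, dy) <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * (cross / sum(d * d for d in dev))

# Six invented sites along a transect, one distance unit apart.
coords = [(float(i), 0.0) for i in range(6)]
clustered = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]     # similar residuals sit together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbors always differ

i_clustered = morans_i(coords, clustered, max_dist=1.0)      # 0.6
i_alternating = morans_i(coords, alternating, max_dist=1.0)  # -1.0
```

Values well above the expectation under no autocorrelation, -1/(n - 1), suggest that neighboring residuals resemble each other, that is, that some spatial pattern was not captured by the environmental predictors.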
4.4. Possible interpretations
Regression models rely on correlations, not
causal relationships. Nevertheless, the interpretation
of species models brings valuable information
on species distribution along direct environmental
gradients, and helps formulate new hypotheses on
the respective roles of abiotic and biotic factors
(dispersal, competition). To diminish the risks of
misinterpretation, we carefully chose direct environmental
gradients (Austin and Smith, 1989) that
must have some physiological meaning for species
growth or distribution. One method of examining
the effect of dispersal would be to model the
contribution of the environment, and then attempt
to explain the residuals by Easting, Northing and/
or their interaction. Similarly, competition issues
can be addressed by including in the environment
data set the information on the distribution of a
competing species (Leathwick, 1998; Leathwick
and Austin, 2001).
Among available methods for predictions (e.g.
GLM, GAM, ANN), GAM seemed preferable to
us because of the ecological interpretability of its
non-parametric response curves. It also has the
advantage of being statistically well defined, allowing
for good inference, while being flexible enough to
fit the data closely. It appears to be a good compromise
between GLM on one side and ANN on the
other. Improvements are needed, however, to
reach a statistical method that would agree fully
with our understanding of ecological complexity
(Austin, 2002), taking into account species interactions,
spatial autocorrelation, and predictor
interactions.
4.5. Improved predictions
GRASP first models a species’ realized environ-
mental niche, and then it predicts its natural
spatial distribution in geographical space. This
explains why care must be taken when using these
models to predict outside the present environmental
envelope of a species, either in space or time.
This type of statistical model also relies on the
acceptance of an equilibrium state that does not
exist in nature, but that can be accepted at a given
time and at a given scale. Once accepted, this
equilibrium greatly simplifies the modeling ap-
proach and allows the production of the quick and
accurate predictions needed for management pur-
poses.
A drawback of GAM, and of other non-parametric
methods such as ANN, is the difficulty of
exporting a model out of the software package where
it was fitted. This is particularly a problem when
the aim is to build a spatial prediction in a GIS.
Large prediction data sets will become increasingly
common, especially with the need for environment
managers to use predictions at a fine
resolution. Making spatial predictions at a 20 m
resolution, for instance, can be achieved at a local
scale on a limited area but is presently difficult on
larger regions. Going from 1000 m to 20 m
resolution requires 2500 times more memory
(50 × 50 cells for each original cell).
Two solutions exist for building spatial predictions
with non-parametric methods: (i) with smaller
datasets, calculate the predictions directly within
the statistical package using a prediction environmental
data set, and export them to the GIS; (ii)
with larger datasets, use a lookup table to describe
the response curves for each variable along a
reduced number of values, and use the obtained
lookup table in a GIS to reclassify the predictors
according to their contribution to the model.
Further improvements in building large
spatial predictions from GAMs should include
the possibility of using bivariate smoothed predictors
such as geographical coordinates, possible
interactions between predictors and spatial autocorrelation
between responses. None of these
can be included in univariate lookup
tables at the moment.
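The lookup-table route (solution ii) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the GRASP code: the two partial response curves, the intercept, the predictor names and the logit link are hypothetical, and the reclassification uses the nearest tabulated value, clamped to the tabulated range.

```python
import bisect
import math

def build_lookup(curve, lo, hi, n_steps):
    """Tabulate a partial response curve at n_steps evenly spaced values."""
    step = (hi - lo) / (n_steps - 1)
    xs = [lo + i * step for i in range(n_steps)]
    return xs, [curve(x) for x in xs]

def reclassify(value, table):
    """Return the tabulated contribution at the nearest tabulated value.

    Values outside the tabulated range are clamped to the end points, so
    the table never extrapolates beyond the sampled environmental range.
    """
    xs, contributions = table
    i = bisect.bisect_left(xs, value)
    if i == 0:
        return contributions[0]
    if i == len(xs):
        return contributions[-1]
    nearer_right = xs[i] - value < value - xs[i - 1]
    return contributions[i] if nearer_right else contributions[i - 1]

def predict_cell(cell, intercept, tables):
    """Sum the reclassified contributions, then apply the inverse logit link."""
    eta = intercept + sum(reclassify(cell[name], t) for name, t in tables.items())
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical partial response curves on the scale of the linear predictor.
def temp_curve(t):
    return 1.0 - 0.05 * (t - 12.0) ** 2

def rain_curve(r):
    return 0.001 * (r - 1500.0)

intercept = -0.5
tables = {
    "temperature": build_lookup(temp_curve, 0.0, 20.0, 201),
    "rainfall": build_lookup(rain_curve, 500.0, 3000.0, 251),
}

cell = {"temperature": 12.34, "rainfall": 1763.0}   # one invented grid cell
approx = predict_cell(cell, intercept, tables)
exact = 1.0 / (1.0 + math.exp(-(intercept + temp_curve(12.34) + rain_curve(1763.0))))

# Refining a grid from 1000 m to 20 m multiplies the number of cells by 50 * 50.
cell_factor = (1000 // 20) ** 2   # 2500
```

Because each table depends on a single predictor, this approach only reproduces additive, univariate terms, which is why bivariate smooths and interactions do not fit into it.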
5. Conclusions
The ability of generalised additive models to
model data following different types of distribution
(normal, binomial, Poisson, ...), the possibility
of combining continuous and factor
explanatory variables, as well as the data-driven
shape of the response curves are all important
characteristics in ecological modeling. These
reasons motivated us to develop GRASP functions
to automate the process of modeling and predicting.
The main advantage of this automation
is the quick and easy update of models
and predictions with new data or new
model specifications. It also allows the modeling
process to be standardized for several
responses at once, and therefore makes
it more reproducible and less subjective. Allowing
functions or models to be edited and adapted
keeps flexibility for any special cases and at
any stage. Furthermore, the user-friendly interface
of GRASP has proven to bring more
scientists into the practice of making
spatial predictions, which should help improve
this field of ecological modeling for better
applications in environmental management.
Acknowledgements
The data used in this study was drawn from the
National Vegetation Survey database, and was
mainly collected by staff of the former New
Zealand Forest Service whose efforts are gratefully
acknowledged. This work benefited greatly from
the comments and help of Mike Austin, Simon
Barry, Antoine Guisan, Margaret Cawsey, Emma-
nuel Castella, Thomas Yee and Geoff Pegler. We
acknowledge the support of the Foundation for
Research, Science and Technology under contract
number CO9642: a methodology for the selection
of biodiversity indicators. We also acknowledge
the contribution of the Swiss National Science
Foundation to Anthony’s postdoctoral project.
References
Allen, R.B., 1992. RECCE an Inventory Method for Describing
New Zealand Vegetation. Forest Research Institute,
Christchurch, p. 25.
Austin, M.P., 1999a. The potential contribution of vegetation
ecology to biodiversity research. Ecography 22, 465–484.
Austin, M.P., 1999b. A silent clash of paradigms: some
inconsistencies in community ecology. Oikos 86, 170–178.
Austin, M.P., 2002. Spatial prediction of species distribution:
an interface between ecological theory and statistical
modeling. Ecol. Model. 157 (2–3), 101–118.
Austin, M.P., Graywood, M.J., 1994. Current problems of
environmental gradients and species response curves in
relation to continuum theory. J. Veg. Sci. 5, 473–482.
Austin, M.P., Meyers, J.A., 1996. Current approaches to
modeling the environmental niche of eucalypts: implication
for management of forest biodiversity. Forest Ecol.
Manage. 85, 95–106.
Austin, M.P., Smith, T.M., 1989. A new model for the
continuum concept. Vegetation 83, 35–47.
Bio, A.M.F., 2000. Does vegetation suit our models? Data and
model assumptions and the assessment of species distribution
in space. Ph.D. thesis, Utrecht University.
Bio, A.M.F., Alkemade, R., Barendregt, A., 1998. Determining
alternative models for vegetation response analysis: a non
parametric approach. J. Veg. Sci. 9, 5–16.
Breiman, L.J., Friedman, J., Olshen, R., Stone, C., 1984.
Classification and Regression Trees. Wadsworth, Belmont,
CA.
Brosse, S., Lek, S., 2000. Modelling roach (Rutilus rutilus)
microhabitat using linear and nonlinear techniques.
Freshwater Biol. 44, 441–452.
Cawsey, E.M., Austin, M.P., Baker, B.L., 2002. Regional
vegetation mapping in Australia: a case study in the
practical use of statistical modeling. Biodivers. Conserv.
11, 2239–2274.
Chambers, J.M., Hastie, T.J., 1993. Statistical Models.
Chapman & Hall, London, p. 608.
Draper, N.R., Smith, H., 1981. Applied Regression Analysis.
Wiley, New York, p. 709.
Ferrier, S., Drielsma, M., Manion, G., Watson, G., 2002b.
Extended statistical approaches to modeling spatial pattern
in biodiversity: the north-east New South Wales experience.
II. Community-level modeling. Biodivers. Conserv. 11, 2309–2338.
Ferrier, S., Watson, G., Pearce, J., Drielsma, M., 2002b.
Extended statistical approaches to modeling spatial pattern
in biodiversity: the north-east New South Wales experience.
I. Species-level modeling. Biodivers. Conserv. 11, 2275–2307.
Fielding, A.H., Bell, J.F., 1997. A review of methods for the
assessment of prediction errors in conservation presence/
absence models. Environ. Conserv. 24, 38–49.
Franklin, J., 1995. Predictive vegetation mapping: geographical
modeling of biospatial patterns in relation to environmental
gradients. Prog. Phys. Geogr. 19, 474–499.
Franklin, J., 1998. Predicting the distribution of shrub species
in southern California from climate and terrain-derived
variables. J. Veg. Sci. 9, 733–748.
Guisan, A., Theurillat, J.-P., 2000. Equilibrium modeling of
alpine plant distribution and climate change: how far can we
go? Phytocoenologia 30, 353–384.
Guisan, A., Zimmermann, N.E., 2000. Predictive habitat
distribution models in ecology. Ecol. Model. 135, 147–186.
Guisan, A., Theurillat, J.-P., Kienast, F., 1998. Predicting the
potential distribution of plant species in an alpine
environment. J. Veg. Sci. 9, 65–74.
Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive
Models. Chapman & Hall, London, p. 335.
Hirzel, A.H., Hausser, J., Perrin, N., 2002. Ecological-niche
factor analysis: how to compute habitat-suitability maps
without absence data? Ecology, in press.
Hosmer, D.W.J., Lemeshow, S., 1989. Assessing the fit of the
model. In: Applied Logistic Regression. Wiley, New York.
Hutchinson, M.F., Gessler, P.E., 1994. Splines – more than just
a smooth interpolator. Geoderma 62, 45–67.
Iverson, L.R., Prasad, A., 1998. Estimating regional plant
biodiversity with GIS modeling. Divers. Distrib. 4, 49–61.
Leathwick, J.R., 1998. Are New Zealand’s Nothofagus species
in equilibrium with their environment? J. Veg. Sci. 9,
719–732.
Leathwick, J.R., 2001. New Zealand’s potential forest pattern
as predicted from current species-environment relationships.
New Zealand J. Botany 39, 447–464.
Leathwick, J.R., Austin, M.P., 2001. Competitive interactions
between tree species in New Zealand’s old growth
indigenous forests. Ecology 82, 2560–2573.
Leathwick, J.R., Stephens, R.T.T., 1998. Climate surfaces for
New Zealand. Contract Report LC9798/126, Landcare
Research, Hamilton.
Leathwick, J.R., Whitehead, D., 2001. Soil and atmospheric
water deficits, and the distributions of New Zealand’s
indigenous tree species. Funct. Ecol. 15, 233–242.
Leathwick, J.R., Burns, B.R., Clarkson, B.D., 1998.
Environmental correlates of tree alpha-diversity in New Zealand
primary forests. Ecography 21, 235–246.
Leathwick, J.R., 2002. Incorporating the effects of inter-specific
competition when modeling species distributions at
landscape scales. Biodivers. Conserv., in press.
Leathwick, J.R., Whitehead, D., McLeod, M., 1996. Predicting
changes in the composition of New Zealand’s indigenous
forests in response to global warming: a modeling approach.
Environ. Software 11, 81–90.
Lehmann, A., 1998. GIS modeling of submerged macrophyte
distribution using generalized additive models. Plant Ecol.
139, 113–124.
Lehmann, A., Jaquet, J.-M., Lachavanne, J.-B., 1997. A GIS
approach of aquatic plant spatial heterogeneity in relation
to sediment and depth gradient, Lake Geneva, Switzerland.
Aquat. Bot. 347–361.
Lehmann, A., Overton, J.McC., Leathwick, J.R., 1999.
GRASP: Generalized Regression Analysis and Spatial
Predictions, User’s manual. Landcare Research, Hamilton,
New Zealand.
Lehmann, A., Leathwick, J.R., Overton, J.McC., 2002.
Assessing New Zealand fern diversity from spatial predictions of
species assemblages. Biodivers. Conserv. 11, 2217–2238.
McCullagh, P., Nelder, J.A., 1997. Generalized Linear Models.
Monographs on Statistics and Applied Probability.
Chapman & Hall, London, p. 511.
McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil
properties using environmental correlation. Geoderma 89,
67–94.
Narumalani, S., Jensen, J.R., Althausen, J.D., Burkhalter, S.G.,
Mackey, H., 1997. Aquatic macrophyte modeling using GIS
and logistic multiple regression. Photogramm. Eng. Remote
Sens. 63, 41–49.
Newsome, P.F.J., 1992. New Zealand Resource Inventory.
ARC/INFO data manual, DSIR.
Overton, J.McC., Leathwick, J.R., Lehmann, A., 2000. Predict
first, classify later – a new paradigm of spatial classification
for environmental management: a revolution in the
mapping of vegetation, soil, land cover, and other environmental
information. In: Fourth International Conference on
Integrating GIS and Environmental Modeling (GIS/EM4),
Banff, Alta., Canada.
Overton, J.McC., Leathwick, J.R., Stephens, R.T.T., Lehmann,
A., 2002. Information pyramids for informed ecosystem
management. Biodivers. Conserv. 11, 2093–2116.
Peters, D., Thackway, R., 1998. A New Biogeographic
Regionalisation for Tasmania. Tasmanian Parks and Wildlife
Service GIS Section.
Ray, N., Lehmann, A., Joly, P., 2002. Modelling spatial
distribution of amphibian populations: a GIS approach
based on habitat matrix permeability. Biodivers. Conserv.
11, 2243–2265.
Wiser, S.K., Bellingham, P.J., Burrows, L.E., 2001. Managing
biodiversity information: development of New Zealand’s
National Vegetation Survey databank. New Zealand J.
Ecol. 25, 1–17.
Wolgemuth, T., 1998. Modelling floristic richness on a regional
scale: a case study in Switzerland. Biodivers. Conserv. 7,
159–177.
Wood, S.N., 2000. Modelling and smoothing parameter
estimation with multiple quadratic penalties. J. R. Stat.
Soc. 62, 413–428.
Yee, T.W., Mitchell, N.D., 1991. Generalized additive models
in plant ecology. J. Veg. Sci. 2, 587–602.
Yee, T.W., Mackenzie, M., 2002. Vector generalized additive
models in plant ecology. Ecol. Model. 157 (2–3), 141–156.
Zaniewski, A.E., Lehmann, A., Overton, J.McC., 2002.
Predicting species distribution using presence-only data: a case
study of native New Zealand ferns. Ecol. Model. 157 (2–3),
259–278.