

Ecological Applications, 17(1), 2007, pp. 266–280
© 2007 by the Ecological Society of America

PREDICTIVE VEGETATION MODELING FOR CONSERVATION: IMPACT OF ERROR PROPAGATION FROM DIGITAL ELEVATION DATA

KIMBERLY P. VAN NIEL¹,³ AND MIKE P. AUSTIN²

¹ School of Earth and Geographical Sciences, University of Western Australia, Crawley, Western Australia 6009, Australia
² CSIRO Sustainable Ecosystems, GPO Box 284, Canberra, Australian Capital Territory 2601, Australia

Abstract. The effect of digital elevation model (DEM) error on environmental variables, and subsequently on predictive habitat models, has not been explored. Based on an error analysis of a DEM, multiple error realizations of the DEM were created and used to develop both direct and indirect environmental variables for input to predictive habitat models. The study explores the effects of DEM error and the resultant uncertainty of results on typical steps in the modeling procedure for prediction of vegetation species presence/absence. Results indicate that all of these steps and results, including the statistical significance of environmental variables, shapes of species response curves in generalized additive models (GAMs), stepwise model selection, coefficients and standard errors for generalized linear models (GLMs), prediction accuracy (Cohen’s kappa and AUC), and spatial extent of predictions, were greatly affected by this type of error. Error in the DEM can affect the reliability of interpretations of model results and level of accuracy in predictions, as well as the spatial extent of the predictions. We suggest that the sensitivity of DEM-derived environmental variables to error in the DEM should be considered before including them in the modeling processes.

Key words: Acmena smithii; Corymbia maculata; digital elevation model (DEM); error analysis; error propagation; generalized additive model (GAM); generalized linear model (GLM); geographic information system (GIS); predictive vegetation modeling; southern New South Wales, Australia; uncertainty.

INTRODUCTION

Predictive models of species spatial distribution are now widely used in conservation studies, as exemplified by the encyclopaedic proceedings of Scott et al. (2002). Although errors in prediction are considered (e.g., O’Connor 2002, van Horne 2002), only one paper explicitly mentions error propagation from maps, Geographic Information System (GIS) layers, or remotely sensed data, and its potential impact on the statistical models obtained (Henebry and Merchant 2002). These authors (p. 295) state “effective techniques for . . . understanding of error propagation in GIS are still rudimentary but experience is rapidly increasing.” In this paper we present an evaluation of the impact of error propagation from errors in a digital elevation model (DEM) on all stages in the fitting of statistical regression models for predicting the distribution of species and their potential habitats. The purpose is to understand how error and its propagation affect final results in predictive modeling and to consider options that will reduce the impact on modeling results.

Error can enter into the predictive modeling process by a number of pathways: spatial inaccuracies in the sampling points, weaknesses in the assumptions or calculations developing the spatial extension of the environmental factors, the fit of the modeled relationship, and error that was propagated from any initial data sets. Like errors in other data sets, GIS data errors affect the reliability of the final outcomes of any model, but how they affect final outcomes is difficult to assess. In general, we can only calculate the uncertainty in model results due to input errors introduced into the modeling processes.

Derivatives of a digital elevation model can have very high levels of propagated error (Fisher 1998, Holmes et al. 2000), yet most commonly employed environmental variables for predictive vegetation modeling are derived from a DEM but are not tested for problems (Henebry and Merchant 2002). The prevailing belief is that geomorphometric (indirect) variables have less propagated error than more derived, direct variables (Guisan et al. 1998, Guisan and Zimmermann 2000), but Van Niel et al. (2004) have demonstrated that this is not true. They found that the level of propagated error from a DEM is more complex and depends on the methodology of the data development and the nature of the study site. Van Horssen et al. (2002) considered the impact of error in kriged prediction surfaces on spatial predictions of vegetation, but in this case the independent and dependent variables were gathered at the same time at each field site, so there is no uncertainty in the relationship between the variables beyond model fit. They still found extensive impact of model fit and kriging error on the outcomes of spatial predictive modeling. However, there has been no examination of how the error and uncertainty in environmental variables, particularly those developed from DEMs, affect the process and results of predictive vegetation modeling. If they have an extensive effect, then sensitivity of environmental variables to error in the DEM should be a consideration when selecting variables for inclusion in a model. This has the potential to affect many aspects and types of models, not only of terrestrial predictive habitat modeling, but also climatic, marine benthic, and ecosystem process models. The implications of the outcomes of this study are discussed further in the Discussion, below.

Manuscript received 21 November 2005; revised 5 May 2006; accepted 9 May 2006. Corresponding Editor: J. S. Baron.

³ E-mail: [email protected]

In this paper we explore how error in the DEM leads to increased uncertainty in results from generalized additive models (GAMs), generalized linear models (GLMs), stepwise model selection procedures, accuracy of model predictions, and the predicted spatial extent of vegetation species for each of two sets of variables. We develop species spatial-distribution models for two types of predictor variables, direct and indirect (sensu Austin and Smith 1989), which are derived from a DEM. A probability distribution function based on assessed error in the DEM is used in a stochastic model to develop 10 equi-probable DEMs. The predictor variables are then redeveloped from each DEM “realization” and used to redevelop the species spatial distribution models. The models are then compared against each other and against the original model at each stage in the modeling process.

This research demonstrates the range of model results that can be expected due to error in the DEM and other source data sets. In addition, the methods employed here provide a pathway for analyzing the influence of source-data error on predictive habitat models, as well as a guideline for understanding the nature of these effects and their possible reduction.

DATA AND METHODS

The study was conducted in Murramarang National Park and South Brooman State Forest on the South Coast of New South Wales, Australia, ~300 km south of Sydney. This area has been the subject of numerous vegetation-prediction studies (Moore et al. 1991, Fitzgerald and Lees 1992, 1994), and an extensive GIS database and set of field plots (424) exists for the region. The elevation at the site ranges from 8 to 260 m with an average elevation of 66 m. Predictive models were developed for two species: Corymbia maculata (Hook.), a common sclerophyll tree (63% presence on plots), and Acmena smithii (Poir.), a less common rainforest tree (20% presence on plots), and spatial predictions were made for a subset of the region well within the extent of the field sites. The driving factors were shown to be different for these species in standard GAM (generalized additive model) and GLM (generalized linear model) analysis. The standard, unperturbed-data results were used throughout this paper as a base model for comparison with the results from the uncertainty analysis. Commonly employed (based on the literature) direct and indirect variables, as defined by Austin and Smith (1989), were used to develop separate models for each species, resulting in four different models (Table 1). The indirect variables used were elevation (e.g., Guisan et al. 1999, Vayssieres et al. 2000), slope (e.g., Franklin 1998, Guisan et al. 1998), cosine of aspect (e.g., Lees and Ritman 1991, Guisan et al. 1998), and topographic position, and the direct variables were net solar radiation (e.g., Moore et al. 1993, Leathwick 1995), average air temperature (Austin and Meyers 1996, Vayssieres et al. 2000), and topographic wetness index (TWI) (e.g., Moore et al. 1993, Barling et al. 1994). Models based on indirect and direct variables were compared to investigate their relative sensitivity to uncertainty (Guisan and Zimmermann 2000, Austin 2002). A geologic nutrient data set (as described in Moore et al. [1991]) was included in the models (appropriate soils data have not been developed for the region), but was not evaluated for uncertainty.
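The base models in Table 1 use a compact notation in which a number after a variable name gives the highest polynomial order, with all lower orders included. As a minimal sketch of how that notation could be expanded into individual model terms (the function name `expand_terms` and the `^` power notation are illustrative choices, not part of the original study):

```python
def expand_terms(model: str) -> list[str]:
    """Expand 'Var,k' into Var, Var^2, ..., Var^k; bare names stay linear."""
    terms = []
    for part in model.split("+"):
        # 'GeoNutrients,4' -> name 'GeoNutrients', degree '4'; 'TWI' -> degree ''
        name, _, degree = part.strip().partition(",")
        name = name.strip()
        max_power = int(degree) if degree.strip() else 1
        for power in range(1, max_power + 1):
            terms.append(name if power == 1 else f"{name}^{power}")
    return terms


# Direct base model for Corymbia maculata, as written in Table 1.
direct_c_maculata = "GeoNutrients,4 + AvgAirTemp + NetSolarRad + TWI"
print(expand_terms(direct_c_maculata))
```

The expanded list for this model has seven terms, which matches the interpretation given in the footnote to Table 1.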

Assessment of baseline error

The error model applied in this study was developed from ground-truthed (i.e., field verified) data (Fisher 1998, Holmes et al. 2000) and was used to create a probability distribution function (pdf) to derive ten random grids which simulate the known error in the digital elevation model (DEM). The DEM for the study region was developed from elevation contours (10-m interval), streamlines, and spot heights that were digitized from a 1:25 000 scale topographic map and then interpolated to 30-m pixel resolution using IDRISI (GIS and image-processing software; Clark Labs, Clark University, Worcester, Massachusetts, USA), as described by Lees (1999). The reference (ground-truthed) data of 2097 points were captured using an OmniSTAR Scout 12 differential GPS (OmniSTAR, West Perth, Western Australia, Australia), at a vertical accuracy of 3.8 m or less, along roads at the study site. The point data were then aggregated into cells (30-m resolution and spatially matched to the DEM), taking the mean of all values. The dense forest canopy across much of the study site made it impossible to capture vertically accurate data via a GPS across a range of topographic positions (e.g., gullies). Also, capture of primarily linear data meant that directional biases (anisotropy) could not be assessed properly. Thus, we were unable to consider these relationships.

TABLE 1. Base models applied for stepwise procedure, by species.

| Species | Model type† | Base model applied‡ |
|---|---|---|
| Corymbia maculata | Direct | GeoNutrients,4 + AvgAirTemp + NetSolarRad + TWI |
| Corymbia maculata | Indirect | GeoNutrients,4 + DEM + CosineAspect,2 + Slope,2 + TopoPosition |
| Acmena smithii | Direct | GeoNutrients,4 + AvgAirTemp + NetSolarRad + TWI,3 |
| Acmena smithii | Indirect | GeoNutrients,4 + DEM,3 + CosineAspect,2 + Slope,4 + TopoPosition,2 |

† Direct and indirect variables were used to develop separate models. Direct variables used were net solar radiation, average air temperature, and topographic wetness index (TWI); indirect variables were elevation, slope, cosine of aspect, and topographic position.

‡ Note that the value following the variable indicates that level of polynomial plus all lower orders. Thus, the direct model for C. maculata would be interpreted as GeoNutrients + GeoNutrients^2 + GeoNutrients^3 + GeoNutrients^4 + AvgAirTemp + NetSolarRad + TWI.
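The aggregation of GPS reference points into DEM-aligned 30-m cells can be illustrated with a short sketch. It assumes projected coordinates in metres, a grid origin at the lower-left corner, and a DEM stored as a dictionary keyed by (column, row). These conventions and the names `aggregate_to_cells` and `dem_errors` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_cells(points, x_origin, y_origin, cell_size=30.0):
    """Average the elevations of all points falling in each grid cell.

    points: iterable of (x, y, elevation) in the DEM's coordinate system.
    Returns {(col, row): mean elevation}, counting cells from the origin.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        col = int((x - x_origin) // cell_size)
        row = int((y - y_origin) // cell_size)
        cells[(col, row)].append(z)
    return {cell: mean(values) for cell, values in cells.items()}


def dem_errors(cell_means, dem):
    """DEM elevation minus reference elevation for each cell with reference data."""
    return [dem[cell] - ref for cell, ref in cell_means.items() if cell in dem]


# Three made-up GPS fixes: two fall in cell (0, 0), one in cell (1, 0).
gps_points = [(5.0, 5.0, 100.0), (20.0, 10.0, 104.0), (40.0, 5.0, 90.0)]
reference = aggregate_to_cells(gps_points, x_origin=0.0, y_origin=0.0)
toy_dem = {(0, 0): 105.0, (1, 0): 88.0}
print(reference, dem_errors(reference, toy_dem))
```

The standard deviation of such per-cell differences is the kind of summary that an error model of this sort would be built on.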

The DEM error was assessed at a standard deviation of 5.3 m, normally distributed, with a spatial autocorrelation equivalent to Moran’s I of 0.2776 (Van Niel et al. 2004), and these settings were used to develop the pdfs. This level of error compares well to the standard deviation of error in other DEMs, such as USGS 30-m-resolution DEMs (4.1 m, as per Holmes et al. [2000]), British Ordnance Survey (BOS) Profile (10 m) and Panorama (50 m) DEMs (2.6 m and 7.0 m, respectively, as per Fisher [1998]), and shuttle radar topography mission interferometric terrain elevation data (SRTM ITED-2) DEMs (3.3 m in open landscape and 15.18 m in forested areas, as per Heipke et al. [2002]).
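One simple way to produce a normally distributed error grid with a chosen standard deviation and some spatial autocorrelation is sketched below. It is not the procedure of Van Niel et al. (2004), which is not described here. It blends white noise with a 3x3 moving-average copy of itself, rescales the result to the target standard deviation, and reports Moran's I under rook contiguity. The `mix` parameter and all function names are assumptions; `mix` would have to be tuned by trial to approach a target such as 0.2776.

```python
import random
from statistics import mean, pstdev


def smooth(grid):
    """3x3 moving-window mean; edge cells use the neighbours that exist."""
    rows, cols = len(grid), len(grid[0])
    return [[mean(grid[i][j]
                  for i in range(max(0, r - 1), min(rows, r + 2))
                  for j in range(max(0, c - 1), min(cols, c + 2)))
             for c in range(cols)] for r in range(rows)]


def rescale(grid, target_sd):
    """Centre the grid on zero and scale it to the requested standard deviation."""
    flat = [v for row in grid for v in row]
    m, sd = mean(flat), pstdev(flat)
    return [[(v - m) / sd * target_sd for v in row] for row in grid]


def morans_i(grid):
    """Moran's I with binary rook-contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    flat = [v for row in grid for v in row]
    m = mean(flat)
    denom = sum((v - m) ** 2 for v in flat)
    num, weight_sum = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    # Each neighbour pair counts in both directions.
                    num += 2 * (grid[r][c] - m) * (grid[rr][cc] - m)
                    weight_sum += 2
    return len(flat) / weight_sum * num / denom


def error_field(rows, cols, sd=5.3, mix=0.5, seed=0):
    """Gaussian error grid: mix=0 gives white noise, mix=1 fully smoothed noise."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    smoothed = rescale(smooth(noise), 1.0)
    blended = [[mix * s + (1 - mix) * n for s, n in zip(s_row, n_row)]
               for s_row, n_row in zip(smoothed, noise)]
    return rescale(blended, sd)


field = error_field(20, 20)
print(round(morans_i(field), 3))
```

Because the field is rescaled after blending, its standard deviation equals the target regardless of `mix`; only the autocorrelation changes.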

Each error grid was then added to the original DEM, and the environmental variables were re-calculated. This process was repeated 10 times to generate 10 alternative uncertainty realizations of the set of predictor variables. These realizations were then used to create 10 separate statistical models, which were compared against each other and against the original models for each species. The effect of uncertainty was considered not only on the final outcomes of the model, but also on each step of the modeling process.
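The realization loop described above can be sketched for a single derived variable. This example perturbs a DEM ten times and recomputes slope by central differences. For brevity the error is drawn as spatially uncorrelated Gaussian noise, which is a simplification of the autocorrelated error model in the study. The slope formula, the function names, and the toy DEM are likewise illustrative assumptions.

```python
import math
import random

CELL_SIZE = 30.0   # metres, the DEM resolution given in the text
ERROR_SD = 5.3     # metres, the assessed DEM error standard deviation
N_REALIZATIONS = 10


def slope_degrees(dem, cell=CELL_SIZE):
    """Slope from central differences; edge cells are left as None."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


def perturb(dem, rng, sd=ERROR_SD):
    """Add one independent Gaussian error value to every cell."""
    return [[z + rng.gauss(0.0, sd) for z in row] for row in dem]


def slope_realizations(dem, n=N_REALIZATIONS, seed=1):
    """Recompute slope from n perturbed copies of the DEM."""
    rng = random.Random(seed)
    return [slope_degrees(perturb(dem, rng)) for _ in range(n)]


# Toy DEM: a plane rising 3 m per 30-m cell from west to east.
plane = [[66.0 + 3.0 * c for c in range(5)] for _ in range(5)]
base = slope_degrees(plane)[2][2]
perturbed = [run[2][2] for run in slope_realizations(plane)]
print(round(base, 2), round(min(perturbed), 2), round(max(perturbed), 2))
```

On gentle terrain such as this toy plane, an elevation error of a few metres is of the same order as the elevation difference between neighbouring cells, so the recomputed slope at a single cell can vary widely between realizations.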

Error impact and assessment

The effect of error on a number of different modeling processes is considered. The processes selected were based on their usage in the literature and their importance in the overall development and outcome of predictive habitat modeling. There are five steps in modeling the impact of base error on the modeling process:

1) Testing of individual variable significance for a species using GLMs. The effect of propagated error on the significance of individual predictor variables was assessed, as this is often the first step in developing a species spatial-distribution model (Vayssieres et al. 2000). Regression models using each predictor independently were fitted for the unperturbed data and the 10 realizations. Regressions for linear, quadratic, cubic, and quartic polynomial models were fitted separately. The level of probability determined for each realization regression was compared with that of the unperturbed data. The entire set of field plots is used for this analysis.

2) Exploring the shape of the species–environment relationship curves using GAMs. Shapes of the species–predictor variable relationship are examined using GAMs (Brown 1994, Austin and Meyers 1996, Bio et al. 1998, Franklin 1998, Leathwick 1998, Lehmann 1998). These models are developed for individual predictor variables, first for comparison to the individual predictor, and then as full models to consider the change due to the combination of relationships. The second case is current practice for later development of a GLM model (Franklin 1998).

3) Selecting a final model from all predictor variables by stepwise GLM procedures. Backward stepwise selection of variables applying Akaike’s information criterion (AIC) is used for all modeled realizations to examine the effects of propagated error on this process. For this analysis and all others, the split-sample approach was applied with 70% of the data used for model development and 30% retained for testing.

4) Assessing accuracy of model predictions. Based on GAM analysis of the unperturbed data, an initial model was constructed for each species. This model was then applied to each of the realizations of the data sets. The coefficients and their standard errors were examined, and the model accuracy tested using both kappa accuracies (all at 0.05 thresholds; prediction accuracy as assessed via the kappa statistic [Cohen’s kappa]) and area under the receiver operating characteristic curve (AUC; Bradley 1997).

5) Predicting the spatial distribution of the species. Each of the models developed in the stepwise procedures was used to develop spatial predictions (Pearce and Ferrier 2000) of Corymbia maculata and Acmena smithii. For the predictions, each model was applied using the complete grids of the variables for its realization. For example, the model developed from realization 1 was applied only to those environmental variables developed for realization 1 for predicting the spatial distribution. In this way, each model was applied to the data set from which it was developed. The spatial predictions were developed for each realization, then they were used to create three data sets based on the probabilities: the minimum probability, the maximum probability, and the range in probability for each cell. This allows for the display of the differences in probability estimates. For each data set the threshold that optimized kappa accuracy was used to reclassify the probability data sets to presence/absence grids (1/0). These were then summed to give the number of realizations that predicted the species as present for each cell. This displays the effect of uncertainty and the range of possibilities for the final presence/absence predictions.
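The accuracy and mapping steps above (steps 4 and 5) can be illustrated with a small sketch. It computes Cohen's kappa for presence/absence predictions, picks the probability threshold that maximizes kappa, and summarizes per-cell minimum, maximum, and range of probability plus the number of realizations predicting presence. Scanning thresholds in steps of 0.05 is an assumed reading of the phrase "0.05 thresholds". All function names and the toy data are hypothetical.

```python
def cohens_kappa(observed, predicted):
    """Cohen's kappa for two equal-length lists of 0/1 values."""
    n = len(observed)
    both_present = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    both_absent = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    obs_present, pred_present = sum(observed), sum(predicted)
    p_observed = (both_present + both_absent) / n
    p_chance = (obs_present * pred_present
                + (n - obs_present) * (n - pred_present)) / n ** 2
    if p_chance == 1.0:
        return 0.0  # kappa is undefined when chance agreement is total
    return (p_observed - p_chance) / (1.0 - p_chance)


def classify(probabilities, threshold):
    """Reclassify probabilities to presence (1) or absence (0)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def best_threshold(observed, probabilities, step=0.05):
    """Threshold (scanned in increments of `step`) that maximizes kappa."""
    candidates = [round(i * step, 10) for i in range(1, int(round(1 / step)))]
    return max(candidates,
               key=lambda t: cohens_kappa(observed, classify(probabilities, t)))


def summarize(prob_by_realization, thresholds):
    """Per-cell min, max, and range of probability, and presence count."""
    summary = []
    for cell in range(len(prob_by_realization[0])):
        values = [probs[cell] for probs in prob_by_realization]
        n_present = sum(1 for probs, t in zip(prob_by_realization, thresholds)
                        if probs[cell] >= t)
        summary.append({"min": min(values), "max": max(values),
                        "range": max(values) - min(values),
                        "n_present": n_present})
    return summary


# Toy test-set data for one realization.
observed = [1, 1, 0, 0, 1]
probabilities = [0.9, 0.6, 0.4, 0.2, 0.7]
threshold = best_threshold(observed, probabilities)
print(threshold, cohens_kappa(observed, classify(probabilities, threshold)))
```

In a full analysis `summarize` would receive one probability grid (flattened to a list) and one optimized threshold per realization, giving a presence count between 0 and 10 for every cell.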

RESULTS

Univariate GLM analysis

Comparison of the results of the univariate GLM (generalized linear model) regressions for the realizations with the unperturbed model shows variability in the outcomes when error in the digital elevation model (DEM) is considered (Table 2). For all variables and species, the propagated error affected both the polynomial selected for each predictor variable and whether the predictor variable was selected at all.

For Corymbia maculata, results depended on the variable investigated when judged on probability levels for significance (using P < 0.05). Table 2 shows that average air temperature, elevation (DEM), and topographic position all gave identical results for all 40 regressions. Cosine of aspect was likewise consistent: all regressions were nonsignificant. The topographic wetness index (TWI) generally produced similar significant regressions to the unperturbed data (70% agreement). Not one realization for net solar radiation or slope gave identical results to the unperturbed data, and none of the regressions for slope match those of the unperturbed data using two significant figures for the probability. At the P < 0.05 level, there is only 23% agreement for slope. TWI is more affected by error in the DEM than net solar radiation (Van Niel et al. 2004), yet in this case, the error propagated to net solar radiation leads to more inconsistent results. Thus, the variability introduced by the error appears to depend on both the strength of the relationship and the nature of the derivation of the predictor variable from the DEM; for example, where the representation of curvature is critical to the output data set or where the data conversion is not continuously differentiable (Van Niel et al. 2004).
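The agreement percentages can be checked against the counts in Table 2. The sketch below takes "agreement" to mean the share of realization regressions whose significance at P < 0.05 matches that of the unperturbed regression. This definition is an assumption, since the text does not spell it out. The data structure and function name are illustrative; the counts are copied from Table 2.

```python
N_REALIZATIONS = 10

# For each polynomial (L, Qd, C, Qt): (unperturbed significant at P < 0.05?,
# number of the 10 realizations significant at P < 0.05), from Table 2.
table2_counts = {
    "Slope": [(True, 2), (True, 3), (True, 2), (True, 2)],
    "Cosine of aspect": [(False, 0), (False, 0), (False, 0), (False, 0)],
}


def agreement(rows, n=N_REALIZATIONS):
    """Fraction of realization regressions matching the unperturbed outcome."""
    matches = sum(k if significant else n - k for significant, k in rows)
    return matches / (n * len(rows))


for variable, rows in table2_counts.items():
    print(f"{variable}: {agreement(rows):.1%}")
```

Under this definition slope gives 9 matching regressions out of 40 (22.5%), consistent with the 23% agreement reported above, and cosine of aspect agrees in all 40.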

For Acmena smithii, the results with direct variables demonstrate patterns similar to those for C. maculata (results not presented; see Van Niel [2003] for details). In summary, a strong relationship with net solar radiation remained unaffected by error in the DEM. Average air temperature regressions had a 93% agreement, while TWI had 50% agreement. Regression results using indirect variables were consistent with the unperturbed data at the P < 0.05 level for elevation, slope, and topographic position. For all variables and both species, the propagated error affected both the polynomial selected for each predictor variable and whether the predictor variable was selected at all.

TABLE 2. Comparison of unperturbed values and the range of perturbed Prob(χ²) values for GLM analysis of Corymbia maculata presence/absence with direct and indirect variables.

| Predictor variable | Polynomial selected† | Unperturbed value | Perturbed minimum | Perturbed maximum | Perturbed mean | Perturbed SD | No. significant realizations (of 10)‡ |
|---|---|---|---|---|---|---|---|
| C. maculata, direct | | | | | | | |
| Net solar radiation | L | 0.830 | 0.008 | 0.715 | 0.394 | 0.250 | 1 |
| | Qd | 0.247 | 0.013 | 0.788 | 0.353 | 0.255 | 2 |
| | C | 0.002 | 0.001 | 0.686 | 0.344 | 0.285 | 2 |
| | Qt | 0.003 | 0.003 | 0.681 | 0.334 | 0.268 | 2 |
| Average air temperature | L | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | Qd | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | C | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | Qt | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| Topographic wetness index, TWI | Qd | <0.001 | 0.001 | 0.361 | 0.046 | 0.111 | 9 |
| | C | <0.001 | 0.002 | 0.316 | 0.054 | 0.095 | 8 |
| | Qt | <0.001 | 0.002 | 0.193 | 0.060 | 0.066 | 7 |
| C. maculata, indirect | | | | | | | |
| Digital elevation model, DEM | L | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | Qd | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | C | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | Qt | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| Slope | L | <0.001 | 0.018 | 0.929 | 0.268 | 0.299 | 2 |
| | Qd | <0.001 | 0.020 | 0.748 | 0.289 | 0.295 | 3 |
| | C | <0.001 | 0.019 | 0.822 | 0.350 | 0.301 | 2 |
| | Qt | <0.001 | 0.025 | 0.748 | 0.356 | 0.297 | 2 |
| Topographic position | L | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | Qd | <0.001 | <0.001 | 0.001 | ... | ... | 10 |
| | C | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| | Qt | <0.001 | <0.001 | <0.001 | ... | ... | 10 |
| Cosine of aspect | L | 0.244 | 0.130 | 0.969 | 0.560 | 0.332 | 0 |
| | Qd | 0.09 | 0.257 | 0.985 | 0.571 | 0.252 | 0 |
| | C | 0.161 | 0.203 | 0.981 | 0.564 | 0.244 | 0 |
| | Qt | 0.267 | 0.211 | 0.991 | 0.622 | 0.267 | 0 |

Note: For each species and environmental variable, all polynomials to quartic are shown, even if a significant value was attained before reaching the next polynomial, to show the extent of variability in the results.

† Polynomials are: L, linear; Qd, quadratic; C, cubic; and Qt, quartic.

‡ Number of error realizations (out of 10 realizations) significant at P < 0.05.

Effect on shapes of species response curves

Examination of the nature or shape of the relationship between species and environmental variables is usually conducted using generalized additive models (GAMs). Given the results of the univariate analysis above, we would expect changes in the shapes of the species response curves due to slight error perturbation.

Figs. 1 and 2 demonstrate the change in the shape of species response curves given a relatively strong relationship with the variable. Fig. 1 shows the relationships between C. maculata presence/absence and average air temperature. Although average air temperature had a strong relationship with C. maculata presence/absence despite the propagation of DEM error, Fig. 1 shows that the nature of the relationship may still be distorted by error in the predictor. Different realizations suggest different final curves, which vary from linear to quadratic to cubic (Fig. 1B). Two of the realizations demonstrate the differences (Fig. 1C and D). Note that some of the shape changes occur in sparsely sampled areas, indicating that more comprehensive sampling may provide some protection from this problem.

Although topographic position appeared to have a robust relationship with A. smithii presence/absence in the univariate GLM analysis, Fig. 2 shows how relatively minor error can change interpretation of the nature of the relationship, again with variation in form of linear, quadratic, and cubic. This predictor relationship was less sensitive to error than those environmental variables with weaker species relationships, such as with aspect and slope.

This series of realizations demonstrated the problem of basing interpretation of species response on error-prone data. Not only do the shape and complexity of the response change, affecting for example the selection of a term for a GLM model, but the general response pattern can be quite different. This was surprising given that many of the relationships remained significant in the univariate analysis, and showed that these relationships were not impervious to error in the DEM. The problem of interpretation of the relationships and the shapes of species curves remained, despite the strength of the GLM results.

Effects on model selection by stepwise procedures

Results from the stepwise model selection revealed

sensitivity to DEM error. Table 3 shows the differences

in backwards step model-development results due to the

propagation of error in the DEM to the environmental

variables, and illustrates the weaker relationship of TWI

to C. maculata presence/absence compared to the other

variables. Seven realizations were now nonsignificant for

TWI compared with the unperturbed result, although

nine realizations had significant linear GLM equations

FIG. 1. Generalized additive model (GAM) plots demonstrating the change in relationship between Corymbia maculatapresence/absence and mean air temperature (AvgAirTemp). (A) Relationship for the unperturbed data set. (B) Relationship for 10different error realizations. (C and D) The variation between two of the realizations. On the y-axis ‘‘s’’ stands for ‘‘smoothingspline’’; the units are a spline of the variable, and a high value indicates high probability of occurrence (but is not calibrated here).Above the x-axis the short vertical lines are a ‘‘rug plot,’’ providing information on the number of data points defining the spline.

KIMBERLY P. VAN NIEL AND MIKE P. AUSTIN270 Ecological ApplicationsVol. 17, No. 1

Page 6: PREDICTIVE VEGETATION MODELING FOR CONSERVATION: …randallboone.org/nr505/readings/Van_Niel_and_Austin_2007.pdf · Ecological Applications, 17(1), 2007, pp. 266–280 2007 by the

(Table 2). Net solar radiation and average air temper-

ature are less affected by DEM error than TWI (Van

Niel et al. 2004). The other variables were unaffected

either due to a stronger relationship with C. maculata

presence/absence or to their robustness to DEM error.

Note that the geologic nutrients data set was not

changed in the error propagation. Geologic nutrients, an

unperturbed variable from a different data source,

remains in every model, as does elevation (DEM). The

selection of the other variables in the stepwise procedure

FIG. 2. GAM plots demonstrating the change in relationship between Acmena smithii presence/absence and topographicposition. (A) Relationship with the unperturbed data set. (B) Relationship for 10 different error realizations. (C and D) Thevariation between two of the realizations. The format is as in Fig. 1. The topographic-position units are meters of exposure.

TABLE 3. Results, by species, of backwards stepwise model development and change in AIC and residual deviance for 10 error realizations of all models.

| Species and model type | Model | No. affected† | Change in AIC (mean) | Change in AIC (SD) | Δ deviance (mean) | Δ deviance (SD) |
|---|---|---|---|---|---|---|
| Corymbia maculata, direct model | | | | | | |
| Unperturbed model | GeoNutrients,4 + AvgAirTemp + NetSolarRad + TWI | | 0.97 | 0.83 | 78.66 | 6.86 |
| Dropped from unperturbed | none | | | | | |
| Other models selected | GeoNutrients,4 + AvgAirTemp + NetSolarRad | 7 | | | | |
| C. maculata, indirect model | | | | | | |
| Unperturbed model | GeoNutrients,4 + DEM + Slope,2 + TopoPosition | | 2.83 | 1.88 | 77.65 | 5.97 |
| Dropped from unperturbed | CosineAspect,2 | | | | | |
| Other models selected | GeoNutrients,4 + DEM + Slope,2 | 3 | | | | |
| | GeoNutrients,4 + DEM + CosineAspect,2 + Slope,2 | 1 | | | | |
| | GeoNutrients,4 + DEM + CosineAspect,2 | 1 | | | | |
| | GeoNutrients,4 + DEM + TopoPosition | 2 | | | | |
| Acmena smithii, direct model | | | | | | |
| Unperturbed model | GeoNutrients,4 + AvgAirTemp + NetSolarRad + TWI,3 | | 1.41 | 0.60 | 44.67 | 7.93 |
| Dropped from unperturbed | none | | | | | |
| Other models selected | GeoNutrients,4 + NetSolarRad + TWI | 9 | | | | |
| A. smithii, indirect model | | | | | | |
| Unperturbed model | GeoNutrients,4 + TopoPosition,2 | | 8.24 | 1.25 | 75.08 | 14.68 |
| Dropped from unperturbed | Slope,4; DEM,3 | | | | | |
| Other models selected | GeoNutrients,4 + CosineAspect,2 + TopoPosition,2 | 5 | | | | |
| | GeoNutrients,4 + Slope,4 + TopoPosition,2 | 2 | | | | |

† Number of models (out of 10) with this outcome.


varied for the different realizations. Only 3 out of 10

realizations have identical predictors to the unperturbed

model. The changes in deviance and AIC showed that

the unperturbed-data model falls within the envelope of

the perturbed-data models. The models for A. smithii

with the direct environmental variables disagreed on the

inclusion of average air temperature; 9 out of 10

realizations do not include it in the regression model.

Average air temperature is not greatly affected by error

in the DEM (Van Niel et al. 2004), but the addition of

error in other variables has resulted in the consistent

exclusion of a predictor, due to weakness in the

relationship, which is consistent with the species’

ecology.
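The variability in stepwise selection can be summarized by tallying how often each predictor is retained across realizations. The following Python sketch does this for the indirect C. maculata models, using the outcome counts transcribed from Table 3 together with the three realizations reported above as matching the unperturbed model. The variable names and data layout are illustrative assumptions and were not part of the original analysis.

```python
from collections import Counter

# Final predictor sets for the indirect C. maculata models, transcribed from
# Table 3 as (predictors retained, number of realizations with that outcome).
# The first entry is the unperturbed predictor set, which the text reports
# for 3 of the 10 realizations. Polynomial orders are omitted for brevity.
outcomes = [
    (("GeoNutrients", "DEM", "Slope", "TopoPosition"), 3),
    (("GeoNutrients", "DEM", "Slope"), 3),
    (("GeoNutrients", "DEM", "CosineAspect", "Slope"), 1),
    (("GeoNutrients", "DEM", "CosineAspect"), 1),
    (("GeoNutrients", "DEM", "TopoPosition"), 2),
]


def selection_frequency(outcomes):
    """Count how many realizations retained each predictor."""
    counts = Counter()
    for predictors, n_models in outcomes:
        for name in predictors:
            counts[name] += n_models
    return counts


n_realizations = sum(n_models for _, n_models in outcomes)
frequency = selection_frequency(outcomes)
for name, count in frequency.most_common():
    print(f"{name}: retained in {count} of {n_realizations} realizations")
```

Under this reading of Table 3, geologic nutrients and elevation are retained in all 10 realizations, slope in 7, topographic position in 5, and cosine of aspect in 2.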

Effect on coefficients in GLM models

For the models of C. maculata with direct environ-

mental variables, the coefficients remained relatively

stable across the realizations (Table 4), even between

realizations that included different sets of variables. The

sign remained the same for all coefficients and the

intercept was quite stable, indicating that the general

slope of the model was similar across realizations. The

indirect-environmental variable models for C. maculata

showed changes in the regression coefficients for the

intercept and the unperturbed geologic nutrients data

set. As with the direct model, the sign of the coefficients

and the relative level of the coefficient were not highly

affected for all variables, even with different variables in

the model. This suggests that the relationships between

C. maculata and both geologic nutrients and elevation

are strong enough to be relatively unaffected by error.

The results of A. smithii models with direct variables

demonstrated much greater variation in the coefficients

than the C. maculata models. In particular, the

coefficient on the intercept was highly variable. To test

whether the sensitivity of TWI to error in the DEM was

the cause of the instability in the intercept seen here, the

TWI parameters were removed for all the models with

the same parameters (1–5, 7–10) and the models were

rebuilt. For these nine models, the range in variation of

the intercept was 1.513 and the range of the standard

errors was 0.583, much less than in the analysis shown in

Table 4.
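To make the contrast concrete, the intercept ranges implied by Table 4 can be computed directly and compared with the range reported for the A. smithii models refitted without TWI. The sketch below is only arithmetic on the published summary values; note that Table 4 summarizes 10 realizations whereas the refitted set contains nine, so the comparison is indicative rather than exact.

```python
# Intercept summaries transcribed from Table 4 (direct models, 10 realizations).
intercepts = {
    "C. maculata": {"min": 80.688, "max": 94.494},
    "A. smithii": {"min": -25.069, "max": 23.244},
}

# Intercept range reported in the text for the nine A. smithii models
# refitted without the TWI terms.
range_without_twi = 1.513


def coefficient_range(summary):
    """Range (maximum minus minimum) of a coefficient across realizations."""
    return summary["max"] - summary["min"]


for species, summary in intercepts.items():
    print(f"{species}: intercept range = {coefficient_range(summary):.3f}")

ratio = coefficient_range(intercepts["A. smithii"]) / range_without_twi
print(f"A. smithii range with TWI is about {ratio:.0f} times the range without TWI")
```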

The results of models of A. smithii with indirect

variables supported the instability that was seen in the

direct model (not shown). Again, this was likely due to

the strength of the relationship between topographic

position and A. smithii presence/absence and the

sensitivity of topographic position to errors in the

DEM. As a test, the models were redeveloped excluding

topographic position while leaving all other parameters

as they were shown above. Despite the differences in the

other variables included, the coefficients for the intercept

and geologic nutrients varied much less (e.g., the intercept

range became 2.53, as opposed to 12.82).

Effect on prediction accuracy and threshold

Table 5 shows the differences in kappa accuracy and

the area under the receiver operating characteristic curve (AUC) for

all models. The graphs of the differences in kappa

accuracy for models of C. maculata with the change in

threshold are shown in Fig. 3 for both the direct and

indirect models. Notice that the general trend of the

overall and kappa accuracies was the same for the

realizations as with the unperturbed data. For the

TABLE 4. Summary of the variation in regression coefficients for 10 error realizations of the direct models for Corymbia maculata and Acmena smithii presence/absence.

| Variable | Number included | Perturbed: minimum | Perturbed: maximum | Perturbed: mean | Perturbed: SD | Unperturbed |
|---|---|---|---|---|---|---|
| C. maculata, direct model | | | | | | |
| Intercept | 10 | 80.688 | 94.494 | 86.387 | 4.093 | 85.442 |
| GeoNutrients | 10 | −29.352 | −25.119 | −27.152 | 1.267 | −27.426 |
| GeoNutrients^2 | 10 | 11.910 | 13.965 | 12.924 | 0.608 | 13.290 |
| GeoNutrients^3 | 10 | −2.506 | −2.127 | −2.319 | 0.111 | −2.420 |
| GeoNutrients^4 | 10 | 0.126 | 0.150 | 0.139 | 0.007 | 0.146 |
| AvgAirTemp | 10 | −4.994 | −4.232 | −4.560 | 0.252 | −4.507 |
| NetSolarRad | 10 | 0.046 | 0.082 | 0.058 | 0.010 | 0.076 |
| TWI | 3 | −0.174 | −0.111 | −0.137 | 0.033 | −0.144 |
| A. smithii, direct model | | | | | | |
| Intercept | 10 | −25.069 | 23.244 | −1.883 | 14.140 | −27.367 |
| GeoNutrients | 10 | 10.487 | 14.052 | 12.251 | 1.220 | 12.106 |
| GeoNutrients^2 | 10 | −6.145 | −4.506 | −5.343 | 0.555 | −5.283 |
| GeoNutrients^3 | 10 | 0.721 | 0.998 | 0.864 | 0.093 | 0.857 |
| GeoNutrients^4 | 10 | −0.052 | −0.037 | −0.045 | 0.005 | −0.044 |
| AvgAirTemp | 1 | 0.900 | 0.900 | 0.9 | NA | −1.244 |
| NetSolarRad | 10 | −0.050 | −0.023 | −0.035 | 0.008 | −0.095 |
| TWI | 3 | −7.710 | 4.142 | 2.032 | 3.636 | 8.564 |
| TWI^2 | 10 | −0.295 | 0.647 | 0.195 | 0.312 | −0.509 |
| TWI^3 | 10 | −0.017 | 0.007 | −0.005 | 0.009 | 0.01 |


indirect models the unperturbed data had the best kappa

accuracy, but for the direct models the error realizations

had higher kappa accuracies. There was a wide range of

results, especially the higher kappa accuracies from the

perturbed data sets in the direct models. One could

argue that the kappa accuracies for the direct model had

nowhere to go but up. Note the models with higher

accuracy were the three that did not include TWI,

whereas TWI was included in the unperturbed model.
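The optimum threshold referred to here is the probability cutoff that maximizes Cohen's kappa. A minimal Python sketch of that search follows, assuming candidate thresholds in steps of 0.05 and using a small hypothetical set of observations and predicted probabilities; none of these values come from the study data.

```python
def cohen_kappa(observed, predicted):
    """Cohen's kappa for two equal-length sequences of 0/1 labels."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    p_observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate case: only one class present in both
    return (p_observed - p_chance) / (1.0 - p_chance)


def optimum_threshold(observed, probabilities, thresholds):
    """Return (threshold, kappa) for the cutoff with the highest kappa."""
    best_threshold, best_kappa = None, float("-inf")
    for threshold in thresholds:
        predicted = [1 if p >= threshold else 0 for p in probabilities]
        kappa = cohen_kappa(observed, predicted)
        if kappa > best_kappa:
            best_threshold, best_kappa = threshold, kappa
    return best_threshold, best_kappa


# Hypothetical presence/absence records and model probabilities.
observed = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.65, 0.45, 0.4, 0.3, 0.2, 0.15, 0.1]
thresholds = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95 (assumed)

best_threshold, best_kappa = optimum_threshold(observed, probabilities, thresholds)
print(f"optimum threshold = {best_threshold:.2f}, kappa = {best_kappa:.3f}")
```

Repeating this search for each error realization would give the spread of optimum thresholds and kappa values of the kind summarized in Table 5.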

The graphs of the differences in kappa accuracies for

A. smithii with the change in threshold are shown in Fig.

4 for both the direct and indirect models. The chart of

the kappa accuracies for both indirect and direct models

(Fig. 4) shows that one realization had higher kappa

accuracies than the unperturbed data.
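Table 5 also reports AUC, which does not depend on a threshold. One common way to compute it, sketched below in Python on hypothetical data, is as the proportion of presence/absence pairs in which the presence site receives the higher predicted probability (ties count as half). This illustrates the statistic only and is not necessarily the procedure used in the study.

```python
def auc(observed, probabilities):
    """Area under the ROC curve from pairwise comparisons (Mann-Whitney form)."""
    presences = [p for o, p in zip(observed, probabilities) if o == 1]
    absences = [p for o, p in zip(observed, probabilities) if o == 0]
    if not presences or not absences:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p_presence in presences:
        for p_absence in absences:
            if p_presence > p_absence:
                wins += 1.0
            elif p_presence == p_absence:
                wins += 0.5  # ties contribute half a win
    return wins / (len(presences) * len(absences))


# Hypothetical presence/absence records and model probabilities.
observed = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.65, 0.45, 0.4, 0.3, 0.2, 0.15, 0.1]

auc_value = auc(observed, probabilities)
print(f"AUC = {auc_value:.3f}")
```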

Effect on spatial predictions of species

For each model of C. maculata and A. smithii, the

spatial predictions were developed for every realization

and then used to create three data sets based on the

probabilities, as well as a map of the spatial agreement

across all realizations. The map in Fig. 5 shows the

variation in predicted probabilities for the indirect

models of A. smithii, while Fig. 6 shows the variation

between the presence/absence prediction for the unper-

turbed analysis and two realizations. Fig. 7 shows the

extent of the variation in presence/absence predictions

across all realizations at the optimum threshold, where

the mapped value is the number of models predicting

presence. There were areas on which all models agreed

on either presence or absence of C. maculata for both

models (Fig. 7A and B), although there was less

agreement for the indirect model (Fig. 7A). Two areas of

agreement for the direct models on predicted absence for

C. maculata were quite notable, in the north and south

of the study area, as well as the lowland running from

the southwest to northwest of the study site along a

river. The realizations also show agreement of presence

in the northwest and eastern parts of the study site.

However, the areas of disagreement and the range in

variation of the probabilities of presence show the extent

to which error in the DEM can affect the spatial

prediction of this species in both models. For A.

smithii, there were very few places where the models all

agree (Fig. 7D), particularly for the indirect models (Fig.

7C).
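The two kinds of map described in this section, the per-cell spread of predicted probabilities (as in Fig. 5) and the per-cell count of realizations predicting presence (as in Fig. 7), can be derived from a stack of prediction grids. The Python sketch below uses three tiny hypothetical 2 x 2 grids and one threshold per realization; the grid values, thresholds, and function names are assumptions for illustration only.

```python
def agreement_map(probability_maps, thresholds):
    """Count, per cell, the realizations that predict presence.

    probability_maps: list of 2-D grids (lists of rows), one per realization.
    thresholds: one presence threshold per realization.
    """
    n_rows = len(probability_maps[0])
    n_cols = len(probability_maps[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for grid, threshold in zip(probability_maps, thresholds):
        for r in range(n_rows):
            for c in range(n_cols):
                if grid[r][c] >= threshold:
                    counts[r][c] += 1
    return counts


def probability_range(probability_maps):
    """Per-cell (minimum, maximum, range) of predicted probability."""
    n_rows = len(probability_maps[0])
    n_cols = len(probability_maps[0][0])
    summary = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [grid[r][c] for grid in probability_maps]
            row.append((min(values), max(values), max(values) - min(values)))
        summary.append(row)
    return summary


# Hypothetical predicted probabilities for three realizations.
maps = [
    [[0.9, 0.2], [0.6, 0.1]],
    [[0.8, 0.4], [0.3, 0.1]],
    [[0.7, 0.6], [0.5, 0.2]],
]
thresholds = [0.5, 0.5, 0.5]  # assumed optimum threshold for each realization

counts = agreement_map(maps, thresholds)
ranges = probability_range(maps)
print("models predicting presence:", counts)
print("probability range, top-right cell:", round(ranges[0][1][2], 2))
```

Cells with a count of 0 or of the full number of realizations are those on which all models agree; intermediate counts mark the areas of disagreement.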

DISCUSSION

Propagated primary data error affected each of the

steps in predictive vegetation model development.

Qualitatively this is to be expected; what this study

shows is the nature, magnitude and complexity of the

effects of DEM errors.

Analysis of the relationships between a species and a

single environmental variable is a relatively common

method. For example, Vayssieres et al. (2000) tested the

relationship of six plant species against 25 separate

environmental variables up to the cubic term for each.

The results here suggest that tests of significance of an

individual variable depend on the relationship between

the true value and the value in the DEM, the type of

error (biased or random), and the type and complexity

of the model (Carroll et al. 1995). Complexity of the

model and decreasing degrees of freedom were expected

to make models more sensitive to error, so one would

expect greater variability in a more complex fit. Our

study shows that the GLM (generalized linear model)

univariate method may be called into question as a way

of eliminating variables before model fitting, simply

based on error in the primary data (e.g., Pearce and

Ferrier 2000, Vayssieres et al. 2000). The strong

relationships for both species, which remained unaffect-

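TABLE 5. Summary of variation in optimum kappa threshold, highest kappa accuracy, and area under the receiver operating curve (AUC) for 10 error-realization direct and indirect models of Corymbia maculata and Acmena smithii presence/absence, together with results of unperturbed data models for comparison.

| Species | Model | Direct: optimum threshold | Direct: kappa accuracy | Direct: AUC | Indirect: optimum threshold | Indirect: kappa accuracy | Indirect: AUC |
|---|---|---|---|---|---|---|---|
| C. maculata | Unperturbed | 0.70 | 0.209 | 0.634 | 0.60 | 0.316 | 0.646 |
| C. maculata | Perturbed, minimum | 0.50 | 0.155 | 0.574 | 0.35 | 0.068 | 0.530 |
| C. maculata | Perturbed, maximum | 0.80 | 0.251 | 0.639 | 0.85 | 0.228 | 0.630 |
| C. maculata | Perturbed, mean | 0.63 | 0.207 | 0.608 | 0.62 | 0.157 | 0.581 |
| C. maculata | Perturbed, SD | 0.09 | 0.032 | 0.018 | 0.16 | 0.049 | 0.028 |
| A. smithii | Unperturbed | 0.40 | 0.452 | 0.735 | 0.20 | 0.397 | 0.781 |
| A. smithii | Perturbed, minimum | 0.25 | 0.299 | 0.715 | 0.20 | 0.176 | 0.581 |
| A. smithii | Perturbed, maximum | 0.50 | 0.616 | 0.779 | 0.75 | 0.416 | 0.778 |
| A. smithii | Perturbed, mean | 0.37 | 0.426 | 0.751 | 0.35 | 0.305 | 0.711 |
| A. smithii | Perturbed, SD | 0.09 | 0.089 | 0.018 | 0.17 | 0.069 | 0.051 |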


ed by DEM error propagation, were indicated by their

commonly observed ecology (such as the strong

relationship between Acmena smithii, a rainforest

species, with topographic position).

GAMs (generalized additive models) were used for

exploring the shapes of species response to environmen-

tal variables. This method has been used to suggest

relationships for use in GLM models (Brown 1994,

Franklin 1998), to explore ecological theories (Austin

and Meyers 1996), or to develop predictive models (Bio

et al. 1998, Leathwick 1998, Lehmann 1998). The effect

that error in the predictor variables has on the integrity

of GAMs and GLMs influences the success of final

predictive outcomes. More importantly, it can also

affect the development of ecological theories on species

niche and environmental response (Leathwick 1995,

Austin and Meyers 1996). Error in the environmental

variables had a relatively strong effect on the apparent

shapes of species response curves. For some relation-

ships both the slope and the shape of the relationship

were changed by DEM uncertainty.

In the presence of a strong (or extreme lack of)

relationship between the predictor variable and species

presence, GAM and GLM univariate analyses were less

sensitive to error in the DEM. For A. smithii, the strong

relationship between presence/absence and topographic

position was relatively unaffected by DEM error, while

the lack of relationship between presence/absence and

elevation was similarly unaffected. More ephemeral

relationships, such as Corymbia maculata presence/

absence with slope, proved to be quite variable.

However, higher levels of error would certainly have a

more dramatic effect on all the relationships. There is an

interaction between the strength of relationship between

the predictor variable and species presence and the level

of error considered in the uncertainty perturbations.

Weaker relationships are more sensitive at lower levels

of error, and stronger relationships at higher levels of

error. Thus, the level of error in the DEM can have a

very powerful effect on the accuracy of the relationships

explored using this method. However, the GAM

analysis showed that the strength of the relationship

FIG. 3. Predictive accuracies (kappa statistics based on p-optimum, a threshold value that leads to the best accuracy value as assessed by kappa accuracy) for 10 error realizations of the (A) direct and (B) indirect models for Corymbia maculata presence/absence, displayed as decimals. All perturbed realizations are shown in gray; the unperturbed results are shown in black.


did not necessarily protect against misinterpretation of

the nature of species–environment relationships.

This study did not include predictor-variable interac-

tions. Given results from this study and Van Niel et al.

(2004), we would expect that the impact from propa-

gated DEM error would be a complex interaction

between the strength of relationship between interacting

variables and the species, the attribute and spatial

relationships between the variables themselves, and the

relationship (co-occurrence) of the spatial pattern of

error in the interacting variables.

Stepwise procedures are widely used for model

selection for GLMs (Guisan et al. 1998, Zimmerman

and Kienast 1999). Stepwise model selection proved to

be relatively stable, although the strength or extreme

lack of relationship dictated the similarity in the results.

It was likely that inclusion of the unperturbed data set,

geologic nutrients, helped to stabilize this step in the

modeling procedure. For stepwise model development,

the models from the indirect variables were more

sensitive to error in the DEM than the direct models.

This was probably due to the larger number of variables

presented to the stepwise procedure and to the fact that

topographic position, slope, and cosine of aspect are

known to be more sensitive to error than net solar

radiation and average air temperature (see Van Niel et

al. 2004), but it may also indicate that these variables

were more sensitive to error during the stepwise

procedure because the relationships are not as strong.

Some studies have attempted to assign importance of

variables in species relationships based on the order in

which they enter or leave a model in stepwise

procedures. Although this inference is warned against

(Harrell et al. 1996), it is often used (e.g., Nicholls 1989).

FIG. 4. Predictive accuracies (kappa statistics based on p-optimum) for 10 error realizations of the (A) direct and (B) indirect models for Acmena smithii presence/absence, displayed as decimals. All perturbed realizations are shown in gray; the unperturbed results are shown in black.


Our analysis also shows that error in the DEM can

distort the order, and supports the notion that inference

of importance due to order should be avoided. In

addition, change in deviance is often reported as a

measure of how well the model was developed. The

results show that change in deviance is impacted by

error in the DEM. A wide range in the change of

deviance (21.72 in Table 5 and 18.54 in Table 6) was

found simply due to relatively slight error in the DEM.

With more realizations, this range would probably

increase. Both order of variable exclusion and change

in deviance are affected by propagated DEM error, and

thus they may be unreliable guides to the assignment of

importance to environmental variables and as a

comparative measure between models.

Examination of GLM coefficients showed that strong

relationships between species presence/absence and an

environmental variable, when combined with high

sensitivity to error in that environmental variable, can

lead to unstable intercepts, coefficients, and standard

errors. This result was more disturbing than results from

the previous model development steps. It suggests that

although the other steps show more stability when

relationships between environmental variables and

species were strong, a strong relationship can also lead

to very unstable results if the environmental variable

was sensitive to propagated error. Removal of the

sensitive variable appears to help stabilize the results, as

shown in the removal of TWI (topographic wetness

index) from the direct model and topographic position

from the indirect model for A. smithii. The other striking

difference with these models was the high variability of

the coefficients for TWI. It was likely that both of these

effects were caused by the sensitivity of TWI to error in

the DEM (as described by Van Niel et al. [2004]) and the

FIG. 5. Prediction maps for the study area on the south coast of New South Wales, Australia, demonstrating the variation in the probability of species presence. The (A) minimum, (B) maximum, and (C) range of probabilities for each cell for 10 error realizations of the indirect models of Acmena smithii presence/absence are shown.

FIG. 6. Predicted presence/absence for the (A) unperturbed model and (B and C) two realizations for the indirect models of Corymbia maculata.


strength of the relationship between TWI and A. smithii

presence/absence. Because the relationship was quite

strong (as shown in the univariate analysis), changes in

TWI through the perturbations caused wide fluctuations

in the intercept, high standard errors on the intercept,

and variability in the TWI coefficients. The greatest

variation occurs with outliers and data sparsity, which

supports the idea that any data collection and analysis

methods that generally strengthen models will also help

to protect models against primary-data error.

Therefore there is a further interaction between the

strength of the relationship, level of error, and the

sensitivity of the variables to error in the DEM. In the

univariate GLM analysis, we saw that a strong

relationship led to more robust results. However, a

strong relationship with an independent variable that

has a high sensitivity can lead to the development of

models that are very sensitive to error in the DEM.

This is a major cause for concern in the development of

vegetation-prediction models. In this study, for example,

TWI and topographic position were the strongest

variables for prediction of the rainforest species,

characteristically found only in gullies on the steep

slopes of the two north–south mountain ranges on the

study site. This is due to a number of factors, which

included not only the routing of water in gully channels

but also the availability of soil nutrients accumulated

from deposition and protection from fire. Since both

predictor variables are sensitive to error in the DEM

(Van Niel et al. 2004) and contributed to model

FIG. 7. The number of models (out of 10) that predicted the species as present. None (0) is displayed in red, and all (10) is shown in blue to highlight those areas where all models agreed. Displayed are (A) indirect models of Corymbia maculata, (B) direct models of C. maculata, (C) indirect models of Acmena smithii, and (D) direct models of A. smithii.


instability, neither seemed to be particularly better than

the other for model development for the prediction of

rainforest species at this site. What is required in this

case is a variable that can represent the gullies without

being sensitive to error in the DEM. Unfortunately, the

representation of flow characteristics is highly sensitive

to errors in the base DEM (Holmes et al. 2000), and

therefore estimating this topographic feature requires

more scrutiny among the predictive habitat-modeling

community.

Analysis of model prediction accuracy supports the

stepwise model-development findings. Models that

included only those environmental variables that were

relatively robust to DEM error had the highest

accuracies, as compared to those that included variables

that were highly sensitive to DEM error. In this analysis,

the decision of whether to use direct or indirect variables

was not clear. The most important aspect of variable

selection shown here is that environmental variables

should a priori have a strong hypothetical relationship

to species presence/absence (based on known plant

physiology and ecology) and should also be robust to

error in the source data. This recommendation was also

supported by the analysis of the relationships between

variables and species, where stronger relationships are

more impervious to propagated DEM error (see above).

In addition to the strength of the predictive relation-

ship reducing model sensitivity to DEM error, it is likely

that other measures that generally increase model

robustness would also increase robustness to error.

These measures would likely include a larger number of

sample points, balance in the ratio of presence/absence

in the data set, and limiting data sets to reduce absences

beyond the range of a species. The use of environmental

variables that are less sensitive to error would also be

expected to increase robustness. In our analysis,

however, a strong (or strong lack of) relationship was

shown as the most important factor in sensitivity to

DEM error in the univariate GLM analyses. Even so, a strong

relationship did not preserve the shape of the species response

curves when the environmental variable was itself sensitive

to DEM error.

The spatial predictions clearly demonstrate the

problem of propagated error on model results. For only

10 realizations, the results show a wide variation in the

spatial extent of the species predictions. The variation in

the extent of spatial predictions has the greatest

implications for management. If the analyses were

expanded to include a larger number of realizations,

which is required for the development of complete

statistical results, then a better estimate of the extent of

variation and the estimate of the mean variation could

be obtained. However, the maps of spatial variation in

the species predictions give a powerful visual demonstration

of the range of results due to primary-data

error. They also provide a spatial context for the

uncertainties introduced from error in the base data.

For example, the spatial pattern of model disagreement

for the indirect models for C. maculata (Fig. 6) shows the

greatest uncertainties in the region where topographic

position is most affected by DEM error (Van Niel et al.

2004), while the same region is affected in the direct

models, probably due to the inconsistency of the

inclusion of TWI in the final models. For both models

of A. smithii, there is general agreement of species

absence on ridges, while uncertainty of species presence

is evident in the gullies (direct model) as well as slopes

and toe slopes (indirect model).
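The benefit of running more realizations, noted above, can be illustrated with the standard error of a mean, which shrinks with the square root of the number of realizations. The Python sketch below applies this to the kappa SD reported in Table 5 for the perturbed indirect C. maculata models, under the simplifying assumptions that the SD itself would stay the same with more realizations and that realizations are independent.

```python
import math


def standard_error(sd, n_realizations):
    """Standard error of the mean across independent realizations."""
    return sd / math.sqrt(n_realizations)


# Kappa SD for the perturbed indirect C. maculata models (Table 5).
kappa_sd = 0.049

for n in (10, 100, 1000):
    # Assumes the SD remains 0.049 as realizations are added.
    print(f"n = {n:4d}: standard error of mean kappa = {standard_error(kappa_sd, n):.4f}")
```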

This study has shown that not only final outcomes,

but also many of the components from the process of

predictive habitat modeling are impacted. This includes

prediction accuracies for both plant and animal species

(Pereira and Itami 1991, Zimmerman and Kienast 1999),

spatial extents (Franklin 1998), species realised niche

(Austin et al. 1990), predictor-variable significance

(Vayssieres et al. 2000) and strength of the relationship

between species and predictor variables for species

occurrences (Franklin 1998), interrelationships of envi-

ronmental variables (Austin et al. 1983), stepwise

procedures (Vayssieres et al. 2000), and the shape of

species response curves. Models that predict dominance

(Lenihan 1993) and richness (Heikkinen 1996) could

also be impacted. Although this study has focused on

predictive habitat modeling and modeling methods,

other models based on continuous GIS data or surface

models may also be impacted. For example, distributed

ecosystem process models (e.g., Band et al. 1991), which

rely on input from a DEM to develop secondary

topographic data, will be similarly affected. Climate

and climate-change models that either use DEMs and

their derivatives to spatially extend climate station data

or rely on models that use these data, such as MTCLIM

(Running et al. 1987), would also be impacted (e.g.,

Tingey et al. 2001, Lapp et al. 2005). Affected also

would be predictive process models that are reliant on

derived terrain data, such as fire models like BEHAVE

(Rothermel 1972) and studies that use these models (e.g.,

Stephens 1998). These results may also be extended to

the marine environment, where bathymetry, as a basis

for benthic habitat modeling (Kostylev et al. 2001), may

also be developed from contour data or from data that

require smoothing of striping and scalar (depth) effects

(e.g., MultiBeam techniques). None of the cited papers

consider the effect of propagated error, including those

from GIS data, on the results.

Conclusions

Spatial predictions of vegetation, habitats, and

processes are widely used in management. An under-

standing of how error can affect the prediction of the

spatial extent of a species, community, or process is

critical for management decision making. The results in

this study indicate that error in a DEM can have a

profound effect on modeling processes and outcomes.

Accepting this, it is necessary that we begin to explore

the ramifications of these findings and develop best-


practice methods that reduce the impact of errors. For

example, our results indicate that proper model devel-

opment could help to reduce the impact of DEM error

on predictive habitat modeling. The sensitivity of DEM-

derived environmental variables to error in the DEM

should be considered before including them in any

modeling process. There is clearly a need to examine

environmental variables, not just on the basis of their

relatedness to the original source data or to physiolog-

ical drivers, but also in light of their sensitivity to error

in the source data. In particular, we need to develop

variables that are more robust under these conditions.

This study has shown that propagated uncertainty,

based on error in the DEM, has the potential to have a

significant impact on a number of steps and the final

results from predictive vegetation modeling.
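One way to screen a DEM-derived variable for sensitivity before modeling is a small Monte Carlo experiment: perturb the DEM repeatedly, recompute the variable, and summarize its per-cell spread. The Python sketch below does this for slope on a hypothetical grid. It uses spatially uncorrelated Gaussian noise purely for brevity, which is a simplification and not the error model used in this study; the cell size, noise level, and function names are likewise assumptions.

```python
import math
import random
import statistics


def slope_degrees(dem, cell_size):
    """Slope in degrees at interior cells, from central differences."""
    slopes = {}
    for r in range(1, len(dem) - 1):
        for c in range(1, len(dem[0]) - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slopes[(r, c)] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slopes


def perturb(dem, sigma, rng):
    """Add independent Gaussian error (SD = sigma) to every cell."""
    return [[z + rng.gauss(0.0, sigma) for z in row] for row in dem]


def slope_sensitivity(dem, cell_size, sigma, n_realizations, seed=0):
    """Per-cell SD of slope across perturbed DEM realizations."""
    rng = random.Random(seed)
    runs = [
        slope_degrees(perturb(dem, sigma, rng), cell_size)
        for _ in range(n_realizations)
    ]
    return {
        cell: statistics.pstdev([run[cell] for run in runs])
        for cell in runs[0]
    }


# Hypothetical 5 x 5 DEM: a uniform ramp rising 2 m per 25 m cell.
dem = [[100.0 + 2.0 * c for c in range(5)] for _ in range(5)]

sensitivity = slope_sensitivity(dem, cell_size=25.0, sigma=1.0, n_realizations=50)
for cell, sd in sorted(sensitivity.items()):
    print(f"cell {cell}: slope SD = {sd:.2f} degrees")
```

Running the same loop for each candidate variable would allow variables to be ranked by their spread under a given level of DEM error before any species model is fitted.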

ACKNOWLEDGMENTS

This study was conducted as part of a Ph.D. thesis at the Australian National University, supported by generous scholarships provided by the ANU and the Australian Department of Youth Services. Special thanks go to thesis supervisor Brian Lees (UNSW-ADFA) for support and comments on the draft and to thesis advisors Stuart Davey (BRS) and Brendan Mackey (ANU). Thanks also to two anonymous reviewers whose comments greatly improved the manuscript.

LITERATURE CITED

Austin, M. P. 2002. Spatial predictions of species distribution: an interface between ecological theory and statistical modelling. Ecological Modelling 157:101–118.

Austin, M. P., R. B. Cunningham, and R. B. Good. 1983. Altitudinal distribution of several eucalypt species in relation to other environmental factors in southern New South Wales. Australian Journal of Ecology 8:169–180.

Austin, M. P., and J. A. Meyers. 1996. Current approaches to modelling the environmental niche of eucalypts: implications for management of forest biodiversity. Forest Ecology and Management 85:95–106.

Austin, M. P., A. O. Nicholls, and C. R. Margules. 1990. Measurement of the realized qualitative niche: environmental niches of five Eucalyptus species. Ecological Monographs 60:161–177.

Austin, M. P., and T. M. Smith. 1989. A new model for the continuum concept. Vegetatio 83:35–47.

Band, L. E., D. Peterson, S. Running, J. Coughlin, R. Lammers, J. Dungan, and R. Nemani. 1991. Forest ecosystem processes at the watershed scale: basis for distributed simulation. Ecological Modelling 56:171–196.

Barling, R. D., I. D. Moore, and R. B. Grayson. 1994. A quasi-dynamic wetness index for characterising the spatial distribution of zones of surface saturation and soil water content. Water Resources Research 30:1029–1044.

Bio, A. M. F., R. Alkemade, and A. Barendregt. 1998. Determining alternative models for vegetation response analysis: a non-parametric approach. Journal of Vegetation Science 9:5–16.

Bradley, A. P. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30:1145–1159.

Brown, D. G. 1994. Predicting vegetation types at treeline using topography and biophysical disturbance variables. Journal of Vegetation Science 5:641–656.

Carroll, D., D. Ruppert, and L. A. Stefanski. 1995. Measure-ment error in nonlinear models. Chapmand and Hall,London, UK.

Fisher, P. 1998. Improved modeling of elevation error withgeostatistics. GeoInformatica 2:215–233.

Fitzgerald, R. W., and B. G. Lees. 1992. The application ofneural networks to the floristic classification of remote sensingandGISdata in complex terrain. Pages 570–573 inProceedingsof the 17th ISPRS Congress. American Society for Photo-grammetry and Remote Sensing, Washington D.C., USA.

Fitzgerald, R. W., and B. G. Lees. 1994. Spatial context andscale relationships in raster data for thematic mapping innatural systems. Pages 462–476 in T. Waugh and R. Hedley,editors. Advances in GIS research. Taylor and Francis,Southhampton, UK.

Franklin, J. 1998. Predicting the distribution of shrub species insouthern California from climate and terrain-derived vari-ables. Journal of Vegetation Science 9:733–748.

Guisan, A., J. Theurillat, and F. Kienast. 1998. Predicting thepotential distribution of plant species in an alpine environ-ment. Journal of Vegetation Science 9:65–74.

Guisan, A., S. B. Weiss, and A. D. Weiss. 1999. GLM versusCCA spatial modeling of plant species distribution. PlantEcology 143:107–122.

Guisan, A., and N. E. Zimmermann. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135:147–186.

Harrell, F. E., K. L. Lee, and D. B. Mark. 1996. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15:361–387.

Heikkinen, R. K. 1996. Predicting patterns of vascular plant species richness with composite variables: a meso-scale study in Finnish Lapland. Vegetatio 126:151–165.

Heipke, C., A. Koche, and P. Lohmann. 2002. Analysis of SRTM DTM. Methodology and practical results. In A. Boberg, editor. Photogrammetry meets geoinformatics. Journal of Swedish Society for Photogrammetry and Remote Sensing 2000(1):69–80.

Henebry, G. M., and J. Merchant. 2002. Geospatial data in time: Limits and prospects for predicting species occurrences. Pages 291–302 in J. Scott, P. Heglund, J. Haufler, J. Morrison, M. Raphael, W. Wall, and F. Samson, editors. Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, D.C., USA.

Holmes, K. W., O. A. Chadwick, and P. C. Kyriakidis. 2000. Error in a USGS 30-meter digital elevation model and its impact on terrain modelling. Journal of Hydrology 233:154–173.

Kostylev, V., B. Todd, G. Fader, R. Courtney, G. Cameron, and R. Pickill. 2001. Benthic habitat mapping on the Scotian Shelf based on multibeam bathymetry, surficial geology, and seafloor photographs. Marine Ecology Progress Series 219:121–137.

Lapp, S., J. Buyrne, I. Townshend, and S. Zienzle. 2005. Climate warming impacts on snowpack accumulation in an alpine watershed. International Journal of Climatology 25:521–536.

Leathwick, J. R. 1995. Climatic relationships of some New Zealand forest tree species. Journal of Vegetation Science 6:237–248.

Leathwick, J. R. 1998. Are New Zealand’s Nothofagus species in equilibrium with their environment? Journal of Vegetation Science 9:719–732.

Lees, B. G. 1999. The Kioloa GLCTS Pathfinder Site. [Accessed 21 October 1999.] <http://sres.anu.edu.au/associated/pathfinder/index.html>

Lees, B. G., and K. Ritman. 1991. Decision-tree and rule-induction approach to integration of remotely sensed and GIS data in mapping vegetation in disturbed or hilly environments. Environmental Management 15:823–831.

Lehmann, A. 1998. GIS modelling of submerged macrophyte distribution using Generalised Additive Models. Plant Ecology 139:113–124.

Lenihan, J. M. 1993. Ecological response surfaces for North American boreal tree species and their use in forest classification. Journal of Vegetation Science 4:667–680.

Moore, D. M., B. G. Lees, and S. M. Davey. 1991. A new method for predicting vegetation distributions using decision tree analysis in a geographic information system. Environmental Management 15:59–71.

Moore, I. D., T. W. Norton, and J. E. Williams. 1993. Modelling environmental heterogeneity in forested landscapes. Journal of Hydrology 150:717–747.

Nicholls, A. O. 1989. How to make biological surveys go further with generalised linear models. Biological Conservation 73:51–75.

O’Connor, R. 2002. The conceptual basis of species distribution modeling: time for a paradigm shift. Pages 25–33 in J. Scott, P. Heglund, J. Haufler, J. Morrison, M. Raphael, W. Wall, and F. Samson, editors. Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, D.C., USA.

Pearce, J., and S. Ferrier. 2000. An evaluation of alternative algorithms for fitting species distribution models. Ecological Modelling 128:127–147.

Pereira, J. M. C., and R. M. Itami. 1991. GIS-based habitat modelling using logistic multiple regression: a study of the Mt. Graham red squirrel. Photogrammetric Engineering and Remote Sensing 57:1475–1486.

Rothermel, R. 1972. A mathematical model for predicting fire spread in wildland fuels. Research Paper INT-115. USDA Forest Service, Ogden, Utah, USA.

Running, S., R. Nemani, and R. Hungerford. 1987. Extrapolation of synoptic meteorological data in mountainous terrain and its use for simulating forest evapotranspiration and photosynthesis. Canadian Journal of Forest Research 17:472–483.

Scott, J., P. Heglund, J. Haufler, J. Morrison, M. Raphael, W. Wall, and F. Samson, editors. 2002. Predicting species occurrences: issues of accuracy and scale. Island Press, Washington, D.C., USA.

Stephens, S. 1998. Evaluation of the effects of silviculture and fuels treatments on potential fire behaviour in Sierra Nevada mixed-conifer forests. Forest Ecology and Management 105:21–35.

Tingey, D., J. Laurence, J. Weber, J. Greene, W. Hogsett, S. Brown, and E. Lee. 2001. Elevated CO2 and temperature alter the response of Pinus ponderosa to ozone: a simulation analysis. Ecological Applications 11:1412–1424.

van Horne, B. 2002. Approach to habitat modelling: the tension between pattern and process and between specificity and generality. Pages 63–72 in J. M. Scott, P. J. Heglund, and M. L. Morrison, editors. Predicting species occurrences. Island Press, Washington, D.C., USA.

van Horssen, P. W., E. J. Pebesma, and P. P. Schot. 2002. Uncertainties in spatially aggregated predictions from a logistic regression. Ecological Modelling 154:93–101.

Van Niel, K. P. 2003. Geographical issues in predictive vegetation modelling: Error and uncertainty in GIS data, methods, and models. Dissertation. The Australian National University, Canberra, Australian Capital Territory, Australia.

Van Niel, K. P., S. W. Laffan, and B. G. Lees. 2004. Effect of error in the DEM on environmental variables for predictive vegetation modelling. Journal of Vegetation Science 15:747–756.

Vayssieres, M. P., R. E. Plant, and B. H. Allen-Diaz. 2000. Classification trees: an alternative non-parametric approach for predicting species distributions. Journal of Vegetation Science 11:679–694.

Zimmerman, N., and F. Kienast. 1999. Predictive mapping of alpine grasslands in Switzerland: Species versus community approach. Journal of Vegetation Science 10:469–482.
