Ecological Modelling 363 (2017) 48–56

Contents lists available at ScienceDirect: Ecological Modelling
Journal homepage: www.elsevier.com/locate/ecolmodel

Minimizing effects of methodological decisions on interpretation and prediction in species distribution studies: An example with background selection

Catherine S. Jarnevich a,*, Marian Talbert b, Jeffery Morisette b, Cameron Aldridge c, Cynthia S. Brown d, Sunil Kumar e, Daniel Manier a, Colin Talbert a,b, Tracy Holcombe a

a U.S. Geological Survey, Fort Collins Science Center, 2150 Centre Ave Bldg C, Fort Collins, CO 80526, USA
b Department of Interior, North Central Climate Science Center, Colorado State University, Fort Collins, CO 80523, USA
c Natural Resource Ecology Laboratory, Colorado State University, in cooperation with the U.S. Geological Survey, Fort Collins Science Center, 2150 Centre Ave Bldg C, Fort Collins, CO 80526, USA
d Department of Bioagricultural Sciences and Pest Management, Colorado State University, Fort Collins, CO 80523-1177, USA
e Natural Resource Ecology Laboratory, Colorado State University, Fort Collins, CO 80523-1499, USA

* Corresponding author. E-mail address: [email protected] (C.S. Jarnevich).

Article history: Received 23 December 2016; Received in revised form 16 June 2017; Accepted 8 August 2017

Keywords: Species distribution modeling; Habitat modeling; Niche modeling; Correlative models; Maxent; Boosted regression trees; Random forest; GLM; Background data

Abstract

Evaluating the conditions where a species can persist is an important question in ecology, both to understand the tolerances of organisms and to predict distributions across landscapes. Presence data combined with background or pseudo-absence locations are commonly used with species distribution modeling to develop these relationships. However, there is no standard method to generate background or pseudo-absence locations, and method choice affects model outcomes. We evaluated combinations of both model algorithms (simple and complex generalized linear models, multivariate adaptive regression splines, Maxent, boosted regression trees, and random forest) and background methods (random, minimum convex polygon, and continuous and binary kernel density estimator (KDE)) to assess the sensitivity of model outcomes to the choices made. We evaluated six questions related to model results, including five beyond the common comparison of model accuracy assessment metrics (biological interpretability of response curves, cross-validation robustness, independent data accuracy and robustness, and prediction consistency). For our case study with cheatgrass in the western US, random forest was least sensitive to background choice and the binary KDE method was least sensitive to model algorithm choice. While this outcome may not hold for other locations or species, the methods we used can be implemented to help determine appropriate methodologies for particular research questions.

Published by Elsevier B.V. http://dx.doi.org/10.1016/j.ecolmodel.2017.08.017 0304-3800

1. Introduction

Understanding the environmental conditions that allow a species to persist has been a fundamental question in ecology (Grinnell, 1917; Hutchinson, 1957; Soberón, 2007) and continues to be a pressing conservation priority. As originally described, the Grinnellian or fundamental niche considered a series of scenopoetic (i.e., abiotic) conditions that allowed a species to persist (Grinnell, 1917; Hutchinson, 1957). However, Hutchinson (1957) and others recognized that biotic interactions, such as competitive exclusion, result in a species rarely utilizing its entire fundamental niche, referring to this smaller occupied space as the realized niche (Pulliam, 2000).

There has been a recent proliferation in the application of species distribution models (hereafter SDMs) in the ecological literature, partly in response to the wide availability of species occurrence data (Anderson, 2012) and spatial datasets (e.g., Porter et al., 2012), but also in part due to the increase in development and application of multiple SDMs (Zimmermann et al., 2010). SDMs attempt to understand the niche conditions (typically the realized niche) that allow a species to persist, contrasting known presence locations with either known absence locations or some representative sample of potential available locations across space that characterize the range of environmental conditions available to the species, alternatively called background, available, or pseudo-absence locations. We will use the term 'background'. Presence data are often the only data collected and available (i.e., no absence data; Soberón and Peterson, 2005), especially over large spatial extents for which the time and cost to adequately sample are prohibitive, and for poorly sampled parts of the world. In these cases, SDMs are limited to methods that use only presence information to define the niche (e.g., Ecological Niche Factor Analysis; Hirzel and Arlettaz, 2003) or background methods.

Several choices must be made during development of SDMs that can influence results, and there is not a quantitative methodology to direct these decisions. Alternative choices add uncertainty to predictions, some of which can be quantified by comparing alternatives. In studies that partitioned uncertainty among various choices, comparisons were made between modeling algorithm selection, location data choice and accuracy, predictor choice, climate change scenarios, method to control for collinearity in predictors, and variable selection method (e.g., Diniz-Filho et al., 2009; Dormann et al., 2008). Previous analyses of quantifiable uncertainty in model predictions highlight that modeling algorithm is often one of the greatest sources of uncertainty (e.g., Diniz-Filho et al., 2009; Dormann et al., 2008).

The practice of generating background locations is a form of a priori definition of the area accessible to a species, akin to prior selection of (independent) predictor variables, making careful and informed consideration of the background sample region essential to interpretable and useful results when models are used to extrapolate beyond sample units. The selection of background locations in presence-background SDMs is a subject of ongoing debate because this decision can affect model estimates (e.g., Phillips et al., 2009) and inflate model evaluation statistics (e.g., Rodda et al., 2011), but it has not been included in the uncertainty-partitioning analyses described above. Regardless of the approach, selection should be related to the biological question of interest when defining the niche conditions for a given species. Several background point selection approaches have been explored, but so far no consistent, optimally performing method has emerged. Generating background points requires addressing both the extent within which points will be generated and how points are placed within that extent. Three main considerations apply to these decisions: the biology of the species, the questions being asked, and the potential sampling bias that often exists in presence-only datasets. Many earlier SDM studies selected background points randomly from the entire extent of the study area (e.g., Elith et al., 2006; Phillips et al., 2006). For applications using herbaria and museum data, research suggests a targeted background or inventory pseudo-absence approach (e.g., Elith and Leathwick, 2007; Phillips et al., 2009), thereby comparing observations (collections) to the "full range" of environmental conditions in the target region. If doing so encompasses a range of unsuitable conditions for the species of interest, model prediction success would be high, but biological understanding of the niche requirements for the species would not be enhanced (e.g., temperate regions predicted unsuitable for tropical species). Thus, linking background sampling to the question of interest and understanding the implications of the background method is imperative. Additionally, for datasets aggregated from disparate sources, such as multiple, independent survey or mapping efforts, target background locations may not be available and the aggregated location data may be clustered geographically (e.g., spatially disparate clusters representing disparate mapping efforts). The resulting sample selection bias can reduce the accuracy of SDMs (see Fourcade et al., 2014). It is therefore important to explore how background selection influences model results.

Building upon the efforts of Barbet-Massin et al. (2012), who examined background selection uncertainty, our goal was to investigate the effects of a broader spectrum of background methods, evaluating both the spatial extent and the spatial placement of available locations within that extent, on predictions of SDMs using a 'real' dataset rather than a virtual species. Previous work highlights the importance of testing background selection methods for each dataset rather than seeking a best method for all species-geographic extent combinations, and we outline a process to evaluate the effects of methods to select background points in conjunction with different SDMs. We evaluated six different SDMs of varying complexity using four different background-selection methods. Although there are many more commonly used algorithms and other methods to select background locations, we felt that this set of 24 pairs was enough to demonstrate the methodology. Our purpose was not to say which pairing was 'best' but rather to evaluate a methodology to choose a pairing that minimized the effects of subjective decisions on model results. We explored random placement within the study area, random placement within a minimum convex polygon defined by presence data, random placement within a region defined by a kernel density estimator (KDE), and placement weighted by density of presence locations through a KDE (Fig. 1).

To conduct this assessment, we required a readily available dataset depicting the presence of a species across a large spatial extent, where existing covariates were available spatially. Modeling the distribution of cheatgrass (Bromus tectorum), an exotic invasive grass, is a good test case for this exercise because previous conservation and management efforts have resulted in an abundance of location data across the USA. Management concerns and challenges associated with the species indicate a clear need to better understand its current and potential influence on wildlands and wildlife habitats at local and continental scales (Miller et al., 2011). Given a better understanding of environmental factors that affect cheatgrass distribution and abundance, land managers may focus limited resources on areas with the greatest threat, susceptibility, or both. Regional models also provide a link to scenario modeling efforts (i.e., application of climate and land-use scenarios) to support planning for potential future conditions. Importantly, to be useful for management and planning, models must represent the "reality" observed by field managers and biologists. Analytically elegant but inaccurate models may have little practical value. Therefore, this project presents results from our efforts to refine and improve regional SDMs for conceptual and practical applications.

2. Materials and methods

2.1. Location data

We processed data within the VisTrails (Freire et al., 2006) Software for Assisted Habitat Modeling package (SAHM v 1.1; Morisette et al., 2013). We compiled cheatgrass point location data for the western USA from a variety of sources, resulting in 36,971 locations (Supplementary Table 1), reduced to 16,651 unique locations within 230 m resolution pixels. This cell size (230 m) was selected to facilitate integration of datasets derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). For model evaluation we obtained two independent datasets of cheatgrass presence and absence for a region of southwestern Wyoming (907 presence; 4882 absence) and a region in north central Nevada (360 presence; 204 absence; Supplementary Table 1).
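The reduction to one record per 230 m pixel described above can be sketched briefly. The following is a minimal illustration, assuming point coordinates already projected to an equal-area system measured in meters; the function name and grid origin are ours, not SAHM's, which performs the equivalent snap-to-raster step internally.

import numpy as np

def unique_per_cell(xy, cell_size=230.0):
    """Keep one point per cell_size x cell_size pixel (first occurrence)."""
    pts = np.asarray(xy)                                    # shape (n, 2)
    cells = np.floor(pts / cell_size).astype(np.int64)      # pixel row/col index
    _, keep = np.unique(cells, axis=0, return_index=True)   # first hit per pixel
    return pts[np.sort(keep)]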

We generated background locations equal in number to our presence locations to use in model development, following the advice of Barbet-Massin et al. (2012) to use a large number of background points with equal weight to the presence locations, and results from preliminary tests we conducted. In these preliminary analyses we examined histograms for each potential predictor variable across background sample sizes ranging from 500 to 32,000 points. Most means stabilized at the 8000–16,000 sample of available points used as background locations. Based on these results, which agreed with Barbet-Massin et al. (2012), we used a 1:1 ratio of presence to background point locations. We explored four different methods to create background locations across a spectrum from geographically inclusive to restrictive: randomly within a box around the presence locations (least restrictive; termed random hereafter), within a 95% isopleth minimum convex polygon (MCP) around our presence data, randomly within a binary KDE (KDEb) surface with a 99% isopleth and an ad hoc optimization method, and within a continuous KDE (KDEc) surface generated with an ad hoc optimization method (most restrictive). We used SAHM to create the background probability surfaces in an Albers equal-area projection to ensure consistent pixel sizes and then added a mask of pixels classified as 100% water, developed, and/or agricultural according to the National Land Cover Database (NLCD; Fry et al., 2011). Finally, to determine the sensitivity of our analyses to the random seed used to select background locations, we repeated the process 10 times with a different seed to examine the effects of which points were chosen as background locations. Because this difference did not alter our results, we include these results only in the supplementary material.

Fig. 1. Cheatgrass (Bromus tectorum) presence locations and boundaries for distribution of background locations. The continuous kernel density estimator (KDE) surface is displayed as a density function, where darker colors indicate a higher density of background locations. For binary KDE, minimum convex polygon (MCP), and random, the background points were scattered randomly within the boundaries shown.
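To make the two KDE-based options concrete, the sketch below draws background points from a KDE fitted to presence coordinates. It assumes scipy's default (Scott's rule) bandwidth and a fixed evaluation grid in place of SAHM's ad hoc bandwidth optimization, so it illustrates the logic of KDEb and KDEc rather than reproducing the surfaces used in this study.

import numpy as np
from scipy.stats import gaussian_kde

def kde_background(presence_xy, n_points, grid_size=200, isopleth=0.99,
                   mode="binary", seed=0):
    """Draw background points from a KDE fitted to presence coordinates.

    mode="binary": sample uniformly inside the isopleth density region (KDEb).
    mode="continuous": sample cells proportionally to density (KDEc).
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(presence_xy)                 # shape (n, 2): x, y columns
    kde = gaussian_kde(pts.T)                     # default Scott's-rule bandwidth

    # Evaluate the density on a regular grid spanning the presence extent.
    (xmin, ymin), (xmax, ymax) = pts.min(axis=0), pts.max(axis=0)
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, grid_size),
                         np.linspace(ymin, ymax, grid_size))
    cells = np.vstack([gx.ravel(), gy.ravel()])
    density = kde(cells)

    if mode == "binary":
        # Densest cells jointly holding `isopleth` of total mass, sampled uniformly.
        order = np.argsort(density)[::-1]
        mass = np.cumsum(density[order]) / density.sum()
        inside = order[: np.searchsorted(mass, isopleth) + 1]
        idx = rng.choice(inside, size=n_points, replace=True)
    else:
        # Cells sampled with probability proportional to the density surface.
        idx = rng.choice(density.size, size=n_points, replace=True,
                         p=density / density.sum())
    return cells[:, idx].T                        # (n_points, 2) background coords

Called with n_points equal to the number of presences, this mirrors the 1:1 presence:background ratio described above; the NLCD water/developed/agricultural mask would be applied to the grid cells before sampling.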

2.2. Predictor variables

Predictor layers for the model represented variation in climate, phenology, and land cover. For climate, we calculated 19 bioclimatic variables derived from minimum, maximum, and average monthly temperature and monthly precipitation from the PRISM climate group for 2000–2009 at a 2 km resolution (O'Donnell and Ignizio, 2012). Phenology data consisted of 15 phenology metrics derived from ~230 m MODIS enhanced vegetation index (EVI) data averaged across 2000–2009 (Tan et al., 2011). For land cover, we calculated the percent of each 230 m pixel classified as each of nine high-level classifications in the NLCD product (excluding water [1x] and wetlands [9x] and combining developed [2x] and agricultural [8x] into an anthropogenic category), solar radiation, and a remoteness index derived from distance to night-time lights (following Theobald et al., 2010). We resampled all layers to match the phenology layers using the nearest neighbor technique within SAHM.

Using the SAHM 'CovariateCorrelationAndSelection' module, we examined the maximum of the Pearson, Kendall, and Spearman rank correlation coefficients among all predictors. We removed one of any pair with a correlation coefficient |r| > 0.7 to reduce collinearity issues (Dormann et al., 2013), retaining the more biologically relevant predictor based on input from the literature and research team knowledge, resulting in 17 predictors (Supplementary Table 2).
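A minimal sketch of this screen follows, assuming the predictors sit in a pandas DataFrame and that the expert ranking of biological relevance is supplied as an ordered list; the SAHM 'CovariateCorrelationAndSelection' module performs this step interactively, so the helper below only approximates that workflow.

import pandas as pd

def screen_collinear(predictors, relevance_order, threshold=0.7):
    """Greedily keep predictors, most biologically relevant first, dropping any
    whose maximum |r| (Pearson, Kendall, or Spearman) with an already kept
    predictor exceeds `threshold`."""
    corr = pd.concat(
        [predictors.corr(method=m).abs()
         for m in ("pearson", "kendall", "spearman")]
    ).groupby(level=0).max()   # element-wise max across the three coefficients

    kept = []
    for name in relevance_order:               # walk the expert ranking
        if all(corr.loc[name, k] <= threshold for k in kept):
            kept.append(name)
    return kept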

2.3. Statistical analyses

We used five SDM algorithms covering a range of complexity to generate habitat suitability models: generalized linear models (GLM; McCullagh and Nelder, 1989), multivariate adaptive regression splines (MARS; Elith and Leathwick, 2007), Maxent (Phillips et al., 2006), boosted regression trees (BRT; Elith et al., 2008), and random forest (RF; Breiman, 2001). The resultant models were then applied to predictor layers for the western USA to map relative suitability for cheatgrass. The GLM used the standard logit link and the Bayesian information criterion for model selection due to our large sample size (Acquah, 2010). We ran the GLM in two ways: one considering only first order terms and the other also considering second order terms (squared terms and interactions). For MARS, Maxent, and RF, we used the default settings in the SAHM package (Talbert and Talbert, 2012). BRT models with default settings appeared overfit, so we optimized parameters following Elith et al. (2008), selecting a tree complexity of 5 and a learning rate of 0.0068. To assess model performance, we applied a 9-fold spatial cross-validation to the training data, in which the study area is divided into nine geographic areas to split the data into blocks rather than randomly assigning locations to a split (Roberts et al., 2017). We ran each of these six models (first order GLM, second order GLM, MARS, Maxent, RF, and BRT) four times, changing the background selection method with each run, to produce a total of 24 models. With the additional 10 iterations of background point generation using a different random seed for each of the 24 model algorithm by background method combinations, we produced 240 models, with comparisons shown in the supplementary material.
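The spatial split can be sketched as follows; the paper specifies only that the study area was divided into nine geographic areas, so the quantile-based 3 x 3 grid below is an illustrative assumption rather than the partition SAHM actually used.

import numpy as np

def spatial_blocks(xy, n_side=3):
    """Assign each (x, y) row to one of n_side**2 geographic folds."""
    def band(v):
        # Quantile edges give bands with roughly equal numbers of points.
        edges = np.quantile(v, np.linspace(0, 1, n_side + 1))
        return np.clip(np.searchsorted(edges, v, side="right") - 1, 0, n_side - 1)
    pts = np.asarray(xy)
    return band(pts[:, 0]) * n_side + band(pts[:, 1])  # fold labels 0..8

# Each fold then serves once as the held-out block:
# for k in range(9): train, test = folds != k, folds == k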

2.4. Model evaluation

To assess which SDM and which background method performed best, we developed a set of criteria for evaluation. The first test was to assess which models had the most biologically logical response curves. Authors Brown and Manier, each with cheatgrass expertise, individually defined the expected responses a priori based on their knowledge of cheatgrass physiology and growth patterns and in consultation with the literature (although direct connections with our variables were rare, ecological and physiological research shows important relations to temperature and moisture; e.g., Chambers et al., 2014; Condon et al., 2011). We then compared these expected responses to the response curves produced by each of the 24 models, and counted the number of responses for each predictor that deviated considerably from expectations (Supplementary Table 2).

Another set of evaluation metrics pertained to model performance, as indicated by the internal cross-validation and the independent datasets for Wyoming and Nevada. We examined four evaluation metrics calculated in SAHM: correlation coefficient, percent correctly classified, area under the curve (AUC; Pearce and Ferrier, 2000), and true skill statistic (TSS; Allouche et al., 2006). We generated box plots for each of the three validation datasets, both across background method by SDM and across SDM by background method. We visually compared the relative accuracy and robustness of the different background selection methods and SDMs. Relative accuracy was assessed by comparing the mean values for each of the four evaluation metrics across either SDM or background method, while robustness was assessed by comparing the variability in the values (i.e., the size of the box plots).
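For reference, the four metrics can be computed from binary labels and continuous suitability scores as sketched below; SAHM calculates them internally, so this helper (and its assumed default threshold of 0.5) is purely illustrative.

import numpy as np
from scipy.stats import rankdata

def evaluation_metrics(y_true, score, threshold=0.5):
    """PCC, AUC, TSS, and label-score correlation for one model run."""
    y_true = np.asarray(y_true)   # 1 = presence, 0 = background/absence
    score = np.asarray(score)
    pred = (score >= threshold).astype(int)

    tp = np.sum((pred == 1) & (y_true == 1))
    tn = np.sum((pred == 0) & (y_true == 0))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))

    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    pcc = (tp + tn) / y_true.size          # percent correctly classified
    tss = sensitivity + specificity - 1    # true skill statistic

    # AUC via the Mann-Whitney rank formulation (average ranks handle ties).
    ranks = rankdata(score)
    n_pos = y_true.sum()
    n_neg = y_true.size - n_pos
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

    corr = np.corrcoef(y_true, score)[0, 1]  # point-biserial correlation
    return {"PCC": pcc, "AUC": auc, "TSS": tss, "correlation": corr}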

We also wanted to examine how the consistency of predictions from different background methods varied among SDMs and vice versa. To do this, we created an ensemble of all SDMs for each background sampling method and an ensemble of all background methods for each SDM by summing binary maps of the continuous predictions. The continuous scores were discretized by calculating the value of the continuous suitability score that maximized the sum of sensitivity and specificity (suggested by Liu et al., 2016).
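A sketch of the discretization and agreement steps follows, reusing the evaluation_metrics() helper above. Maximizing sensitivity + specificity is equivalent to maximizing TSS, which the code exploits; the exhaustive search over observed score values is our illustrative choice, not necessarily how SAHM implements the Liu et al. (2016) rule.

import numpy as np

def max_sss_threshold(y_true, score):
    """Suitability value maximizing sensitivity + specificity (i.e., TSS)."""
    candidates = np.unique(score)
    tss = [evaluation_metrics(y_true, score, t)["TSS"] for t in candidates]
    return candidates[int(np.argmax(tss))]

def ensemble_agreement(binary_maps):
    """Fraction of pixels on which all binary maps (stacked on axis 0) agree."""
    stack = np.stack(binary_maps)
    return float(np.mean(stack.min(axis=0) == stack.max(axis=0)))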

3. Results

3.1. Response curves

Overall, RF had the fewest response curves not matching expectations, although all techniques except second order GLM were relatively close (9 [RF], 10 [MARS], 12 [Maxent], 13 [BRT], and 14 [first order GLM], compared to 22 [second order GLM] response curves not matching expectations out of 68 curves (4 background methods times 17 predictors); Table 1a, Supplementary Table 2, and Supplementary Figs. 1 and 4). Given that RF was the only SDM without variable selection in the model fitting process, this result is particularly telling: only three predictors had responses that did not match expectation, including precipitation seasonality, for which no model's response curves matched expectation; EVI season length, for which no RF model's response curves matched; and temperature seasonality, for which no KDEc model matched expectations. Random background (18 outliers) most often matched expectations, followed by MCP and KDEb (19 outliers each). KDEc had slightly more curves that did not meet expectations (24 outliers).

3.2. Internal cross-validation assessment

When comparing multiple evaluation metrics based on internal cross-validation, there was no clear difference in accuracy and robustness among the six SDMs. Means were relatively similar and variability was large enough to create overlap among model results (Fig. 2a–d; Table 1a; Supplementary Fig. 3). Similarly, there was little noticeable difference among the four background methods for either accuracy or robustness from the internal cross-validation (Fig. 2e–h; Table 1b; Supplementary Fig. 3).

3.3. Independent data assessment

Random forest generally had higher evaluation metrics and exhibited less variability than other models across all four evaluation metrics applied to both sets of independent location data (Fig. 2a–d; Table 1a; Supplementary Fig. 3). First order GLM generally performed the poorest, with the exception of percent correctly classified for Nevada. Mean values for second order GLM and BRT were consistently just below RF, while MARS and Maxent were consistently just above first order GLM. Robustness among these other models was similar. Overall, the independent Nevada data had less variability within SDMs (i.e., were more robust than the Wyoming data). However, while model evaluation scores using the independent Wyoming data were consistently lower and less robust than those for the Nevada data, interpretation of model performance was similar, in that RF models performed best.

The KDEc background method had the highest accuracy according to the internal cross-validation with the exception of percent correctly classified, with little difference among the other three methods (Fig. 2e–h; Table 1b; Supplementary Fig. 3). There was little noticeable difference among all four background methods for internal robustness. Interestingly, the random background models generally performed poorly for internal robustness, but accuracy was generally better than the other methods for percent correctly classified.

Fig. 2. Evaluation metrics for internal cross-validation (internal CV) of cheatgrass (Bromus tectorum) and for two independent datasets, Wyoming and Nevada, for a) background point method across modeling technique and b) modeling technique across background method.


Table 1. Criteria used to compare model and background performance for cheatgrass (Bromus tectorum) distribution models applied to a) SDMs across the set of background methods and to b) background methods across the set of SDMs. Values represent the rank in comparison to others, where a value of one indicates that the technique or method performed best for that metric. Background methods included random within a box around the presence locations, within a 95% isopleth minimum convex polygon (MCP) around our presence data, randomly within a binary KDE (KDEb) surface with a 99% isopleth and an ad hoc optimization method, and within a continuous KDE (KDEc) surface generated with an ad hoc optimization method.

a)

Criterion                                        First order GLM  Second order GLM  Maxent  MARS  BRT  RF
Biological interpretability of response curves   5                6                 3       2     4    1
Cross-validation accuracy                        1                1                 1       1     1    1
Cross-validation robustness                      1                1                 1       1     1    1
Independent data accuracy                        4                2                 3       3     2    1
Independent data robustness                      2                2                 2       2     2    1
Prediction consistency                           6                4                 2       3     5    1
TOTAL                                            19               16                12      12    15   6

b)

Criterion                                        KDEc  KDEb  MCP  Random
Biological interpretability of response curves   3     2     2    1
Cross-validation accuracy                        1     1     1    1
Cross-validation robustness                      1     1     1    1
Independent data accuracy                        1     2     2    2
Independent data robustness                      1     1     1    1
Prediction consistency                           4     1     2    3
TOTAL                                            11    8     9    9

3.4. Consistent predictions

Consistency of predictions as measured by the ensemble across background methods was highest for RF (73% agreement in predictions across the four background methods) and lowest for first order GLM (56% agreement; Supplementary Fig. 2 and Supplementary Table 3). The others were in the middle, with 72% for Maxent, 72% for MARS, 69% for second order GLM, and 65% for BRT. RF predictions are the least sensitive to background selection method, at least for this dataset.

Examining the four background techniques, KDEc had the most variable predictions (38%), while KDEb had the most consistent predictions (70%); recall that variability in these metrics was created by different results from changing the SDM while holding background points constant. Random and MCP both had relatively high levels of consistency (68% and 69%, respectively). KDEc had the most extrapolation beyond the geographic bounds of the sample locations (most restricted background geographically).

3.5. Uncertainty from SDM compared to background method

The greatest variability in model performance was due to the SDM used and exceeded that due to varying background selection methods (see Fig. 2). Based on these comparisons, RF had higher accuracy and robustness in general than the other models, followed by BRT; first order GLM (linear, without exponential variables) performed worst among these model comparisons when metrics were averaged across background methods. Not surprisingly, internal cross-validation (CV) suggested consistently higher model performance and lower variability than the independent dataset comparisons.

Among background methods, KDEb ranked highest. However, the influence of background was limited to the biological realism of response curves and the prediction consistency questions within our evaluation methodology. It was very similar to MCP, which had only slightly lower consistency of predictions.

4. Discussion

Our results highlight the importance of evaluating SDMs across model algorithms and environmental space (background). Our general findings match those of others, where the greatest variation in prediction arose from SDMs (e.g., Diniz-Filho et al., 2009; Dormann et al., 2008). Despite this consistently reported uncertainty, SDMs using a single algorithm without evaluating uncertainty from this choice are still common in the literature. Additionally, our results were sensitive to background selection method (both extent and placement of background locations), although in this case study the RF algorithm displayed less sensitivity to this choice than other methods. For our particular dataset, combining the RF algorithm with KDEb background selection minimized uncertainty related to both choices, but we do not expect this result to be consistent across species, space, and time. Our results suggest the importance of considering both these uncertainties for any SDM effort, as previously suggested for SDM choice (Qiao et al., 2015). The methods we provide to evaluate sensitivity are transferable to other species and geographic regions, and could be used to evaluate the sensitivity of results to modeling decisions.

4.1. Effects of background selection

We evaluated four different methods to select background locations, considering both extent and placement. All methods limited the extent in some way to 'accessible areas' (Barve et al., 2011), and this limitation considered areas surveyed (KDEc) and areas accessible to the species (random, with a general rectangle). The MCP method we used is common, although it often includes a buffer (e.g., Rodda et al., 2011). For invasive species, KDEs have been used for drawing background points to account for varying sampling intensity and establishment opportunity (Elith et al., 2010). In this approach, background points are drawn proportional to the intensity of presence locations (i.e., more points near dense areas and fewer from sparse areas); thus both background and presence points have similar spatial bias.

Other suggested methods include limiting placement to within a certain distance of presence points (VanDerWal et al., 2009), restricting the extent for background placement to zones containing a presence location from some sort of ecological classification (e.g., Mesgaran et al., 2014 used Köppen–Geiger climate zones), and using different thresholds from a trend surface analysis to restrict the extent (Acevedo et al., 2012). Many approaches are geographic in nature and lack information on species' biology, dispersal and biotic interactions, and sampling intensity. Selection of the background extent at biologically relevant scales and based on detailed information regarding species dispersal limitations and/or biotic interactions may be more pragmatic (Anderson and Raza, 2010), although this information is often not readily available for many species. Regardless of the method, either those we used or others, the methodology we developed could be used to evaluate the impact of the method on model results.

The influence of background data on models has been explored previously using virtual species (e.g., Barbet-Massin et al., 2012; Lobo and Tognelli, 2011). Barbet-Massin et al. (2012) examined the effects of both background extent and number of background points on the predictive performance of seven SDMs. Methods performed better when trained using a large number of background points (e.g., 10,000 or more) and equal weighting of presence and background points. They, however, explored pseudo-absences, attempting to mimic absence rather than sample the available environment with all but their strictly random sample.

With biased sampling data where the sampled region is uncertain, it may be impossible to distinguish impacts from sample bias from those of actual distribution differences. The use-availability literature has a rich history discussing inferences that can be drawn from different study designs to characterize use versus availability (see Beyer et al., 2010). Use data, or information on how much time animals spend in a particular habitat, are generally derived from telemetry data, which provide large amounts of location data for a few individuals. In this case with telemetry data, information about the sampling of use is available. In our particular case, we not only did not know the geographic or environmental bounds of sampling or the density of sampling effort, but we also had less certainty about availability due to spreading and disequilibrium issues associated with an invasive species. In use-availability studies, the questions being asked often determine the spatial extent over which background locations are drawn; this relies on a subjective definition of accessible, though.

In this study, the spatial extent over which background locations were drawn had a large effect (compare random [largest spatial extent] to KDEc [most restrictive spatial extent]). Interestingly, although other studies highlight spatial extent as a substantial influence (e.g., Northrup et al., 2013; VanDerWal et al., 2009), we found placement within this extent to be as, if not more, important (compare KDEb to KDEc).

4.2. Response curves

The criteria we evaluated to compare the 24 different SDM-background method combinations highlighted the importance of species-specific modeling. Hannemann et al. (2016) recommended analysis of the response curves and variable importance of each model created, matching our first criterion related to biological realism of response curves. The degree of biological realism represented by the shape of the fitted response curves may depend on a number of factors, including SDM complexity, inherent limitations (e.g., GLM produces linear responses), and different decisions made during model calibration (Merow et al., 2014). A less complex model generally fits simpler response curves (e.g., first order GLM in Supplementary Fig. 1), whereas a highly complex model results in highly irregular response curves, with the minor fluctuations generally being biologically unrealistic (e.g., BRT in Supplementary Fig. 1). Model complexity and the shape of response curves can also affect model projections in space and time (Merow et al., 2014). Therefore, an important consideration in developing models is determining the optimal level of complexity that represents underlying species responses to environmental variables.


4.3. Model evaluation criteria

Our results also reinforce others' findings (e.g., Hijmans, 2012) to use caution when using internal cross-validation, even when it is spatially split, to infer model validity, given the consistently higher evaluation metrics compared to the independent data. Our results showed little variation between SDMs or background methods in traditional model evaluation statistics (Table 1), which makes it challenging to select a single 'best' model. Thus, traditional model selection approaches using one or more evaluation metric(s) calculated using either the training data or independent testing data may not be adequate. Ensemble modeling is suggested as a way around selecting a single best model (Araújo and New, 2007); however, depending on the research question an ensemble may not be appropriate. For example, response curves are not available from ensemble models, and a study concerned with explanation rather than prediction may become intractable as different permutations are considered (e.g., SDM, background method, suites of predictors, spatial scales, etc.). Additionally, some researchers have questioned the validity of ensemble modeling due to high variation in predictions and instead advocate selection of a 'best' model (e.g., Mainali et al., 2015). A more pragmatic, collaborative model evaluation approach based on additional considerations, expert knowledge, and the literature, such as used in this study, seems valuable. While most studies tend to rely on standard evaluation metrics from cross-validation, we developed evaluation metrics that include several other criteria. For evaluating model performance, researchers should not only look at the mean values of a single evaluation metric (e.g., AUC or TSS) across all model methods but also critically examine the variability in multiple model runs (i.e., consider accuracy versus robustness of the models), response curve eco-plausibility, and sensitivity to modeling choices (e.g., prediction consistency). This expanded set of criteria could be used to compare any suite of models, not limited to the combinations in our analyses.

4.4. Independent data evaluation

Differences in the independent data used for model evaluation influenced evaluation results, with models seeming to fit the Nevada data better than the Wyoming data. Since these data came from regions more than 500 km apart and covered different spatial extents, it is not surprising that model results varied between these two evaluation areas. Variability in model performance demonstrated by the independent Wyoming data provided important insights into interpretation challenges often faced in modeling efforts, namely that "real world" distributions often are not as clear and well-delineated as modeled distributions. Understanding relations between predictors, the scale and resolution used to represent variability in predictive factors, and the spatial patterns and heterogeneity observed in the modeled population are all important for interpreting model results in different regions. In our study the Wyoming test data indicated higher cheatgrass prevalence than the Nevada data, but also covered steeper environmental gradients, suggesting potential for greater variability in cheatgrass distribution (Supplementary Fig. 1).

Variability in the data and (or) environmental predictors in Wyoming clearly led to greater variability in our evaluation metrics (Fig. 2), and the difference between evaluations based on these data and the independent data from Nevada highlights that selection of different models or background selection methods may be more important in more heterogeneous regions or near the margins of species distributions, or both. Alternately, our models may be heavily influenced by the abundance of "training" data from the Great Basin, making our models better suited for predictions in that region and thus highlighting the relevance of models to the sampled region and the limitations associated with extrapolation. However, we used considerable data from the Wyoming basin region, so we expected this area to be well within our modeling targets.

In addition, Nevada is the center of "invasion space" for cheatgrass in North America whereas Wyoming is close to the northern and eastern limits (Brooks et al., 2016). While cheatgrass has shown it can survive and even dominate in Wyoming given the proper niche conditions, these conditions appear to be less widespread and/or consistent (temporally) in Wyoming, making accurate predictions of the current distribution in this region more difficult (Brooks et al., 2016). This has implications for ecology and management of the species in Wyoming (especially given a changing climate, which may intensify or diminish environmental conditions that favor them), but it also has important implications for those who wish to project model outcomes into unsampled places and across time. Namely, environmental model relationships are derived from current and past conditions, and these conditions (and associated variables) should be expected to vary in space and time. Thus, while projections to unsampled areas or times, or both, will continue to be desired by practitioners, these model outputs should be considered and applied with great care to avoid assuming a tight connection to "reality".

4.5. Workflows and SDMs

Our analyses involved producing 24 different models, requiring consistent inputs and parameterization across subsets of these. The spatial extent and resolution resulted in computationally intensive analyses to produce the full suite of results, which included a set of three maps for each model (continuous surface, binary surface, and multivariate environmental similarity surface map) and numerous textual and graphical outputs (e.g., variable importance, evaluation metrics). The use of scientific workflows in SDM studies can enhance our abilities to perform such computationally intensive analyses. For example, the VisTrails software can facilitate tracking the scientific workflow to enhance reproducibility of scientific results (Freire and Silva, 2012). The provenance capabilities of the VisTrails:SAHM software facilitated the development of models in this study with many combinations of SDM and background method, and the visualization and comparison of model outputs. SAHM is particularly useful for this sort of exercise because it readily presents the same inputs to different models and scenarios, producing comparable output, evaluation metrics, and maps across SDMs.

5. Conclusions

Our results should provide incentive for additional considerations during model development, supporting investigation of the influence of not only SDMs but also background selection method on model results. We provide a methodology for selecting SDMs and background methods to minimize uncertainty from modeling decisions that is transferable to other species and/or study sites. Our methodology could be applied to other situations with poor quality location data (e.g., unknown sampling bias with presence-only data) to minimize the impact of SDM and background method decisions. Other decisions could also be evaluated using these same criteria, such as the effect of the selected threshold rule. Decisions on background sampling design should strongly consider the autecology of the species, the niche conditions one is interested in modeling, and indeed the question being addressed by the modeling.


Acknowledgements

The work was funded by NASA (Grant # NNH11AS09I) and the Department of Interior North Central Climate Science Center. Special thanks to NASA program manager Woody Turner. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ecolmodel.2017.08.017.

References

Acevedo, P., Jiménez-Valverde, A., Lobo, J.M., Real, R., 2012. Delimiting the geographical background in species distribution modelling. J. Biogeogr. 39, 1383–1390.
Acquah, H.D.-G., 2010. Comparison of Akaike information criterion (AIC) and Bayesian information criterion (BIC) in selection of an asymmetric price relationship. J. Dev. Agric. Econ. 2, 1–6.
Allouche, O., Tsoar, A., Kadmon, R., 2006. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 43, 1223–1232.
Anderson, R.P., Raza, A., 2010. The effect of the extent of the study region on GIS models of species geographic distributions and estimates of niche evolution: preliminary tests with montane rodents (genus Nephelomys) in Venezuela. J. Biogeogr. 37, 1378–1393.
Anderson, R.P., 2012. Harnessing the world's biodiversity data: promise and peril in ecological niche modeling of species distributions. Ann. N. Y. Acad. Sci. 1260, 66–80.
Araújo, M.B., New, M., 2007. Ensemble forecasting of species distributions. Trends Ecol. Evol. 22, 42–47.
Barbet-Massin, M., Jiguet, F., Albert, C.H., Thuiller, W., 2012. Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol. Evol. 3, 327–338.
Barve, N., Barve, V., Jiménez-Valverde, A., Lira-Noriega, A., Maher, S.P., Peterson, A.T., Soberón, J., Villalobos, F., 2011. The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecol. Model. 222, 1810–1819.
Beyer, H.L., Haydon, D.T., Morales, J.M., Frair, J.L., Hebblewhite, M., Mitchell, M., Matthiopoulos, J., 2010. The interpretation of habitat preference metrics under use–availability designs. Philos. Trans. R. Soc. B: Biol. Sci. 365, 2245–2254.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Brooks, M.L., Brown, C.S., Chambers, J.C., D'Antonio, C.M., Keeley, J.E., Belnap, J., 2016. Exotic annual Bromus invasions: comparisons among species and ecoregions in the western United States. In: Germino, M.J., Chambers, J.C., Brown, C.S. (Eds.), Exotic Brome-Grasses in Arid and Semiarid Ecosystems of the Western US: Causes, Consequences, and Management Implications. Springer, Switzerland, pp. 11–60.
Chambers, J.C., Bradley, B.A., Brown, C.S., D'Antonio, C., Germino, M.J., Grace, J.B., Hardegree, S.P., Miller, R.F., Pyke, D.A., 2014. Resilience to stress and disturbance, and resistance to Bromus tectorum L. invasion in cold desert shrublands of western North America. Ecosystems 17, 360–375.
Condon, L., Weisberg, P.J., Chambers, J.C., 2011. Abiotic and biotic influences on Bromus tectorum invasion and Artemisia tridentata recovery after fire. Int. J. Wildland Fire 20, 597–604.
Diniz-Filho, J.A.F., Bini, L.M., Rangel, T.F., Loyola, R.D., Hof, C., Nogues-Bravo, D., Araujo, M.B., 2009. Partitioning and mapping uncertainties in ensembles of forecasts of species turnover under climate change. Ecography 32, 897–906.
Dormann, C.F., Purschke, O., Márquez, J.R.G., Lautenbach, S., Schröder, B., 2008. Components of uncertainty in species distribution analysis: a case study of the Great Grey Shrike. Ecology 89, 3371–3386.
Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J.R.G., Gruber, B., Lafourcade, B., Leitão, P.J., Münkemüller, T., McClean, C., Osborne, P.E., Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D., Lautenbach, S., 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46.
Elith, J., Leathwick, J., 2007. Predicting species distributions from museum and herbarium records using multiresponse models fitted with multivariate adaptive regression splines. Divers. Distrib. 13, 265–275.
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberon, J., Williams, S., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29, 129–151.
Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813.
Elith, J., Kearney, M., Phillips, S., 2010. The art of modelling range-shifting species. Methods Ecol. Evol. 1, 330–342.
Fourcade, Y., Engler, J.O., Rödder, D., Secondi, J., 2014. Mapping species distributions with MAXENT using a geographically biased sample of presence data: a performance assessment of methods for correcting sampling bias. PLoS One 9, e97122.
Freire, J., Silva, C.T., 2012. Making computations and publications reproducible with VisTrails. Comput. Sci. Eng. 14, 18–25.
Freire, J., Silva, C., Callahan, S., Santos, E., Scheidegger, C., Vo, H., 2006. Managing rapidly-evolving scientific workflows. In: Moreau, L., Foster, I. (Eds.), Provenance and Annotation of Data, vol. 4145. Springer, Berlin/Heidelberg, pp. 10–18.
Fry, J., Xian, G., Jin, S., Dewitz, J., Homer, C., Yang, L., Barnes, C., Herold, N., Wickham, J., 2011. Completion of the 2006 national land cover database for the conterminous United States. Photogramm. Eng. Remote Sens. 77, 858–864.
Grinnell, J., 1917. The niche relationships of the California thrasher. Auk 34, 427–433.
Hannemann, H., Willis, K.J., Macias-Fauria, M., 2016. The devil is in the detail: unstable response functions in species distribution models challenge bulk ensemble modelling. Glob. Ecol. Biogeogr. 25, 26–35.
Hijmans, R.J., 2012. Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology 93, 679–688.
Hirzel, A.H., Arlettaz, R., 2003. Modeling habitat suitability for complex species distributions by environmental-distance geometric mean. Environ. Manag. 32, 614–623.
Hutchinson, G.E., 1957. Concluding remarks. Cold Spring Harb. Symp. Quant. Biol. 22, 415–427.
Liu, C., Newell, G., White, M., 2016. On the selection of thresholds for predicting species occurrence with presence-only data. Ecol. Evol. 6, 337–348.
Lobo, J.M., Tognelli, M.F., 2011. Exploring the effects of quantity and location of pseudo-absences and sampling biases on the performance of distribution models with limited point occurrence data. J. Nat. Conserv. 19, 1–7.
Mainali, K.P., Warren, D.L., Dhileepan, K., McConnachie, A., Strathie, L., Hassan, G., Karki, D., Shrestha, B.B., Parmesan, C., 2015. Projecting future expansion of invasive species: comparing and improving methodologies for species distribution modeling. Glob. Change Biol. 21, 4464–4480.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, 2nd edition. Chapman and Hall, London/New York.
Merow, C., Smith, M.J., Edwards, T.C., Guisan, A., McMahon, S.M., Normand, S., Thuiller, W., Wüest, R.O., Zimmermann, N.E., Elith, J., 2014. What do we gain from simplicity versus complexity in species distribution models? Ecography 37, 1267–1281.
Mesgaran, M.B., Cousens, R.D., Webber, B.L., 2014. Here be dragons: a tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. Divers. Distrib. 20, 1147–1159.
Miller, R.F., Knick, S.T., Pyke, D.A., Meinke, C.W., Hanser, S.E., Wisdom, M.J., Hild, A.L., 2011. Characteristics of sagebrush habitats and limitations to long-term conservation. In: Knick, S.T., Connelly, J.W. (Eds.), Greater Sage-Grouse: Ecology and Conservation of a Landscape Species and Its Habitats. Studies in Avian Biology. University of California Press, Berkeley, CA, pp. 145–184.
Morisette, J.T., Jarnevich, C.S., Holcombe, T.R., Talbert, C.B., Ignizio, D., Talbert, M.K., Silva, C., Koop, D., Swanson, A., Young, N.E., 2013. VisTrails SAHM: visualization and workflow management for species habitat modeling. Ecography 36, 129–135.
Northrup, J.M., Hooten, M.B., Anderson, C.R., Wittemyer, G., 2013. Practical guidance on characterizing availability in resource selection functions under a use–availability design. Ecology 94, 1456–1463.
O'Donnell, M.S., Ignizio, D.A., 2012. Bioclimatic predictors for supporting ecological applications in the conterminous United States. U.S. Geol. Surv. Data Ser. 691.
Pearce, J., Ferrier, S., 2000. An evaluation of alternative algorithms for fitting species distribution models using logistic regression. Ecol. Model. 128, 127–147.
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecol. Model. 190, 231–259.
Phillips, S.J., Dudik, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S., 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecol. Appl. 19, 181–197.
Porter, J.H., Hanson, P.C., Lin, C.-C., 2012. Staying afloat in the sensor data deluge. Trends Ecol. Evol. 27, 121–129.
Pulliam, H.R., 2000. On the relationship between niche and distribution. Ecol. Lett. 3, 349–361.
Qiao, H., Soberón, J., Peterson, A.T., 2015. No silver bullets in correlative ecological niche modelling: insights from testing among many potential algorithms for niche estimation. Methods Ecol. Evol. 6, 1126–1136.
Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., Warton, D.I., Wintle, B.A., Hartig, F., Dormann, C.F., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929.
Rodda, G.H., Jarnevich, C.S., Reed, R.N., 2011. Challenges in identifying sites climatically matched to the native ranges of animal invaders. PLoS One 6, e14670.
Soberón, J., 2007. Grinnellian and Eltonian niches and geographic distributions of species. Ecol. Lett. 10, 1115–1123.
Soberon, J., Peterson, A.T., 2005. Interpretation of models of fundamental ecological niches and species' distributional areas. Biodivers. Inf. 2, 1–10.
Talbert, C.B., Talbert, M.K., 2012. User Manual for SAHM Package for VisTrails. U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO (72 pp).
Tan, B., Morisette, J.T., Wolfe, R.E., Gao, F., Ederer, G.A., Nightingale, J., Pedelty, J.A., 2011. An enhanced TIMESAT algorithm for estimating vegetation phenology metrics from MODIS data. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 4, 361–371.
Theobald, D.M., Norman, J.B., Newman, P., 2010. Estimating visitor use of protected areas by modeling accessibility: a case study in Rocky Mountain National Park, Colorado. J. Conserv. Plan. 6, 1–20.
VanDerWal, J., Shoo, L.P., Graham, C., Williams, S.E., 2009. Selecting pseudo-absence data for presence-only distribution modeling: how far should you stray from what you know? Ecol. Model. 220, 589–594.
Zimmermann, N.E., Edwards, T.C., Graham, C.H., Pearman, P.B., Svenning, J.-C., 2010. New trends in species distribution modelling. Ecography 33, 985–989.