29
Ecological Modelling 190 (2006) 231–259 Maximum entropy modeling of species geographic distributions Steven J. Phillips a, , Robert P. Anderson b,c , Robert E. Schapire d a AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, USA b Department of Biology, City College of the City University of New York, J-526 Marshak Science Building, Convent Avenue at 138th Street, New York, NY 10031, USA c Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA d Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA Received 23 February 2004; received in revised form 11 March 2005; accepted 28 March 2005 Available online 14 July 2005 Abstract The availability of detailed environmental data, together with inexpensive and powerful computers, has fueled a rapid increase in predictive modeling of species environmental requirements and geographic distributions. For some species, detailed pres- ence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, absence data are not available for most species. In this paper, we introduce the use of the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data. Maxent is a general-purpose machine learning method with a simple and precise mathematical formulation, and it has a number of aspects that make it well-suited for species distribution modeling. In order to investigate the efficacy of the method, here we perform a continental-scale case study using two Neotropical mammals: a lowland species of sloth, Bradypus variegatus, and a small montane murid rodent, Microryzomys minutus. We compared Maxent predictions with those of a commonly used presence-only modeling method, the Genetic Algorithm for Rule-Set Prediction (GARP). We made predictions on 10 random subsets of the occurrence records for both species, and then used the remaining localities for testing. Both algorithms provided reasonable estimates of the species’ range, far superior to the shaded outline maps available in field guides. All models were significantly better than random in both binomial tests of omission and receiver operating characteristic (ROC) analyses. The area under the ROC curve (AUC) was almost always higher for Maxent, indicating better discrimination of suitable versus unsuitable areas for the species. The Maxent modeling approach can be used in its present form for many applications with presence-only datasets, and merits further research and development. © 2005 Elsevier B.V. All rights reserved. Keywords: Maximum entropy; Distribution; Modeling; Niche; Range Corresponding author. Tel.: +1 973 360 8704; fax: +1 973 360 8871. E-mail addresses: [email protected] (S.J. Phillips), [email protected] (R.P. Anderson), [email protected] (R.E. Schapire). 1. Introduction Predictive modeling of species geographic distribu- tions based on the environmental conditions of sites of known occurrence constitutes an important tech- 0304-3800/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.ecolmodel.2005.03.026

Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

Embed Size (px)

Citation preview

Page 1: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

Ecological Modelling 190 (2006) 231–259

Maximum entropy modeling of species geographic distributions

Steven J. Phillipsa,∗, Robert P. Andersonb,c, Robert E. Schapired

a AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, USAb Department of Biology, City College of the City University of New York, J-526 Marshak Science Building,

Convent Avenue at 138th Street, New York, NY 10031, USAc Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, Central Park West at 79th Street,

New York, NY 10024, USAd Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA

Received 23 February 2004; received in revised form 11 March 2005; accepted 28 March 2005Available online 14 July 2005

Abstract

The availability of detailed environmental data, together with inexpensive and powerful computers, has fueled a rapid increasein predictive modeling of species environmental requirements and geographic distributions. For some species, detailed pres-ence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, absence dataare not available for most species. In this paper, we introduce the use of the maximum entropy method (Maxent) for modelingspecies geographic distributions with presence-only data. Maxent is a general-purpose machine learning method with a simpleand precise mathematical formulation, and it has a number of aspects that make it well-suited for species distribution modeling. In

mmals: a

dictionemainingoutline

eceiverdicatingts present

u-esh-

order to investigate the efficacy of the method, here we perform a continental-scale case study using two Neotropical malowland species of sloth,Bradypus variegatus, and a small montane murid rodent,Microryzomys minutus. We compared Maxentpredictions with those of a commonly used presence-only modeling method, the Genetic Algorithm for Rule-Set Pre(GARP). We made predictions on 10 random subsets of the occurrence records for both species, and then used the rlocalities for testing. Both algorithms provided reasonable estimates of the species’ range, far superior to the shadedmaps available in field guides. All models were significantly better than random in both binomial tests of omission and roperating characteristic (ROC) analyses. The area under the ROC curve (AUC) was almost always higher for Maxent, inbetter discrimination of suitable versus unsuitable areas for the species. The Maxent modeling approach can be used in iform for many applications with presence-only datasets, and merits further research and development.© 2005 Elsevier B.V. All rights reserved.

Keywords: Maximum entropy; Distribution; Modeling; Niche; Range

∗ Corresponding author. Tel.: +1 973 360 8704;fax: +1 973 360 8871.

E-mail addresses: [email protected](S.J. Phillips), [email protected] (R.P. Anderson),[email protected] (R.E. Schapire).

1. Introduction

Predictive modeling of species geographic distribtions based on the environmental conditions of sitof known occurrence constitutes an important tec

0304-3800/$ – see front matter © 2005 Elsevier B.V. All rights reserved.doi:10.1016/j.ecolmodel.2005.03.026

Page 2: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

232 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

nique in analytical biology, with applications in con-servation and reserve planning, ecology, evolution,epidemiology, invasive-species management and otherfields (Corsi et al., 1999; Peterson and Shaw, 2003;Peterson et al., 1999; Scott et al., 2002; Welk etal., 2002; Yom-Tov and Kadmon, 1998). Sometimesboth presence and absence occurrence data are avail-able for the development of models, in which casegeneral-purpose statistical methods can be used (for anoverview of the variety of techniques currently in use,seeCorsi et al., 2000; Elith, 2002; Guisan and Zim-merman, 2000; Scott et al., 2002). However, while vaststores of presence-only data exist (particularly in nat-ural history museums and herbaria), absence data arerarely available, especially for poorly sampled tropicalregions where modeling potentially has the most valuefor conservation(Anderson et al., 2002; Ponder et al.,2001; Soberon, 1999). In addition, even when absencedata are available, they may be of questionable valuein many situations(Anderson et al., 2003). Modelingtechniques that require only presence data are thereforeextremely valuable(Graham et al., 2004).

1.1. Niche-based models from presence-only data

We are interested in devising a model of a species’environmental requirements from a set of occurrencelocalities, together with a set of environmental vari-ables that describe some of the factors that likelyinfluence the suitability of the environment for thes )E udep ob-s oftend andh on,2 allp area,w het en-t vene

tiono nvi-r ichec itsl thats pies

(Hutchinson, 1957). The species’ realized niche maybe smaller than its fundamental niche, due to humaninfluence, biotic interactions (e.g., inter-specific com-petition, predation), or geographic barriers that havehindered dispersal and colonization; such factors mayprevent the species from inhabiting (or even encoun-tering) conditions encompassing its full ecological po-tential(Pulliam, 2000; Anderson and Martınez-Meyer,2004). We assume here that occurrence localities aredrawn from source habitat, rather than sink habitat,which may contain a given species without having theconditions necessary to maintain the population with-out immigration; this assumption is less realistic withhighly vagile taxa(Pulliam, 2000). By definition, then,environmental conditions at the occurrence localitiesconstitute samples from the realized niche. A niche-based model thus represents an approximation of thespecies’ realized niche, in the study area and environ-mental dimensions being considered.

If the realized niche and fundamental niche do notfully coincide, we cannot hope for any modeling al-gorithm to characterize the species’ full fundamentalniche: the necessary information is simply not presentin the occurrence localities. This problem is likely ex-acerbated when occurrence records are drawn from toosmall a geographic area. In a larger study region, how-ever, spatial variation exists in community composi-tion (and, hence, in the resulting biotic interactions)as well as in the environmental conditions available tothe species. Therefore, given sufficient sampling effort,m hice da-m encel .I fun-d lizedn ainu

bil-i tog pre-d y thec senti ar-e ibu-t y bes t tot ich

pecies(Brown and Lomolino, 1998; Root, 1988.ach occurrence locality is simply a latitude–longitair denoting a site where the species has beenerved; such georeferenced occurrence recordserive from specimens in natural history museumserbaria(Ponder et al., 2001; Stockwell and Peters002a). The environmental variables in GIS formatertain to the same geographic area, the studyhich has been partitioned into a grid of pixels. T

ask of a modeling method is to predict environmal suitability for the species as a function of the ginvironmental variables.

A niche-based model represents an approximaf a species’ ecological niche in the examined eonmental dimensions. A species’ fundamental nonsists of the set of all conditions that allow forong-term survival, whereas its realized niche isubset of the fundamental niche that it actually occu

odeling in a study region with a larger geograpxtent is likely to increase the fraction of the funental niche represented by the sample of occurr

ocalities(Peterson and Holt, 2003), and is preferablen practice, however, the departure between theamental niche (a theoretical construct) and reaiche (which can be observed) of a species will remnknown.

Although a niche-based model describes suitaty in ecological space, it is typically projected ineographic space, yielding a geographic area oficted presence for the species. Areas that satisfonditions of a species’ fundamental niche reprets potential distribution, whereas the geographicas it actually inhabits constitute its realized distr

ion. As mentioned above, the realized niche mamaller than the fundamental niche (with respeche environmental variables being modeled), in wh

Page 3: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 233

case the predicted distribution will be smaller than thefull potential distribution. However, to the extent thatthe model accurately portrays the species’ fundamen-tal niche, the projection of the model into geographicspace will represent the species’ potential distribution.

Whether or not a model captures a species’ full nicherequirements, areas of predicted presence will typicallybe larger than the species’ realized distribution. Dueto many possible factors (such as geographic barriersto dispersal, biotic interactions, and human modifica-tion of the environment), few species occupy all areasthat satisfy their niche requirements. If required by theapplication at hand, the species’ realized distributioncan often be estimated from the modeled distributionthrough a series of steps that remove areas that thespecies is known or inferred not to inhabit. For ex-ample, suitable areas that have not been colonized dueto contingent historical factors (e.g., geographic barri-ers) can be excluded(Peterson et al., 1999; Anderson,2003). Similarly, suitable areas not inhabited due to bi-otic interactions (e.g., competition with closely relatedmorphologically similar species) can be identified andremoved from the prediction(Anderson et al., 2002).Finally, when a species’ present-day distribution is de-sired, such as for conservation purposes, a current land-cover classification derived from remotely sensed datacan be used to exclude highly altered habitats (e.g., re-moving deforested areas from the predicted distributionof an obligate-forest species;Anderson and Martınez-Meyer, 2004).

thes ing,s poralc encel ple,a sedw umr dM lda ale,d f them ,u r( pre-c ales;t ikelya ales;a over

influence species distributions at the micro-scale. Thechoice of variables to use for modeling also affectsthe degree to which the model generalizes to regionsoutside the study area or to different environmentalconditions (e.g., other time periods). This is importantfor applications such as invasive-species management(e.g.,Peterson and Robins, 2003) and predicting theimpact of climate change (e.g.,Thomas et al., 2004).Bioclimatic and soil-type variables measure availabil-ity of the fundamental primary resources of light, heat,water and mineral nutrients(Mackey and Linden-mayer, 2001). Their impact, as measured in one studyarea or time frame, should generalize to other situa-tions. On the other hand, variables representing latitudeor elevation will not generalize well; although they arecorrelated with variables that have biophysical impacton the species, those correlations vary over space andtime.

A number of other serious potential pitfalls may af-fect the accuracy of presence-only modeling; some ofthese also apply to presence–absence modeling. First,occurrence localities may be biased. For example, theyare often highly correlated with the nearby presenceof roads, rivers or other access conduits(Reddy andDavalos, 2003). The location of occurrence localitiesmay also exhibit spatial auto-correlation (e.g., if a re-searcher collects specimens from several nearby local-ities in a restricted area). Similarly, sampling intensityand sampling methods often vary widely across thestudy area(Anderson, 2003). In addition, errors maye rip-t pe-c on.F ayb l re-ls t bes cies’f n att bep datam aticm ter-p inga rep-r onlym ceedt use-

There are implicit ecological assumptions inet of environmental variables used for modelo selection of that set requires great care. Temorrespondence should exist between occurrocalities and environmental variables; for exam

current land-cover classification should not be uith occurrence localities that derive from muse

ecords collected over many decades(Anderson anartınez-Meyer, 2004). Secondly, the variables shouffect the species’ distribution at the relevant scetermined by the geographic extent and grain oodeling task(Pearson et al., 2004). For examplesing the terminology ofMackey and Lindenmaye2001), climatic variables such as temperature andipitation are appropriate at global and meso-scopographic variables (e.g., elevation and aspect) lffect species distributions at meso- and topo-scnd land-cover variables like percent canopy c

xist in the occurrence localities, be it due to transcion errors, lack of sufficient geographic detail (esially in older records), or species misidentificatirequently, the number of occurrence localities me too low to estimate the parameters of the mode

iably (Stockwell and Peterson, 2002b). Similarly, theet of available environmental variables may noufficient to describe all the parameters of the speundamental niche that are relevant to its distributiohe grain of the modeling task. Finally, errors mayresent in the variables, perhaps due to errors inanipulation, or due to inaccuracies in the climodels used to generate climatic variables, or inolation of lower-resolution data. In sum, determinnd possibly mitigating the effects of these factorsesent worthy topics of research for all presence-odeling techniques. With these caveats, we pro

o introduce a modeling approach that may prove

Page 4: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

234 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

ful whenever the above concerns are adequately ad-dressed.

1.2. Maxent

Maxent is a general-purpose method for makingpredictions or inferences from incomplete information.Its origins lie in statistical mechanics(Jaynes, 1957),and it remains an active area of research with an AnnualConference, Maximum Entropy and Bayesian Meth-ods, that explores applications in diverse areas suchas astronomy, portfolio optimization, image recon-struction, statistical physics and signal processing. Weintroduce it here as a general approach for presence-only modeling of species distributions, suitable for allexisting applications involving presence-only datasets.The idea of Maxent is to estimate a target probabilitydistribution by finding the probability distribution ofmaximum entropy (i.e., that is most spread out, orclosest to uniform), subject to a set of constraints thatrepresent our incomplete information about the targetdistribution. The information available about the targetdistribution often presents itself as a set of real-valuedvariables, called “features”, and the constraints are thatthe expected value of each feature should match its em-pirical average (average value for a set of sample pointstaken from the target distribution). When Maxent is ap-plied to presence-only species distribution modeling,the pixels of the study area make up the space on whichthe Maxent probability distribution is defined, pixelsw thes bles,e here

aw-b willb hi fol-l therw dya ricald rentv veb to theo n.( isem le toa and

generalized additive models (GLM and GAM), in theabsence of interactions between variables, additivityof the model makes it possible to interpret how eachenvironmental variable relates to suitability(Dudık etal., 2004; Phillips et al., 2004). (5) Over-fitting can beavoided by using�1-regularization (Section2.1.2). (6)Because dependence of the Maxent probability distri-bution on the distribution of occurrence localities is ex-plicit, there is the potential (in future work) to addressthe issue of sampling bias formally, as inZadrozny(2004). (7) The output is continuous, allowing fine dis-tinctions to be made between the modeled suitabilityof different areas. If binary predictions are desired, thisallows great flexibility in the choice of threshold. If theapplication is conservation planning, the fine distinc-tions in predicted relative environmental suitability canbe valuable to reserve planning algorithms. (8) Maxentcould also be applied to species presence/absence databy using a conditional model (as inBerger et al., 1996),as opposed to the unconditional model used here. (9)Maxent is a generative approach, rather than discrim-inative, which can be an inherent advantage when theamount of training data is limited (see Section2.1.4).(10) Maximum entropy modeling is an active area of re-search in statistics and machine learning, and progressin the field as a whole can be readily applied here. (11)As a general-purpose and flexible statistical method,we expect that it can be used for all the applicationsoutlined in Section1 above, and at all scales.

Some drawbacks of the method are: (1) It is not asm erea werm dic-t n-t t ofr y( ssi le-s ,2 bil-i cang on-d Extrac others (fore es int ap-p ose

ith known species occurrence records constituteample points, and the features are climatic varialevation, soil category, vegetation type or otnvironmental variables, and functions thereof.

Maxent offers many advantages, and a few dracks; a comparison with other modeling methodse made in Section2.1.4 after the Maxent approac

s described in detail. The advantages include theowing: (1) It requires only presence data, togeith environmental information for the whole sturea. (2) It can utilize both continuous and categoata, and can incorporate interactions between diffeariables. (3) Efficient deterministic algorithms haeen developed that are guaranteed to convergeptimal (maximum entropy) probability distributio4) The Maxent probability distribution has a concathematical definition, and is therefore amenabnalysis. For example, as with generalized linear

ature a statistical method as GLM or GAM, so thre fewer guidelines for its use in general, and feethods for estimating the amount of error in a pre

ion. Our use of an “unconditional” model (cf. advaage 8) is rare in machine learning. (2) The amounegularization (see Section2.1.2) requires further stude.g., seePhillips et al., 2004), as does its effectivenen avoiding over-fitting compared with other variabelection methods (for alternatives, seeGuisan et al.002). (3) It uses an exponential model for proba

ties, which is not inherently bounded above andive very large predicted values for environmental citions outside the range present in the study area.are is therefore needed when extrapolating to antudy area or to future or past climatic conditionsxample, feature values outside the range of valuhe study area should be “clamped”, or reset to theropriate upper or lower bound). (4) Special-purp

Page 5: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 235

software is required, as Maxent is not available in stan-dard statistical packages.

1.3. Existing approaches for presence-onlymodeling

Many methods have been used for presence-onlymodeling of species distributions, and we only attempthere to give a broad overview of existing methods.Some methods use only presences to derive a model.BIOCLIM (Busby, 1986; Nix, 1986)predicts suitableconditions in a “bioclimatic envelope”, consisting ofa rectilinear region in environmental space represent-ing the range (or some percentage thereof) of observedpresence values in each environmental dimension. Sim-ilarly, DOMAIN (Carpenter et al., 1993)uses a similar-ity metric, where a predicted suitability index is givenby computing the minimum distance in environmentalspace to any presence record.

Other techniques use presence and backgrounddata. General-purpose statistical methods such asgeneralized linear models (GLMs) and generalizedadditive models (GAMs) are commonly used formodeling with presence–absence datasets. Recently,they have been applied to presence-only situations bytaking a random sample of pixels from the study area,known as “background pixels” or “pseudo-absences”,and using them in place of absences during model-ing (Ferrier and Watson, 1996; Ferrier et al., 2002).A sample of the background pixels can be chosenp ithpw m am eta1 doms dic-t ers,1 ledg andn ion;r eirs sedo ixels.Ee ithe outr like

absences. It is similar to principal components analysis,involving a linear transformation of the environmentalspace into orthogonal “marginality” and “specializa-tion” factors. Environmental suitability is then modeledas a Manhattan distance in the transformed space.

As a first step in the evaluation of Maxent, we choseto compare it with GARP, as the latter has recentlyseen extensive use in presence-only studies (Anderson,2003; Joseph and Stockwell, 2002; Peterson and Kluza,2003; Peterson and Robins, 2003; Peterson and Shaw,2003 and references therein). While further stud-ies are needed comparing Maxent with other widelyused methods that have been applied to presence-onlydatasets, such studies are beyond the scope of this pa-per.

2. Methods

2.1. Maxent details

2.1.1. The principleWhen approximating an unknown probability dis-

tribution, the question arises, what is the best approx-imation? E.T. Jaynes gave a general answer to thisquestion: the best approach is to ensure that the ap-proximation satisfies any constraints on the unknowndistribution that we are aware of, and that subject tothose constraints, the distribution should have max-imum entropy(Jaynes, 1957). This is known as them un-ko es vid-uatae

H

w on-n bero pti hatfi ure

urely at random (sometimes excluding sites wresence records,Graham et al., 2004), or from siteshere sampling is known to have occurred or froodel of such sites(Zaniewski et al., 2002; Englerl., 2004). Similarly, a Bayesian approach(Aspinall,992) proposed modeling presence versus a ranample. The Genetic Algorithm for Rule-Set Preion(Stockwell and Noble, 1992; Stockwell and Pet999)uses an artificial-intelligence framework calenetic algorithms. It produces a set of positiveegative rules that together give a binary predictules are favored in the algorithm according to thignificance (compared with random prediction) ban a sample of background pixels and presence pnvironmental-Niche Factor Analysis (ENFA,Hirzelt al., 2002) uses presence localities together wnvironmental data for the entire study area, withequiring a sample of the background to be treated

aximum-entropy principle. For our purposes, thenown probability distribution, which we denoteπ, isver a finite setX, (which we will later interpret as thet of pixels in the study area). We refer to the indial elements ofX as points. The distributionπ assignsnon-negative probabilityπ(x) to each pointx, and

hese probabilities sum to 1. Our approximation ofπ islso a probability distribution, and we denote itπ. Thentropy ofπ is defined as

(π) = −∑x∈X

π(x) ln π(x)

here ln is the natural logarithm. The entropy is negative and is at most the natural log of the numf elements inX. Entropy is a fundamental conce

n information theory: in the paper that originated teld,Shannon (1948)described entropy as “a meas

Page 6: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

236 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

of how much ‘choice’ is involved in the selection of anevent”. Thus a distribution with higher entropy involvesmore choices (i.e., it is less constrained). Therefore,the maximum entropy principle can be interpreted assaying that no unfounded constraints should be placedon π, or alternatively,

The fact that a certain probability distribution maxi-mizes entropy subject to certain constraints represent-ing our incomplete information, is the fundamentalproperty which justifies use of that distribution forinference; it agrees with everything that is known,but carefully avoids assuming anything that is notknown(Jaynes, 1990).

2.1.2. A machine learning perspectiveThe maximum entropy principle has seen recent

interest in the machine learning community, with amajor contribution being the development of effi-cient algorithms for finding the Maxent distribution(seeBerger et al., 1996for an accessible introductionandRatnaparkhi, 1998for a variety of applications anda favorable comparison with decision trees). The ap-proach consists of formalizing the constraints on theunknown probability distributionπ in the followingway. We assume that we have a set of known real-valued functionsf1, . . . , fn onX, known as “features”(which for our application will be environmental vari-ables or functions thereof). We assume further that thei eee txdgf no

du -p hepiπ andu yp bu-t intt

served empirically, i.e.

π[fj] = π[fj], for each featurefj (1)

It turns out that the mathematical theory of convexduality can be used(Della Pietra et al., 1997)to showthat this characterization uniquely determinesπ, andthatπ has an alternative characterization, which can bedescribed as follows. Consider all probability distribu-tions of the form

qλ(x) = eλ·f (x)

Zλ(2)

whereλ is a vector ofn real-valued coefficients or fea-ture weights,f denotes the vector of alln features, andZλ is a normalizing constant that ensures thatqλ sumsto 1. Such distributions are known as Gibbs distribu-tions. Convex duality shows that the Maxent probabil-ity distribution π is exactly equal to the Gibbs prob-ability distribution qλ that maximizes the likelihood(i.e., probability) of them sample points. Equivalently,it minimizes the negative log likelihood of the samplepoints

π[− ln(qλ)] (3)

which can also be written lnZλ − 1m

∑mi=1 λ · f (xi)

and termed the “log loss”.As described so far, Maxent can be prone to over-

fitting the training data. The problem derives from thefact that the empirical feature means will typically note em.T dt canbe

|f ar-a :t theG

π

w ,w argev x-e

nformation we know aboutπ is characterized by thxpectations (averages) of the features underπ. Here,ach featurefj assigns a real valuefj(x) to each poinin X. The expectation of the featurefj underπ isefined as

∑x∈X π(x)fj(x) and denoted byπ[fj]. In

eneral, for any probability distributionp and function, we use the notationp[f ] to denote the expectatiof f underp.

The feature expectationsπ[fj] can be approximatesing a set of sample pointsx1, . . . , xm drawn indeendently fromX (with replacement) according to trobability distributionπ. The empirical average offj

s 1m

∑mi=1 fj(xi), which we can write asπ[fj] (where

˜ is the uniform distribution on the sample points),se as an estimate ofπ[fj]. By the maximum entroprinciple, therefore, we seek the probability distri

ion π of maximum entropy subject to the constrahat each featurefj has the same mean underπ as ob-

qual the true means; they will only approximate thherefore the means underπ should only be restricte

o be close to their empirical values. One way thise done is to relax the constraint in(1) above(Dudıkt al., 2004), replacing it with

π[fj] − π[fj]| ≤ βj, for each featurefj (4)

or some constantsβj. This also changes the dual chcterization, resulting in a form of�1-regularization

he Maxent distribution can now be shown to beibbs distribution that minimizes

˜ [− ln(qλ)] +∑

j

βj|λj| (5)

here the first term is the log loss (as in(3) above)hile the second term penalizes the use of lalues for the weightsλj. Regularization forces Mant to focus on the most important features, and�1-

Page 7: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 237

regularization tends to produce models with few non-zeroλj values(Williams, 1995). Such models are lesslikely to overfit, because they have fewer parameters;as a general rule, the simplest explanation of a phe-nomenon is usually best (the principle of parsimony,Occam’s Razor). Note that�1 regularization has alsobeen applied to GLM/GAMs, and is called the “lasso”in that context (Guisan et al., 2002and referencestherein).

This maximum likelihood formulation suggests anatural approach for finding the Maxent probabilitydistribution: start from the uniform probability distri-bution, for whichλ = (0, . . . , 0), then repeatedly makeadjustments to one or more of the weightsλj in sucha way that the regularized log loss decreases. Regular-ized log loss can be shown to be a convex function of theweights, so no local minima exist, and several convexoptimization methods exist for adjusting the weights ina way that guarantees convergence to the global min-imum (see Section2.2 for the algorithm used in thisstudy).

The above presentation describes an “uncondi-tional” maximum entropy model. “Conditional” mod-els are much more common in the machine learningliterature. The task of a conditional Maxent model isto approximate a joint probability distributionp(x, y)of the inputsx and output labely. Both presence andabsence data would be required to train a conditionalmodel of a species’ distribution, which is why we useunconditional models.

2ded

f co-l ingu ata,a the-o t tom con-s m-p singtb t ther mplep tyd odb Onei xel,

and record 1 if the species is present there, and 0 other-wise. If we denote the response variable asy, then underthis sampling strategy,π is the probability distributionp(x|y = 1). By applying Bayes’ rule, we get thatπ isproportional to probability of occurrence,p(y = 1|x),although with presence-only data we cannot determinethe constant of proportionality.

However, most presence-only datasets derive fromsurveys where the data model is much less well-definedthat the idealized model presented above. The varioussampling biases described in Section1seriously violatethis data model. In practice, then,π (andπ) can be moreconservatively interpreted as a relative index of envi-ronmental suitability, where higher values represent aprediction of better conditions for the species (similarto the relaxed interpretation of GLMs with presence-only data inFerrier et al. (2002)).

The critical step in formulating the ecological modelis defining a suitable set of features. Indeed, the con-straints imposed by the features represent our ecologi-cal assumptions, as we are asserting that they representall the environmental factors that constrain the geo-graphical distribution of the species. We consider fivefeature types, described inDudık et al. (2004). We didnot use the fourth in our present study, as it may requiremore data than were available for our study species.

1. A continuous variablef is itself a “linear feature”.It imposes the constraint onπ that the mean of theenvironmental variable,π[f ], should be close to its

cali-

2ear-loseal tofor

3 ari-hentouldancein-les.

4 -n

.1.3. Application to species distribution modelingAustin (2002)examines three components nee

or statistical modeling of species distributions: an eogical model concerning the ecological theory besed, a data model concerning collection of the dnd a statistical model concerning the statisticalry. Maxent is a statistical model, and to apply iodel species distributions successfully, we must

ider how it relates to the two other modeling coonents (the data model and ecological model). U

he notation of Section2.1.2, we define the setX toe the set of pixels in the study area, and interpreecorded presence localities for the species as saointsx1, . . . , xm taken from an unknown probabiliistributionπ. The data model consists of the methy which the presence localities were collected.

dealized sampling strategy is to pick a random pi

observed value, i.e., its mean on the sample loties.

. The square of a continuous variablef is a “quadraticfeature”. When used with the corresponding linfeature, it imposes the constraint onπ that the variance of the environmental variable should be cto its observed value, since the variance is equπ[f 2] − π[f ]2. It models the species’ tolerancevariation from its optimal conditions.

. The product of two continuous environmental vablesf andg is a “product feature”. Together with tlinear features forf andg, it imposes the constraithat the covariance of those two variables shbe close to its observed value, since the covariis π[fg] − π[f ]π[g]. Product features thereforecorporate interactions between predictor variab

. For a continuous environmental variablef, a “threshold feature” is equal to 1 whenf is above a give

Page 8: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

238 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

threshold, and 0 otherwise. It imposes the followingconstraint: the proportion ofπ that has values forfabove the threshold should be close to the observedproportion. All possible threshold features forf to-gether allow Maxent to model an arbitrary responsecurve of the species tof, as any smooth function canbe approximated by a linear combination of thresh-old functions.

5. For a categorical environmental variable that takeson valuesv1 . . . vk, we usek “binary features”,where theith feature is 1 wherever the variableequalsvi, and 0 otherwise. As with threshold fea-tures, these binary features constrain the proportionof π in each category to be close to the observedproportion.

For each of these feature types, the correspondingregularization parameterβj governs how close the ex-pectation underπ is required to be to the observedvalue; without regularization, they are required to beequal (Section2.1.2). The above list of features typesis not exhaustive, and additional feature types could bederived from the same environmental variables. Thefeatures used should be those that likely constrain thegeographic distribution of the species.

The applicability of the maximum entropy prin-ciple to species distributions is supported by ther-modynamic theories of ecological processes(Aoki,1989; Schneider and Kay, 1994). The second lawof thermodynamics specifies that in systems with-o ctiont f in-fl ts int ciesw me

2ing

m rtic-u zeda th-o orksG net-w referh nce-o x-e t we

restrict our discussion to�1-regularized unconditionalMaxent, as described in Section2.1.2.

Theoretically, Maxent is most similar to GLMs andGAMs. In what follows, we use the terminology ofYeeand Mitchell (1991). A frequently-used GLM is theGuassian logit model, in which the logit of the predictedprobability of occurrence is

α + β1f1(x) + γ1f1(x)2 + . . . + βnfn(x) + γnfn(x)2

(6)

where thefj are environmental variables,α, βj andγj are fitted coefficients, and the logit function is de-fined by logit(p) = ln( p

1−p). The expression in(6) is

the same form as the log (rather than logit) of the prob-ability of the pixel x in a Maxent model with linearand quadratic features. A common method for mod-eling interactions between variables in a GLM is tocreate product variables, which is analogous to the useof product features in Maxent.

In the same way, if probability of occurrence is mod-eled with a GAM using a logit link function, the logitof the predicted probability has the form

g1(f1(x)) + . . . + gn(fn(x))

where thefi are again environmental variables. Thegi

are smooth functions fit by the model, with the amountof smoothing controlled by a width parameter. This isthe same form as the log probability of the pixelx in aM iza-t oth-e peo le isd

cese emt sa enced onlyd truea al.,2 essc x ofe elsa dyr ciesr xent

ut outside influences, processes move in a direhat maximizes entropy. Thus, in the absence ouences other than those included as constrainhe model, the geographic distribution of a speill indeed tend toward the distribution of maximuntropy.

.1.4. Relationships to other modeling approachesMaxent has strong similarities to some exist

ethods for modeling species distributions, in palar, generalized linear models (GLMs), generalidditive models (GAMs) and machine learning meds such as Bayesian approaches and neural netwLMs, GAMs, Bayesian approaches and neuralorks are all broad classes of techniques, and weere only to the way they have been applied to presenly modeling of species distributions. Similarly, Mant generally refers to a class of techniques, bu

.

axent model with threshold features, and regularion has an analogous effect to smoothing on therwise arbitrary functionsgi. In both cases, the shaf the response curve to each environmental variabetermined by the data.

Despite these similarities, important differenxist between GLM/GAMs and Maxent, causing tho make different predictions. When GLM/GAMre used to model probability of occurrence, absata are required. When applied to presence-ata, background pixels must be used instead ofbsences(Ferrier and Watson, 1996; Ferrier et002). However, the interpretation of the result is llear-cut—it must be interpreted as a relative indenvironmental suitability. In contrast, Maxent modprobability distribution over the pixels in the stu

egion, and in no sense are pixels without speecords interpreted as absences. In addition, Ma

Page 9: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 239

is a generative approach, whereas GLM/GAMs arediscriminative, and generative methods may givebetter predictions when the amount of training data issmall (Ng and Jordan, 2001). For a joint probabilitydistributionp(x, y), a discriminative classifier modelsthe posterior probabilityp(y|x) directly, in order tochoose the most likely labely for given inputsx.Typically, a generative classifier models the distribu-tion p(x, y) or p(x|y), and relies on Bayes’ rule todeterminep(y|x). Our unconditional Maxent modelsare generative: we model a distributionp(x|y = 1).

Maxent shares with other machine learning methodsan emphasis on probabilistic reasoning. Regulariza-tion, which penalizes the use of large values of modelparameters, can be interpreted as the use of a Bayesianprior (Williams, 1995). However, Maxent is quite dif-ferent from the particular Bayesian species modelingapproach ofAspinall (1992). The latter approach isknown as “naive Bayes” in the machine learning liter-ature, and assumes independence of the environmentalvariables. This assumption is frequently not met forenvironmental data.

The data requirements of Maxent are closest to thoseof environmental niche factor analysis (ENFA), whichalso uses presence data in combination with environ-mental data for the whole study area (although bothcould use only a random sample of background pixelsto improve running time).

2.2. A Maxent implementation for modelings

form ple-m e off ple-ma si ss.T d toc hea iter-a n logl lue(

n-n rea.B prob-

ability is typically extremely small. Although these“raw” probabilities are an optional output, by defaultour software presents the probability distribution in an-other form that is easier to use and interpret, namelya “cumulative” representation. The value assigned to apixel is the sum of the probabilities of that pixel and allother pixels with equal or lower probability, multipliedby 100 to give a percentage. The cumulative representa-tion can be interpreted as follows: if we resample pixelsaccording to the modeled Maxent probability distribu-tion, thent% of the resampled pixels will have cumula-tive value oft or less. Thus, if the Maxent distributionπ is a close approximation of the probability distribu-tionπ that represents reality, the binary model obtainedby setting a threshold oft will have approximatelyt% omission of test localities and minimum predictedarea among all such models (cf. the “minimal predictedarea” evaluation measure ofEngler et al. (2004)). Thisprovides a theoretical foundation that aids in the selec-tion of a threshold when a binary prediction is required.

Our Maxent implementation has a straightforwardgraphical user interface (Fig. 1). It also has a command-line interface, allowing it to be run automaticallyfrom scripts for batch processing. It is written inJava, so it can be used on all modern comput-ing platforms, and is freely available on the world-wide web athttp://www.cs.princeton.edu/∼schapire/maxent. The user-specified parameters and theirdefault values (which we used in all runs de-scribed below) are: convergence threshold= 10−5,m1 aryf val-u ver-g ep iesb toc ic ofo

n thewo ribedi ob-t g thep te fort . Thev , alsoa

pecies distributions

In order to make the Maxent method availableodeling species geographic distributions, we imented an efficient algorithm together with a choic

eature types that are well suited to the task. Our imentation uses a sequential-update algorithm(Dudık etl., 2004)that iteratively picks a weightλj and adjust

t so as to minimize the resulting regularized log lohe algorithm is deterministic, and is guaranteeonverge to the Maxent probability distribution. Tlgorithm stops when a user-specified number oftions has been performed, or when the change i

oss in an iteration falls below a user-specified vaconvergence), whichever happens first.

As described in Section2.1, Maxent assigns a noegative probability to each pixel in the study aecause these probabilities must sum to 1, each

aximum iterations= 1000, regularization valueβ =0−4, and use of linear, quadratic, product and bin

eatures. The first two parameters are conservativees that allow the algorithm to get close to conence. The small value ofβ has minimal effect on thrediction but avoids potential numerical difficulty keepingλ values from tending to infinity; howhoose the best regularization parameters is a topngoing research (seeDudık et al. (2004)).1

1 A later version of the software, Version 1.8.1, was posted oeb site during review of this paper. It allows eachβj to dependn observed variability in the corresponding feature, as desc

n Dudık et al. (2004). The recommended regularization is nowained by setting the regularization parameter to “auto”, allowinrogram to select an amount of regularization that is appropria

he types of features used and the number of sample localitiesersion of the software used in the present study (Version 1.0vailable on the web site) uses the same valueβ for all βj .

Page 10: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

240 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

Fig. 1. User interface for the Maxent application (Version 1.0) for modeling species geographic distributions using georeferenced occurrencerecords and environmental variables. The interface allows for the use of both continuous and categorical environmental data, and linear, quadratic,and product features. See Section2 for further documentation.

2.3. GARP

In its simplest form, GARP seeks a collection ofrules that together produce a binary prediction. Posi-tive rules predict suitable conditions for pixels satis-fying some set of environmental conditions; similarly,negative rules predict unsuitable conditions. Rules arefavored in the algorithm according to their significance(compared with random prediction) based on a sampleof 1250 presence pixels and 1250 background pixels,sampled with replacement. Some pixels may receive noprediction, if no rule in the rule-set applies to them, andsome may require resolution of conflicting predictions.A genetic algorithm is used to search heuristically fora good rule-set(Stockwell and Noble, 1992).

There is considerable random variability in GARPpredictions, so we implemented the best-subset modelselection procedure as follows, similar toPeterson andShaw (2003)and following the general recommenda-tions ofAnderson et al. (2003). First, we generated 100binary models, with pixels that did not received a pre-diction interpreted as predicted absence, using GARPversion 1.1.3 with default values for its parameters

(0.01 convergence limit, 1000 maximum iterations, andallowing the use of atomic, range, negated range andlogit rules). We then eliminated all models with morethan 5% intrinsic omission (of training localities). If atmost 10 models remained, they then constituted the bestsubset (this happened 4 out of 44 times, yielding bestsubsets with 5, 7, 8 and 9 models). In all other cases,we determined the median value of the predicted areaof the remaining models, and selected the 10 modelswhose predicted area was closest to the median. Fi-nally, we combined the best-subset models to make acomposite GARP prediction, in which the value of apixel was equal to the number of best-subset models inwhich the pixel was predicted present (0–10).

2.4. Data sources

2.4.1. Study speciesThe brown-throated three-toed slothBradypus var-

iegatus (Xenarthra: Bradypodidae) is a large arbo-real mammal (3–6 kg) that is widely distributed in theNeotropics from Honduras to northern Argentina. It isfound primarily in lowland areas but also ranges up to

Page 11: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 241

middle elevations. It has been documented in regionsof deciduous forest, evergreen rainforest and montaneforest, but is absent from xeric areas and non-forestedregions(Anderson and Handley, 2001). Three otherspecies are known in the genus.B. pygmaeus is endemicto Isla Escudo on the Caribbean coast of Panama, andtwo species have geographic distributions restricted toSouth America:B. tridactylus in the Guianan regionandB. torquatus in the Atlantic forests of Brazil. Thelatter two species show geographic distributions thatlikely come into contact (or did historically) with thatof B. variegatus, but areas of sympatry are apparentlyminimal.

Microryzomys minutus (Rodentia: Muridae) is asmall-bodied rodent (10–20 g) known from middle-to-high elevations of the Andes and associated moun-tain chains from Venezuela to Bolivia(Carleton andMusser, 1989). It occupies an elevational range of ap-proximately 1000–4000 m and has been recorded pri-marily in wet montane forests, although sometimes inmesicparamo habitats above treeline (in theparamo-forest ecotone). A congeneric species,M. altissimus,

occupies generally higher elevations in much of this re-gion, but occasionally the two have been found in sym-patry.M. minutus has not been encountered in lowlandregions (below approximately 1000 m). Likewise, it isapparently absent from openparamo far from forests,dry puna habitat above treeline, and obviously frompermanent glaciers on the highest mountain peaks.

These two species hold several characteristics con-ducive to their use in evaluating the utility of Maxent inmodeling species distributions. First of all, they showwidespread geographic distributions with clear ecolog-ical/environmental patterns. Secondly, they have beenthe subject of recent taxonomic revisions by specialists.Finally, those revisions provide a reasonable numberof georeferenced occurrence localities for each speciesbased on confirmed museum specimens (128 forB.variegatus, Anderson and Handley, 2001; 88 for M.minutus, Carleton and Musser, 1989; Fig. 2).

2.4.2. Environmental variablesWe examine the species’ potential distributions in

the Neotropics from southeastern Mexico to Argentina

F 116 re int orted iM

ig. 2. Occurrence records forBradypus variegatus (triangles; left,his study. Data derive from vouchered museum specimens repusser, 1989).

cords) andMicroryzomys minutus (circles; right, 88 records) usedn recent taxonomic revisions (Anderson and Handley, 2001; Carleton and

Page 12: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

242 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

(23.55◦ N – 56.05◦ S, 94.8◦ W – 34.2◦ W), includingthe Caribbean from Cuba southward. The environmen-tal variables fall into three categories: climate, elevationand potential vegetation. All variables are recorded ata pixel size of 0.05◦ by 0.05◦, yielding a 1212× 1592grid, with 648,658 pixels containing data for all vari-ables.

The climatic variables derive from data providedby the Intergovernmental Panel on Climate Change(IPCC;New et al., 1999). The original variables havea resolution of 0.5◦ by 0.5◦, and were produced us-ing thin-plate spline interpolation based on readingstaken at weather stations around the world from 1961to 1990. They describe mean monthly values of variousvariables, which we processed to convert to ascii rastergrid format, as required by GARP and Maxent. Fromthese monthly data, we also created annual variablesby averaging or taking the minimum or maximum asappropriate.

Of the many monthly and annual variables avail-able, we selected the following twelve, based on ourassessment that they would likely have relevance for thespecies being modeled (see alsoPeterson and Cohoon,1999): annual cloud cover; annual diurnal temperaturerange; annual frost frequency; annual vapor pressure;January, April, July, October and annual precipitation;and minimum, maximum and mean annual tempera-ture. We used bilinear interpolation to resample to apixel size of 0.05◦ by 0.05◦. Although this resamplingclearly does not actually increase the resolution of thed ans

thec omUfb ri-a theC partoe unth iono econ-s on.W in av d toa tw ital

data differed slightly from the description and mapin Dinerstein et al. (1995)by having 15 rather than11 major habitat types. The differences arise fromthe addition of a snow/ice/glaciers/rock category,a tundra category and a water category; deletionof the restingas category; splitting of grasslandsavannas and shrublands into temperate versus tropi-cal/subtropical categories; and splitting of temperateforests into temperate coniferous and temperatebroadleaf and mixed forests. The processed climaticvariables (at the original resolution), all resampledvariables, and the occurrence localities are availableathttp://www.cs.princeton.edu/∼schapire/maxent.

2.5. Model building

For each species, we made 10 random partitions ofthe occurrence localities. Each partition was createdby randomly selecting 70% of the occurrence localitiesas training data, with the remaining 30% reserved fortesting the resulting models. Twelve of the original 128localities forB. variegatus lay in coastal areas or onislands that were missing data for one or more of theenvironmental variables, and were excluded from thisstudy. Each partition forB. variegatus thus held 81training localities and 35 test localities, and those forM.minutus held 61 training localities and 27 test localities.

We made 10 random partitions rather than a singleone in order to assess the average behavior of the algo-rithms, and to allow for statistical testing of observedd ankt thef f alla cies’p

itht nlyc ari-a atingp cli-m thew andw n bec ten-t reast uredd ricalv ms

ata, bilinear interpolation is likely more realistic thimply using nearest-neighbor interpolation.

Two other variables were used in addition tolimatic data. An elevation variable was derived frSGS HYDRO1k data(USGS, 2001)by resampling

rom the original finer resolution (1 km pixels) to 0.05◦y 0.05◦. Finally, we used a potential vegetation vable, consisting of a partition of Latin America andaribbean into “major habitat types”, produced asf a terrestrial conservation assessment(Dinersteint al., 1995). This variable does not take into accoistorical (contingent) biogeographic informatr human-induced changes, and represents a rtruction of original vegetation types in the regie used digital data on 15 major habitat types

ector coverage (shape file), which we convertegrid with resolution of 0.05◦ by 0.05◦ coincidenith the climatic and elevational variables. The dig

ifferences in performance (via Wilcoxon signed-rests). In addition, the algorithms were also run onull set of occurrence localities, taking advantage ovailable data to provide best estimates of the speotential distributions for visual interpretation.

The algorithms (Maxent and GARP) were run wwo suites of environmental variables: first with olimatic and elevational data, and then with those vbles plus potential vegetation. The reasons for treotential vegetation separately are three-fold: (1)atic and elevational data are readily available forhole world (whereas potential vegetation is not),e wished to determine whether good models careated using uniformly available data. (2) The poial vegetation coverage is rather subjective, whehe others are objectively produced from measata. (3) Potential vegetation is the only categoariable, and the potential existed for the algorith

Page 13: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 243

to respond differently to categorical versus continuousdata.

2.6. Model evaluation

The first step in evaluating the models produced bythe two algorithms was to verify that both performedsignificantly better than random. For this purpose, wefirst used a (threshold-dependent) binomial test basedon omission and predicted area. However, it does notallow for comparisons between algorithms, as the sig-nificance of the test is highly dependent on predictedarea. Indeed, comparison of the algorithms is madedifficult by the fact that neither gives a binary pre-diction. Hence, we also used two comparative statisti-cal tests that employ very different means to overcomethis complication. First, we employed a new threshold-dependent method of model evaluation, which we termthe “equalized predicted area” test, whose purpose is toanswer the following question: at the commonly usedthresholds representing the extremes of the GARP pre-diction, how does Maxent compare? Second, we used(threshold-independent) receiver operating character-istic (ROC) analysis, which characterizes the perfor-mance of a model at all possible thresholds by a singlenumber, the area under the curve (AUC), which maybe then compared between algorithms.

2.6.1. Threshold-dependent evaluationAfter applying a threshold, model performance can

bi elsnp llt cies.A ent)cc por-t tiona

inew ntlyba -t att fromt aI ine

the probability of having at leastt(1 − r) successes outof t trials, each with probabilitya. Although the prob-abilities for such tests are often approximated usinga χ2 or z test (for large sample sizes), we calculatedexact probabilities for the binomial test usingMinitab(1998).

The binomial test requires that thresholds be used, inorder to convert continuous Maxent and discrete GARPpredictions into binary predictions delimiting the suit-able versus unsuitable areas for the species. A goodgeneral rule for determining an appropriate thresholdwould depend at least on the following factors: thepredicted values assigned to the training localities, thenumber of training localities and the context in whichthe prediction is to be used. Nevertheless, for each runof each algorithm, we simply used the minimum pre-dicted value assigned to any of the training localities asthe threshold. However, for four of the twenty GARPruns, such a threshold would cause the whole studyarea to be predicted (as some training localities fell inpixels not predicted by any of the best-subset models).In those cases, we used the smallest non-zero predictedvalue among the training localities.

Because this omission test is highly sensitive to theproportional predicted area(Anderson et al., 2003), itcannot be used to compare model performance betweentwo algorithms directly. Hence, we propose an “equal-ized predicted area” test, which chooses thresholds sothat the two binary models have the same predictedarea, allowing direct comparison of omission rates.H int pre-d btaina ld fore areaa iledW iva-l inew be-t areai areM m-p bsetma h-o , sof nter-m

e investigated using theextrinsic omission rate, whichs the fraction of the test localities that fall into pixot predicted as suitable for the species, and thepro-ortional predicted area, which is the fraction of ahe pixels that are predicted as suitable for the spe

low omission rate is a necessary (but not sufficiondition for a good model(Anderson et al., 2003). Inontrast, it might be necessary to predict a large proional area to model the species’ potential distribudequately.

A one-tailed binomial test can be used to determhether a model predicts the test localities significaetter than random(Anderson et al., 2002). Say thereret test localities, the omission rate isr, and the propor

ional predicted area isa. The null hypothesis states thhe model is no better than one randomly selectedhe set of all models with proportional predicted area.t is tested using a one-tailed binomial test to determ

ere, composite GARP models have little flexibilityhe choice of threshold. On the other hand, Maxentictions, being continuous, can be thresholded to ony desired predicted area. So, we set a threshoach Maxent prediction to give the same predicteds the corresponding GARP prediction. A two-tailcoxon signed-rank test (a non-parametric equ

ent of a pairedt-test) can then be used to determhether the observed difference in omission rates

ween the two algorithms at the given predicteds statistically significant. We used this test to comp

axent predictions with two thresholds of the coosite GARP predictions, namely 1 (any best-suodel) and 10 (all best-subset models; seeAndersonnd Martınez-Meyer, 2004). These are natural threslds for GARP that are frequently used in practice

or reasons of conciseness, we do not consider iediate thresholds. For some data partitions forB. var-

Page 14: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

244 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

iegatus, the maximum value of the composite GARPmodel was less than 10 (because fewer than 10 GARPmodels met the best-subset criteria), in which case weused the maximum predicted value instead of 10.

The thresholds and resulting predicted areas usedabove are not necessarily optimal for either algorithm.Rather, they were chosen to facilitate statistical analy-sis of the algorithms. Note that we are not suggestingthat GARP should or need be used in general to select athreshold for Maxent predictions when binary predic-tions are desired. Rather, we took advantage of the flex-ibility of Maxent’s continuous outputs to allow directcomparisons of omission rates between it and GARP.Determining optimal thresholds for Maxent models re-mains a topic of future research. In practice, thresholdswould currently be chosen by hand, since no general-purpose thresholding rule has been developed yet foreither algorithm (but see Section2.2for theoretical ex-pectations for Maxent).

2.6.2. Threshold-independent evaluationA second common approach compares model

performance using receiver operating characteris-tic (ROC) curves. ROC analysis was developed insignal processing and is widely used in clinicalmedicine(Hanley and McNeil, 1982, 1983; Zweig andCampbell, 1993). The main advantage of ROC analy-sis is that area under the ROC curve (AUC) providesa single measure of model performance, independentof any particular choice of threshold. ROC analysish tionp ta lso l,1

ralt gh idera itherp aluet pliedt bels{s os-it ifiedN ate,a antity

1–specificity is also known as the false positive rate,and represents commission error.

The ROC curve is obtained by plotting sensitivityon they axis and 1–specificity on thex axis for all pos-sible thresholds. For a continuous prediction, the ROCcurve typically contains one point for each test instance,while for a discrete prediction, there will typically beone point for each of the different predicted values, inaddition to the origin. The area under the curve (AUC)is usually determined by connecting the points withstraight lines; this is called the trapezoid method (asopposed to parametric methods, which fit a curve tothe points). The AUC has an intuitive interpretation,namely the probability that a random positive instanceand a random negative instance are correctly orderedby the classifier. This interpretation indicates that theAUC is not sensitive to the relative numbers of positiveand negative instances in the test data set.

When only presence data are available, it wouldappear that ROC curves are inapplicable, since with-out absences, there seems to be no source of negativeinstances with which to measure specificity (see Sec-tion 1.1, and the discussion of real and apparent com-mission error inAnderson et al. (2003)andKarl et al.(2002)). However, we can avoid this problem by con-sidering a different classification problem, namely, thetask of distinguishing presence from random, ratherthan presence from absence. More formally, for eachpixel x in the study area, we define a negative instancexrandom. Similarly, for each pixelx that is includedi de-fi -b xelsc thel ep ces( e in-s ran-d tiona f-fi bea s ofR usingp anal-yb hodso top

as recently been applied to a variety of classificaroblems in machine learning (for exampleProvosnd Fawcett, 1997) and in the evaluation of modef species distributions(Elith, 2002; Fielding and Bel997).

Here we will first describe ROC curves in geneerms, followingFawcett (2003), before demonstratinow they apply to presence-only modeling. Consclassification problem, where each instance is eositive or negative. A classifier assigns a real v

o each instance, to which a threshold may be apo predict class membership; for clarity we use laY,N} for the class predictions. Thesensitivity of a clas-ifier for a particular threshold is the fraction of all ptive instances that are classifiedY, while specificity ishe fraction of all negative instances that are class. Sensitivity is also known as the true positive rnd represents absence of omission error. The qu

n the species’ true geographic distribution, wene a positive instancexpresence. A species distriution model can then make predictions for the piorresponding to these instances, without seeingabelsrandom or presence. Thus, we can makredictions for both a sample of positive instanthe presence localities) and a sample of negativtances (background pixels chosen uniformly atom, or according to another background distribus described in Section1.3). Together these are sucient to define an ROC curve, which can thennalyzed with all the standard statistical methodOC analysis. This process can be interpreted asseudo-absence in place of absence in the ROCsis, as is done inWiley et al. (2003). However, weelieve that the observation that the statistical metf ROC analysis can be applied without prejudiceresence/random data is new.

Page 15: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 245

The above treatment differs from the use of ROCanalysis on presence/absence data in one important re-spect: with presence-only data, the maximum achiev-able AUC is less than 1(Wiley et al., 2003). If thespecies’ distribution covers a fractiona of the pixels,then the maximum achievable AUC can be shown tobe exactly 1− a/2. Unfortunately, we typically do notknow the value ofa, so we cannot say how close tooptimal a given AUC value is. Nevertheless, we canstill use standard methods to determine statistical sig-nificance of the AUC, and to distinguish between thepredictive power of different classifiers. We note thatrandom prediction still corresponds to an AUC of 0.5.

We used AccuROC Version 2.5(Vida, 1993)for theROC analyses. AccuROC uses the trapezoid method, asdescribed above. To test if a prediction is significantlybetter than random, AccuROC uses a ties-correctedMann–Whitney-U statistic, which it approximates us-ing az-statistic. It uses a non-parametric test(DeLonget al., 1988)to determine whether one prediction issignificantly better than another when using correlatedsamples (i.e., with both predictions evaluated on thesame test instances), and reports the result as aχ2

statistic and correspondingp value. For each ROCanalysis, we used all the test localities for the speciesas presence instances, and a sample of 10,000 pix-els drawn randomly from the study region as randominstances.

3

3

3ons

t plet nt rb cies( de uitei ax-e holdrat2 l

significance, omission rates were consistently low orzero, never exceeding 17% (Table 1).

The results of the equalized predicted area test dif-fered between the species (Tables 2 and 3). For B.variegatus, the omission rates of the two algorithmswere lower for Maxent in 16 cases, equal in 15 cases,and lower for GARP in 9 cases. However, two-tailedWilcoxon signed-rank tests did not reveal a significantdifference in median omission rates for either thresh-old or either variable suite (p = 0.178 and 0.314 forthresholds of 1 and 10, respectively, with climatic andelevational variables;p = 0.371 and 0.155 for thresh-olds of 1 and 10, respectively, with addition of the po-tential vegetation variable).

Maxent almost always had equal or lower omissionthan GARP forM. minutus (19 out of 20 models). Thedifference in median omission rates was significant atboth thresholds on runs with climatic and elevationalvariables (p = 0.036 andp = 0.014 for thresholds of 1and 10, respectively; two-tailed Wilcoxon signed-ranktest). When the potential vegetation variable was added,the difference in median omission rates was highly sig-nificant for a threshold of 10, but not for a threshold of1 (p = 0.009 and 0.345, respectively), largely becauseMaxent had greater omission than before on data par-tition 2, discussed below (Section4.3).

3.1.2. Threshold-independent testsFor all partitions of the occurrence data, the

AUC values (calculated on extrinsic test data) wereh ndv r-t nif-in -ov va-tw

uldi tiona l forMM e-d -t RPt eA

. Results

.1. Quantitative results

.1.1. Threshold-dependent omission testsBoth algorithms consistently produced predicti

hat were better than random. Using the simhreshold rule (Section2.6.1), the binomial omissioest was highly significant (p < 0.001, one-tailed) footh algorithms on all data partitions for each speseeTable 1for details on runs with the climatic anlevational variables; results on the variable s

ncluding potential vegetation were similar). For Mnt, the thresholds determined by the simple thresule ranged from 0.022 to 2.564 forB. variegatusnd 0.543 to 3.822 forM. minutus. For GARP, the

hresholds ranged from 1 to 7 forB. variegatus andto 10 for M. minutus. In addition to statistica

ighly statistically significant for both algorithms aariable suites (p < 0.0001), again indicating bettehan-random predictions. The Maxent AUC was sigcantly greater than that of GARP (p < 0.05; two-tailedon-parametric test ofDeLong et al., 1988; see Methds) in all data partitions exceptB. variegatus-4 andB.ariegatus-8 for models using the climatic and eleional variables, andB. variegatus-8 andM. minutus-2hen potential vegetation was added (Table 4).Addition of the potential vegetation variable sho

ncrease the AUC, since there is more informavailable to the classifier. This was true in generaaxent and in some cases for GARP (Table 4). Foraxent onB. variegatus, the overall increase in mian AUC approached significance (p = 0.093, one

ailed Wilcoxon signed rank test). However, for GAhe test was not significant (p = 0.949); indeed, thUC generally decreased. ForM. minutus, the AUC

Page 16: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

246 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

Table 1Results of the threshold-dependent binomial tests of omission

Data partition Maxent GARP

Area Omission rate Area Omission rate

Bradypus variegatus-1 0.51 0.03 0.41 0.11B. variegatus-2 0.66 0 0.56 0.06B. variegatus-3 0.80 0 0.61 0.03B. variegatus-4 0.42 0.17 0.51 0B. variegatus-5 0.75 0.03 0.57 0.06B. variegatus-6 0.62 0 0.54 0B. variegatus-7 0.59 0 0.53 0B. variegatus-8 0.59 0.06 0.62 0B. variegatus-9 0.69 0 0.66 0B. variegatus-10 0.62 0.06 0.44 0.06Average 0.626 0.034 0.545 0.031

Microryzomys minutus-1 0.03 0.11 0.06 0.15M. minutus-2 0.04 0.11 0.06 0.15M. minutus-3 0.03 0.11 0.07 0.15M. minutus-4 0.04 0.04 0.08 0.04M. minutus-5 0.03 0.04 0.06 0.15M. minutus-6 0.04 0.15 0.06 0.11M. minutus-7 0.05 0 0.09 0.07M. minutus-8 0.04 0.04 0.10 0M. minutus-9 0.03 0.07 0.10 0M. minutus-10 0.03 0.11 0.08 0.07Average 0.035 0.078 0.075 0.089

Area (proportion of the study area predicted) and extrinsic omission rate (proportion of the test localities falling outside the prediction) are givenfor each of 10 random data partitions for Maxent and GARP. For bothB. variegatus andM. minutus, the binomial test was highly significant forall partitions (p < 0.001, one-tailed). Models were derived using the climatic and elevational variables for each random partition of occurrencerecords, and area and omission rates were calculated using simple threshold rules based on the training localities (see Section2). The resultsfor models made with the addition of the potential vegetation variable were similar but are not shown here (see Section3). The omission ratesshould not be compared between algorithms, as they are strongly affected by differences in predicted area. The simple threshold rule used herefor Maxent is not recommended for general use in practice; in this case, it gives too high a threshold for Maxent onB. variegatus-4, causing ahigh omission rate, and too low a threshold onB. variegatus-3, resulting in too much predicted area.

usually increased for both Maxent and GARP, with re-sults significant or nearly so for both (p = 0.051 and0.033, respectively, although performance was poorerfor Maxent on data partition 2; see Section4.3). Whilethe differences in AUC values are very small, thechanges may still be meaningful biologically. For ex-ample, the largest visual effect of adding potential veg-etation for Maxent was to (correctly) exclude somenon-forested areas from the prediction forB. varie-gatus (Section3.2.2). However, because of the smallgeographic extent of those areas, the effect on AUCvalues was small.

The ROC curves for the two algorithms showed dis-tinct patterns, evident in the curves for the first randomdata partition for each species, for models made usingclimatic and elevational variables (Fig. 3). In the case

of M. minutus, the performance of Maxent was betteracross the entire spectrum: for any given omission rate,Maxent achieved that rate with a lower false positiverate (1–specificity, which is almost identical to propor-tional predicted area, see Section2). The results withB.variegatus were more complex. There is a point wherethe ROC curves for the two algorithms intersect, cor-responding to a sensitivity of 0.83 (omission rate of0.17) and a false positive rate of 0.27. At that point,therefore, the performance of the two algorithms wasthe same. A small component of the higher AUC forMaxent was due to the lower omission rate it achievedto the right of that point. However, most of Maxent’shigher AUC occurred to the left of that point, wheremany test localities fell in small areas very stronglypredicted by Maxent. In contrast, GARP did not differ-

Page 17: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 247

Table 2Results of the equalized predicted area tests of omission forB. variegatus andM. minutus produced with Maxent and GARP using the climaticand elevational variables

Data partition GARP threshold = 1 GARP threshold = 10

Area Maxent omission GARP omission Area Maxent omission GARP omission

B. variegatus-1 0.59 0.03 0.03 0.27 0.17 0.17B. variegatus-2 0.56 0.03 0.06 0.34 0.11 0.17B. variegatus-3 0.61 0.06 0.03 0.33 0.09 0.17B. variegatus-4 0.63 0.14 0 0.40 0.17 0.06B. variegatus-5 0.67 0.03 0 0.36 0.11 0.26B. variegatus-6 0.69 0 0 0.29 0.14 0.11B. variegatus-7 0.74 0 0 0.31 0.03 0.14B. variegatus-8 0.69 0 0 0.33 0.17 0.11B. variegatus-9 0.72 0 0 0.36 0.06 0.11B. variegatus-10 0.61 0.06 0.03 0.34 0.14 0.17Average 0.652 0.034 0.014 0.333 0.120 0.149

M. minutus-1 0.12 0 0.07 0.06 0.04 0.15M. minutus-2 0.10 0 0.07 0.06 0.04 0.15M. minutus-3 0.16 0 0.04 0.07 0.07 0.15M. minutus-4 0.17 0 0.04 0.08 0.04 0.04M. minutus-5 0.12 0 0.07 0.06 0 0.15M. minutus-6 0.12 0 0.04 0.06 0.07 0.11M. minutus-7 0.16 0 0 0.09 0 0.07M. minutus-8 0.17 0 0 0.09 0 0M. minutus-9 0.17 0 0 0.09 0 0.04M. minutus-10 0.18 0 0 0.08 0 0.07Average 0.146 0 0.033 0.073 0.026 0.093

Area (proportion of the study area predicted by GARP with the indicated threshold) and extrinsic omission rate (proportion of test localitiesfalling outside the prediction) for each algorithm are given for each random partition of occurrence records under two threshold scenarios.Thresholds were set for the extremes of the GARP predictions: any GARP model predicting presence (GARP threshold = 1) and all 10 GARPmodels predicting presence (GARP threshold = 10). To allow for direct comparison of omission rates between the algorithms, thresholds werethen set for each Maxent model to yield a binary prediction with the same area as the corresponding GARP prediction.

entiate environmental quality to the left of that point,as all pixels there were predicted by all 10 of the best-subset models. Results for other data partitions wereroughly similar (not shown).

3.2. Visual interpretation

The output format differs dramatically betweenMaxent and GARP, so care must be taken when makingcomparisons between them. Maxent produces a con-tinuous prediction with values ranging from 0 to 100,whereas GARP, as used here, yields a discrete compos-ite prediction with integer values from 0 to 10. Visualinspection of the Maxent predictions for both speciesindicated that a low threshold was appropriate, and ingeneral terms, pixels with predicted values of at least1 may be interpreted as a reasonable approximation ofthe species’ potential distribution. This is in concor-

dance with the thresholds obtained in Section3.1.1,and the theoretical expectation that the omission ratefor a thresholded cumulative prediction will be approx-imately equal to the threshold value (see Section2.2).For GARP, visual inspection suggested a higher thresh-old in the range 5–10 was appropriate for approximat-ing the species’ potential distribution. In the followingsections, we interpret the models under this framework.

3.2.1. Models derived from climatic andelevational variables

When using the full set of occurrence localities foreach species, the two algorithms produced broadlysimilar predictions for the potential geographic distri-bution ofB. variegatus (Fig. 4). For this species, bothalgorithms indicated suitable conditions throughoutmost of lowland Central America, wet lowland areasof northwestern South America, most of the Amazon

Page 18: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

248 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

Table 3Results of the equalized predicted area tests of omission forB. variegatus andM. minutus produced with Maxent and GARP using the climatic,elevational and potential vegetation variables

Data partition GARP threshold = 1 GARP threshold = 10

Area Maxent omission GARP omission Area Maxent omission GARP omission

B. variegatus-1 0.57 0.03 0.03 0.28 0.20 0.23B. variegatus-2 0.58 0 0.06 0.29 0.11 0.29B. variegatus-3 0.67 0 0.03 0.33 0.14 0.11B. variegatus-4 0.67 0 0 0.42 0.06 0.11B. variegatus-5 0.67 0.03 0.03 0.36 0.14 0.17B. variegatus-6 0.71 0 0 0.28 0.17 0.17B. variegatus-7 0.74 0 0 0.33 0.06 0.20B. variegatus-8 0.67 0 0 0.34 0.20 0.17B. variegatus-9 0.78 0 0 0.39 0.03 0.06B. variegatus-10 0.67 0 0 0.36 0.14 0.17Average 0.672 0.006 0.014 0.337 0.126 0.169

M. minutus-1 0.12 0 0.04 0.06 0.04 0.15M. minutus-2 0.11 0.11 0.04 0.06 0.15 0.19M. minutus-3 0.13 0 0.04 0.07 0.04 0.15M. minutus-4 0.15 0 0.04 0.08 0.04 0.04M. minutus-5 0.12 0 0.07 0.06 0 0.15M. minutus-6 0.14 0 0 0.05 0.04 0.11M. minutus-7 0.16 0 0.04 0.08 0 0.07M. minutus-8 0.16 0 0 0.08 0 0.04M. minutus-9 0.16 0 0 0.08 0 0.07M. minutus-10 0.17 0 0 0.07 0 0.04Average 0.142 0.011 0.026 0.070 0.030 0.100

Area (proportion of the study area predicted by GARP with the indicated threshold) and extrinsic omission rate (proportion of test localitiesfalling outside the prediction) for each algorithm are given for each random partition of occurrence records under two threshold scenarios.Thresholds were set for the extremes of the GARP predictions: any GARP model predicting presence (GARP threshold = 1) and all 10 GARPmodels predicting presence (GARP threshold = 10). To allow for direct comparison of omission rates between the algorithms, thresholds werethen set for each Maxent model to yield a binary prediction with the same area as the corresponding GARP prediction.

basin, large areas of Atlantic forests in southeasternBrazil, and most large Caribbean islands in the studyarea. The species was generally predicted absent fromhigh montane areas, temperate areas in southern SouthAmerica, and non-forested areas of central Brazil (e.g.,cerrado). The algorithms differed in their predictionsfor non-forested savannas in northern South America.The composite GARP model indicated the species’potential presence there, but Maxent excluded somenon-forested savannas in Venezuela (llanos) and theGuianas.

In contrast, the algorithms yielded quite differentpredictions forM. minutus (Fig. 4). Maxent indicatedsuitable conditions for the species in the northern andcentral Andes (and associated mountain chains) fromBolivia and northern Chile to northern Colombia andVenezuela. It also included highland areas in Jamaica,the Dominican Republic and Haiti, as well as very

small highland areas in Brazil, southeastern Mexico,Costa Rica and Panama. In contrast, GARP predicteda much more extensive potential distribution for thespecies. In addition to a broad highland prediction inthe northern and central Andes and the Caribbean, thecomposite GARP prediction also included areas of thesouthern Andes as well as extensive highland regionsin Mesoamerica, the Guianan-shield region andsoutheastern Brazil. The prediction in the Brazilianhighlands extended into adjacent lowland areas ofthat country as well as into Uruguay and northernArgentina.

3.2.2. Addition of potential vegetation variableThe two algorithms responded differently to the in-

clusion of the potential vegetation variable (Fig. 5).The Maxent prediction with potential vegetation forB.variegatus was generally similar to the original one,

Page 19: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 249

Table 4Results of threshold-independent receiver operating characteristic (ROC) analyses forB. variegatus andM. minutus produced with Maxent andGARP using the climatic and elevational variables (left) and climatic, elevational and potential vegetation variables (right)

Data partition Without potential vegetation With potential vegetation

Maxent AUC GARP AUC p Maxent AUC GARP AUC p

B. variegatus-1 0.889 0.807 <0.01 0.879 0.793 <0.01B. variegatus-2 0.892 0.765 <0.01 0.899 0.769 <0.01B. variegatus-3 0.872 0.779 0.01 0.887 0.790 <0.01B. variegatus-4 0.819 0.789 0.51 0.858 0.757 <0.01B. variegatus-5 0.868 0.740 <0.01 0.885 0.753 <0.01B. variegatus-6 0.881 0.818 <0.01 0.868 0.812 0.03B. variegatus-7 0.902 0.812 <0.01 0.919 0.784 <0.01B. variegatus-8 0.839 0.807 0.34 0.829 0.786 0.13B. variegatus-9 0.903 0.794 <0.01 0.897 0.784 <0.01B. variegatus-10 0.866 0.779 0.01 0.879 0.769 <0.01Average 0.873 0.789 0.880 0.780

M. minutus-1 0.985 0.926 0.01 0.986 0.946 0.02M. minutus-2 0.987 0.931 0.02 0.932 0.943 0.75M. minutus-3 0.985 0.938 <0.01 0.987 0.939 <0.01M. minutus-4 0.983 0.938 <0.01 0.984 0.941 <0.01M. minutus-5 0.988 0.926 0.02 0.990 0.926 0.01M. minutus-6 0.983 0.947 0.05 0.986 0.966 <0.01M. minutus-7 0.989 0.950 <0.01 0.988 0.936 <0.01M. minutus-8 0.988 0.954 <0.01 0.989 0.956 <0.01M. minutus-9 0.989 0.952 <0.01 0.990 0.955 <0.01M. minutus-10 0.985 0.955 <0.01 0.987 0.961 <0.01Average 0.986 0.942 0.982 0.947

For each random partition of occurrence records, the area under the ROC curve (AUC) is given for Maxent and GARP, as well as the probabilityof the observed difference in the AUC values between the two algorithms (under a null hypothesis that the true AUCs are equal). All AUC valuesfor both algorithms were significantly better than a random prediction (p < 0.0001; individualp values not shown). AUC values are given tothree decimal places to reveal small changes under addition of the potential vegetation coverage.

but now indicated unsuitable conditions for the speciesin the llanos of Colombia and Venezuela and in othernon-forested areas in Bolivia and Brazil. On the con-trary, the composite GARP prediction with potentialvegetation included was very similar to the originalprediction, still indicating suitable environmental con-ditions for the species in non-forested areas of Colom-bia, Venezuela, Guyana, Brazil, Paraguay and Bolivia.

Addition of the potential vegetation variablechanged the Maxent and GARP predictions forM.minutus only minimally. The Maxent prediction withpotential vegetation differed principally by a sharpreduction in the area predicted for the species alongthe western slopes of the Andes in central and southernPeru and in northern Chile. The composite GARPprediction with potential vegetation differed from theoriginal one mainly by indicating a smaller area ofsuitable environmental conditions for the species incentral Chile and in central-eastern Argentina.

4. Discussion and conclusions

4.1. Statistical tests

Both algorithms consistently performed signif-icantly better than random, and Maxent frequentlyachieved better results than GARP. Threshold-dependent binomial tests (Table 1) showed low omis-sion of test localities and significant predictions forboth algorithms across the board. The equalized pre-dicted area test generally indicated better performancefor Maxent onM. minutus, but the test did not detect asignificant difference between the two algorithms forB. variegatus (Tables 2 and 3). Threshold-independentROC analysis also showed significantly better-than-random performance for both algorithms. The areaunder the ROC curve (AUC) was significantly higherfor Maxent on almost all data partitions for both species(Table 4). Use of the categorical potential vegetation

Page 20: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

250 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

Fig. 3. Extrinsic receiver operating characteristic (ROC) curves forMaxent and GARP on the first random partition of occurrence recordsof B. variegatus (left) andM. minutus (right). Models were producedusing the climatic and elevational variables. Sensitivity equals theproportion of test localities correctly predicted present (1–extrinsicomission rate). The quantity (1–specificity) equals the proportion ofall map pixels predicted to have suitable conditions for the species.Note that both algorithms perform much better than random, andthat Maxent is generally superior to GARP; seeTable 4for results ofstatistical analyses.B. variegatus is a wider-ranging species thanM.minutus, so it has a smaller maximum achievable AUC in these ROCanalyses performed without true absence data (see Section2.6.2).The curves therefore do not necessarily imply that the algorithms areperforming better onM. minutus.

variable (in addition to the continuous climatic andelevational variables) generally improved performancefor both algorithms onM. minutus and for Maxenton B. variegatus, but the changes had limited statis-tical significance, likely due to the small amount ofdata.

4.2. Biological interpretations

Both algorithms produced reasonable predictions ofthe potential distribution forB. variegatus. The areaspredicted by 5–10 GARP models were similar geo-graphically to those areas predicted with a value ofat least 1 (out of 100) for Maxent. Although muchresearch addressing the issue of operationally deter-mining an optimal threshold remains for both algo-rithms, these thresholds produce good maps of thespecies’ potential distributions (areas of suitable en-vironmental conditions). In particular, the models per-form far superior to the shaded outline maps availablein standard field guides, (e.g.,Eisenberg and Redford,1999; Emmons, 1997), and in digital compilations ofspecies ranges designed for use in conservation biologyand macroecological studies(Patterson et al., 2003).Most strikingly, the models correctly indicate an ex-pansive region of unsuitable environmental conditionsfor B. variegatus in the non-forestedcerrado of Brazil,whereas the shaded outline maps indicate continuousdistribution for the species from Amazonian foreststo coastal Atlantic forests. Although GARP has thecapacity to consider categorical variables, the inclu-sion of the potential vegetation variable did not rectifythe deficiencies seen in the original composite GARPprediction forB. variegatus. In contrast, Maxent suc-cessfully integrated this additional information. Thisis most evident in close-up images inFig. 4.2, whereGARP (incorrectly) predicted suitable conditions fortV

forB hepu hint laya on-t andB ndi-t notb arri-e lowp en-t thata andC andM and

he species in the non-forestedllanos of Colombia andenezuela.

In contrast to the generally similar predictions. variegatus, different deficiencies were evident in tredictions produced by the two algorithms forM. min-tus. Maxent produced an impressive prediction withe species’ known range. The Maxent predictionlmost entirely within the Andes. However, wet m

ane forests also exist in Mesoamerican, Guiananrazilian highlands. Those areas likely contain co

ions suitable for the species, but it apparently haseen able to colonize them due to geographic brs. We investigated possible reasons for Maxent’srediction in these areas, by examining environm

al characteristics of six classical highland sitesre well sampled for small mammals: Monteverdeerro de la Muerte in Costa Rica, Auyan-tepuiount Duida in southern Venezuela, and Itatiaia

Page 21: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 251

Fig. 4. Predicted potential geographic distributions forB. variegatus (top) andM. minutus (bottom) made using all occurrence records and theclimatic and elevational variables. Results are given for Maxent (left) and GARP (right). Four colors are used to indicate the strength of theprediction for each individual map pixel. Maxent produces a continuous prediction with values ranging from 0 to 100; the values are depictedhere using white = [0,1); pale grey = [1,34); dark grey = [34,66); black = [66,100]. The best-subsets selection procedure employed here forGARP yields a discrete prediction with integer values from 0 to 10, depicted here using white = 0; pale grey = 1–4; dark grey = 5–9; black= 10. The strength of the predictions thus cannot be compared directly. All areas with a Maxent prediction of 1 or greater likely representsuitable environmental conditions for the species; in contrast, areas with a GARP prediction of 5–10 probably indicate suitable conditions (seeSection3.2).

Page 22: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

252 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

Fig. 5. Predicted potential geographic distributions forB. variegatus (top) andM. minutus (bottom) made using all occurrence records andclimatic, elevational and potential vegetation variables. Results are given for Maxent (left) and GARP (right). Four colors are used to indicatethe strength of the prediction for each map pixel. Maxent produces a continuous prediction with values ranging from 0 to 100; the values aredepicted here using white = [0,1); pale grey = [1,34); dark grey = [34,66); black = [66,100]. The best-subsets selection procedure employedhere for GARP yields a discrete composite prediction with integer values from 0 to 10, depicted here using white = 0; pale grey = 1–4; darkgrey = 5–9; black = 10. The strength of the predictions thus cannot be compared directly. All areas with a Maxent prediction of 1 or greaterlikely represent suitable environmental conditions for the species; in contrast, areas with a GARP prediction of 5–10 probably indicate suitableconditions (see Section3.2).

Page 23: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 253

Caparao in Brazil (Gouvea and deAvila Pires, 1999;Hershkovitz, 1998; McPherson, 1985; Paynter, 1982;Tate, 1939). For the first four sites, July precipitationvalues were at least 5 standard deviations higher thanthe mean of theM. minutus occurrence localities. Inaddition, the annual maximum temperature at thosesites was 1.37–3.23 standard deviations higher thanthe mean of the occurrence localities. In contrast, theBrazilian sites had July precipitation within the samerange as the occurrence localities, but the January pre-cipitation was much higher for both (by 3.12 and 1.84standard deviations, respectively), and maximum tem-perature was much higher for Caparao (by 1.95 stan-dard deviations). Thus, Maxent’s behavior given thedata provided is correct and reasonable. However, de-spite the differences in some environmental variables,the forests in the six sites are probably functionallysimilar to those inhabited byM. minutus. This situationhighlights the difficulty of extrapolating from a species’realized distribution, and emphasizes that the variablesused should be chosen with care. ForM. minutus, betterextrapolation might be achieved using derived climaticparameters that are more relevant for the species, forexample, precipitation of wettest month(Busby, 1986),rather than values for specific months (see Section1.1).Quite the opposite to the Maxent predictions, extensiveareas of potential distribution indicated in Mesoameri-can, Guianan and Brazilian highland regions by GARPsurely overestimate the extent of suitable environmen-t thev d-e thes

4

p omed arti-t andf en-t thes icalm icald ion.H ss-l ege-t with

forested regions below;Carleton and Musser, 1989).For data partition 2, both of those latter two locali-ties fell in the test dataset (i.e., not the training set).Accordingly, Maxent’s prediction strongly avoided the“montane grasslands” class. The pixels correspondingto those two test localities thus had very low predictedvalue, bringing down the AUC for that partition. This isan artifact of under-regularization. More regularizationfor categorical features would allow some prediction inclasses with no presence records, especially if the to-tal number of presence records is small (Haffner et al.,in preparation, and implemented in later versions ofMaxent).

The behavior of Maxent is in fact reasonable in thiscase, as the training data do not cover the range ofvegetation classes that the species can inhabit. Further-more, it is better than the statistics would suggest, asthe occurrence localities falling in montane grasslandsboth lie on the border with pixels of one of the other twoclasses inhabited by the species, and are therefore closeto highly predicted areas. Their omission should thusbe penalized less than other test localities(Fielding andBell, 1997). Indeed, smoothing the prediction by twiceapplying a simple 3× 3 smoothing convolution withthe following weights as a low-pass filter(Jensen, 1996)

0.05 0.05 0.05

0.05 0.6 0.05

0.05 0.05 0.05

i isi ndc uchp atiale duet onsr

4

ges( em s. Int best-s son-a (seea ll 10

al conditions for the species there. In particular,ast majority of the pixels predicted by all 10 mols in southeastern Brazil lie below 1000 m, wherepecies’ presence is quite unlikely.

.3. Spatial context of errors

The performance of Maxent onM. minutus when theotential vegetation variable was used warrants siscussion. The AUC for the second random data p

ion was notably lower than for the other partitions,or the model run on the same partition without potial vegetation. Most of the occurrence localities forpecies are contained in the “tropical and subtropoist broadleaf forest” and “tropical and subtropry broadleaf forest” classes of potential vegetatowever, two of them fall within the “montane gra

ands” class (the species indeed can inhabit this vation type in mosaic habitats along the ecotone

ncreases the AUC to 0.98 for that partition, whichn line with those of the other random partitions, aauses very little visible change to the prediction. Sost-processing may be of general utility when sprror is known to exist in the data, for example

o errors in site localities or boundaries of polygepresenting categorical variables.

.4. Advantages of Maxent

Maxent exhibits a number of inherent advantasee Section1). In addition, visual inspection of thodels indicates two further possible advantage

hese analyses, areas predicted by 5–10 of theubset GARP models generally showed a reable prediction of the species’ geographic rangesbove). Most of those areas were predicted by a

Page 24: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

254 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

Page 25: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 255

models. In contrast, the Maxent prediction is continu-ous, and within those areas suitable for each species, itfurther distinguishes between those with a marginally(but sufficiently) strong prediction versus those with in-creasingly stronger predictions. This represents an im-portant advantage for Maxent, and explains part of itsconsistently higher AUC values. The AUC for GARPcould potentially be improved by attempting to increasethe resolution at the left end of the ROC curve, namelyby creating more original binary GARP models (say1000) and choosing a larger best subset (say 100). Wetried this for both species using all occurrence local-ities and all variables, and found that the predictionswere virtually unchanged (in comparison to a best sub-set of 10 out of 100 models). We also note that evenwith 100 total models, GARP was already testing thelimits of the computers we used (processing all 22datasets produced almost 20 GB of output, comparedwith 285 MB for Maxent). Apart from output size, thecomputational requirements of the two algorithms weresimilar in this study; GARP averaged 1.95 h to producea single prediction (best-subset composite derived from100 models), compared with 2.27 h for Maxent, bothon an 850 MHz Pentium 3 processor. Later versions ofMaxent available on the website use a faster algorithm(Haffner and Phillips, in preparation); Version 1.8.1takes a total of 70 min to process all 22 datasets onthe above-mentioned computer, or 20 min on a newer3.2 GHz Intel Xeon computer.

Secondly, Maxent more successfully integrated finetopographic data for both species, producing more de-tailed (finer-grained) predictions (see close-up imagesin Fig. 4.2). We propose that this is true, at least in part,because the Maxent model exhibits additivity (whileGARP does not), with the contribution of all the vari-ables being added at each pixel (see Eq.(2) in Sec-

Fig. 6. Close-up of northwestern South America for the predicted p(bottom) made using all occurrence records and climatic, elevationa axent (left)and GARP (right). For both species, note the finer grain of the Maxen edunsuitable conditions in the non-forested tropical savannas (llanos) of easte sencethere (even with the inclusion of the potential vegetation variable). F s. Maxentproduces a continuous prediction with values ranging from 0 to 100, = [34,66);black = [66,100]. The best-subsets selection procedure employed for lues from 0to 10, depicted here using white = 0; light grey = 1–4; dark grey = 5– compareddirectly. All areas with a Maxent prediction of 1 or greater likely repres rast, areaswith a GARP prediction of 5–10 probably indicate suitable condition r thespecies for each algorithm, Maxent indicates areas of successively s (10) to mostsuch areas (see Section4.4).

tion 2.1.2). We tried two approaches in an attempt toget GARP to make finer predictions. First, we exam-ined the composite models derived from much largernumbers of GARP models (described above), but theresolution did not increase noticeably. Second, we de-creased the convergence limit, allowing GARP to re-fine its predictions and potentially make more complexmodels. Again using the full datasets, we reduced theconvergence limit from 0.01 to 0.0001, which increasedthe running time five-fold. Decreasing the convergencelimit may result in overfitting in some circumstances;however, we saw no indication of that here. In fact, itimproved the prediction forB. variegatus somewhat(for example, reducing overprediction in some high-land areas), but it did not increase the apparent resolu-tion of the predictions.

4.5. Future work

Much work can be done to refine the use of Max-ent for modeling species geographic distributions. Re-search should determine the number of occurrence lo-calities needed to make an adequate prediction, and todetermine how much regularization is needed to avoidoverfitting when the number of occurrence localities issmall; preliminary results regarding these issues arepresented byDudık et al. (2004)and Phillips et al.(2004). Regarding the quality of the inputs to Maxent,the effect of non-uniform sampling of species locali-ties should be also investigated, building onZadrozny( m-pe intoa thans am-p

otential geographic distributions ofB. variegatus (top) andM. minutusl and potential vegetation variables. Results are given for Mt prediction. ForB. variegatus, the Maxent prediction correctly indicatrn Colombia, but the GARP prediction continued to predict preour colors are used to indicate the strength of the predictiondepicted here by white = [0,1); pale grey = [1,34); dark greyGARP yields a discrete composite prediction with integer va9; black = 10. The strength of the predictions thus cannot beent suitable environmental conditions for the species; in cont

s (see Section3.2). Note that among areas predicted as suitable fotronger predictions, whereas GARP assigns a maximal value

2004), with an eye to estimating and limiting the iact of sampling bias(Reddy and Davalos, 2003). Forxample, selection of background points takingccount which sites have been sampled (ratherimply at random) can ameliorate the effects of sling bias in some cases(Zaniewski et al., 2002). As

Page 26: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

256 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

described in Section4.3, smoothing a prediction maybe a useful general method of reducing the negativeimpact of spatial errors in localities and environmentalvariables. Additionally, before modeling the species’requirements, smoothing could also be applied to anyvariables that are suspected of having spatial errors, butit is far from a complete approach to error management.Another possibility, which may improve performanceeven in the absence of errors in the input data, would beto use bilinear (rather than nearest-neighbor) interpo-lation to obtain values for the environmental variablesat the training localities. Thus, training localities nearthe boundary between two pixels would receive a com-bination of the values of the two pixels; for categoricalvariables, training localities very close to the boundarybetween two classes would have partial membership inboth classes. Alternatively, rather than using a binaryfeature to represent membership in each class, a contin-uous feature representing distance from the class couldbe used.

Research is also called for regarding the use andapplication of Maxent predictions. First, a good ruleneeds to be developed to set a threshold operationallyusing intrinsic data (when a binary prediction is de-sired). Future research should determine to what degreedifferences in Maxent’s prediction strength correspondto the relative environmental suitability of the variousregions, rather than the possibility that they may reflectcollection biases (areas with many occurrence records).Additional feature types should also be considered, forel nvi-r

ralh dif-fi veb eza edd for-mHPm om-i h asM onsp iallyg thatm

Acknowledgments

We thank the Center for Biodiversity and Con-servation at the American Museum of Natural His-tory for fostering research on this topic, and in par-ticular to Eleanor Sterling and Ned Horning for fa-cilitating our collaboration. This work was supportedby AT&T Labs-Research (SJP and RES), NSF grantsIIS-0325500 and CCR-0325463 (RES), a RooseveltPostdoctoral Research Fellowship from the Ameri-can Museum of Natural History (RPA), and by fundsprovided by the Office of the Dean of Science andthe Office of the Provost, City College of the CityUniversity of New York (RPA). Enrique Martinez-Meyer, Miguel Ortega-Huerta and Townsend Peter-son supplied the elevational variable. David Lees sug-gested the cumulative representation for the Max-ent output. Kevin Koy gave us advice and assis-tance with GIS. We thank Miroslav Dudık, Cather-ine Graham, Ned Horning, Claire Kremen, TownsendPeterson and Christopher Raxworthy for insightfulcomments on the manuscript. We thank an anony-mous reviewer for a very detailed and comprehen-sive review. Our locality records derived from projectssurveying specimens housed in the following natu-ral history museums(Anderson and Handley, 2001;Carleton and Musser, 1989): Academy of NaturalSciences, Philadelphia; American Museum of Natu-ral History, New York; Carnegie Museum of Natu-ral History, Pittsburgh; Field Museum, Chicago (for-m deC om-b at-u sn hi-g seod r-a u-s ver-s gy,U ryM t-u ofN delC r-s ce;a nnA

xample, threshold features (seeDudık et al., 2004) andog-linear features (the logarithm of continuous eonmental variables;Holdridge et al., 1971).

The great potential utility of data from natuistory museums and herbaria, as well as theculty of making such data readily available, haeen clear for many years(Anderson, 1963; Suarnd Tsutsui, 2004). Recent advances in distributatabases and biodiversity informatics facilitate ination retrieval(Baker et al., 1998; Bisby, 2000; Perez-ernandez et al., 1997; Soberon, 1999; Soberon andeterson, 2004; Stein and Wieczorek, 2004). Further-ore, fine-resolution environmental data are bec

ng increasingly available. The success of tools sucaxent and GARP in modeling species distributirovides increased impetus for such efforts, especiven the gravity of many environmental issuesay be addressed using these techniques.

erly Field Museum of Natural History); Institutoiencias Naturales, Universidad Nacional de Colia, Bogota; Instituto del Desarrollo de Recursos Nrales Renovables, INDERENA, Bogota (specimenow at the Instituto Alexander von Humboldt); Mican State University Museum, East Lansing; Muel Instituto La Salle, Bogota; Museum of Compative Zoology, Harvard University, Cambridge; Meum of Natural Science, Louisiana State Uniity, Baton Rouge; Museum of Vertebrate Zooloniversity of California, Berkeley; Natural Histouseum, London (formerly British Museum [Naral History]); United States National Museumatural History, Washington, DC; Universidadauca, Popayan; Universidad del Valle, Cali; Univeity of Kansas Natural History Museum, Lawrennd University of Michigan Museum of Zoology, Arbor.

Page 27: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 257

References

Anderson, R.P., 2003. Real vs. artefactual absences in species distri-butions: tests forOryzomys albigularis (Rodentia: Muridae) inVenezuela. J. Biogeogr. 30, 591–605.

Anderson, R.P., Gomez-Laverde, M., Peterson, A.T., 2002. Geo-graphical distributions of spiny pocket mice in South America:insights from predictive models. Global Ecol. Biogeogr. 11, 131–141.

Anderson, R.P., Handley Jr., C.O., 2001. A new species of three-toed sloth (Mammalia: Xenarthra) from Panama, with a reviewof the genusBradypus. Proceedings of the Biological Society ofWashington 114, 1–33.

Anderson, R.P., Lew, D., Peterson, A.T., 2003. Evaluating predictivemodels of species’ distributions: criteria for selecting optimalmodels. Ecol. Model. 162, 211–232.

Anderson, R.P., Martınez-Meyer, E., 2004. Modeling species’ geo-graphic distributions for preliminary conservation assessments:an implementation with the spiny pocket mice (Heteromys) ofEcuador. Biol. Conser. 116, 167–179.

Anderson, R.P., Peterson, A.T., Gomez-Laverde, M., 2002. Usingniche-based GIS modeling to test geographic predictions of com-petitive exclusion and competitive release in South Americanpocket mice. Oikos 98, 3–16.

Anderson, S., 1963. Problems in the retrieval of information fromnatural history museums. In: Proceedings of the Conference onData Acquisition and Processing in Biology and Medicine, Perg-amon Press, New York, pp. 55–57.

Aoki, I., 1989. Holological study of lakes from an entropy view-point – Lake Mendota. Ecol. Model. 45, 81–93.

Aspinall, R., 1992. An inductive modeling procedure based onBayes’ theorem for analysis of pattern in spatial data. Int. J. Ge-ogr. Inform. Syst. 6 (2), 105–121.

Austin, M., 2002. Spatial prediction of species distribution: an inter-face between ecological theory and statistical modelling. Ecol.

B , Ed-J.T.,ms,

d de-, 1–4.

B axi-mput.

B and

B uer

Bcol.

C ory-f

C blents,

Corsi, F., de Leeuw, J., Skidmore, A., 2000. Modeling species dis-tribution with GIS. In: Boitani, L., Fuller, T. (Eds.), ResearchTechniques in Animal Ecology. Columbia University Press, NewYork, 389–434.

Corsi, F., Dupre, E., Boitani, L., 1999. A large-scale model of wolfdistribution in Italy for conservation planning. Conserv. Biol. 13,150–159.

Della Pietra, S., Della Pietra, V., Lafferty, J., 1997. Inducing featuresof random fields. IEEE Trans. Pattern Anal. Mach. Intell. 19 (4),1–13.

DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L., 1988. Com-paring the areas under two or more correlated receiver operatingcharacteristic curves: a non-parametric approach. Biometrics 44,837–845.

Dinerstein, E., Olson, D.M., Graham, D.J., Webster, A.L., Primm,S.A., Bookbinder, M.P., Ledec, G., 1995. Ecoregions of LatinAmerica and the Caribbean (inset map). In: A Conservation As-sessment of the Terrestrial Ecoregions of Latin America and theCaribbean, The World Bank, Washington, DC.

Dudık, M., Phillips, S.J., Schapire, R.E., 2004. Performance guar-antees for regularized maximum entropy density estimation.In: Proceedings of the 17th Annual Conference on Compu-tational Learning Theory, ACM Press, New York, pp. 655–662.

Eisenberg, J.F., Redford, K.H., 1999. Mammals of the Neotropics.vol. 3, the central Neotropics: Ecuador, Peru, Bolivia, Brazil.University of Chicago Press, Chicago.

Elith, J., 2002. Quantitative methods for modeling species habitat:comparative performance and an application to Australian plants.In: Ferson, S., Burgman, M. (Eds.), Quantitative Methods forConservation Biology. Springer-Verlag, New York, 39–58.

Emmons, L.H., 1997. Neotropical Rainforest Mammals: A FieldGuide, 2nd ed.. University of Chicago Press, Chicago.

Engler, R., Guisan, A., Rechsteiner, L., 2004. An improved approachfor predicting the distribution of rare and endangered species

l. 41,

F ations, Palo

F nesspre-

us-

F ndediver-ling.

F as-sence

G -

G A.T.,d ap-

Model. 157, 101–118.aker, R.J., Phillips, C.J., Bradley, R.D., Burns, J.M., Cooke, D.

son, G.F., Haragan, D.R., Jones, C., Monk, R.R., Montford,Schmidly, D.J., Parker, N.C., 1998. Bioinformatics, museuand society: integrating biological data for knowledge-basecisions. Occasional Papers Museum Texas Tech Univ. 187

erger, A.L., Della Pietra, S.A., Della Pietra, V.J., 1996. A mmum entropy approach to natural language processing. CoLinguist. 22 (1), 39–71.

isby, F.A., 2000. The quiet revolution: biodiversity informaticsthe internet. Science 289, 2309–2312.

rown, J.H., Lomolino, M.V., 1998. Biogeography, 2nd ed.. SinaAssociates, Sunderland, Massachusetts.

usby, J.R., 1986. A biogeographical analysis ofNothofagus cun-ninghamii (Hook.) Oerst. in southeastern Australia. Aust. J. E11, 1–7.

arleton, M.D., Musser, G.G., 1989. Systematic studies ofzomyine rodents (Muridae, Sigmodontinae): a synopsis oMi-croryzomys. Bull. Am. Museum Nat. History 119, 1–83.

arpenter, G., Gillison, A.N., Winter, J., 1993. DOMAIN: a fleximodeling procedure for mapping potential distributions of plaanimals. Biodivers. Conserv. 2, 667–680.

from occurrence and pseudo-absence data. J. Appl. Eco263–274.

awcett, T., 2003. ROC graphs: notes and practical considerfor data mining researchers. Technical Report HPL-2003–4Alto, CA: HP Laboratories.

errier, S., Watson, G., 1996. An evaluation of the effectiveof environmental surrogates and modelling techniques indicting the distribution of biological diversity. Canberra, Atralia: NSW National Parks and Wildlife Service.http://www.deh.gov.au/biodiversity/publications/technical/surrogates.

errier, S., Watson, G., Pearce, J., Drielsma, M., 2002. Extestatistical approaches to modelling spatial pattern in biodsity in northeast New South Wales. 1. Species-level modeBiodivers. Conserv. 11, 2275–2307.

ielding, A.H., Bell, J.F., 1997. A review of methods for thesessment of prediction errors in conservation presence/abmodels. Env. Conserv. 24, 38–49.

ouvea, E., deAvila Pires, F.D., 1999. Mamıferos do Parque Nacional do Itatiaia. Revista Cientıfica do Centro Universitario doBarra Mansa, UBM 1 (2), 11–26.

raham, C.H., Ferrier, S., Huettman, F., Moritz, C., Peterson,2004. New developments in museum-based informatics an

Page 28: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

258 S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259

plications in biodiversity analysis. Trends Ecol. Evol. 19 (9),497–503.

Guisan, A., Edwards Jr., T.C., Hastie, T., 2002. Generalized linear andgeneralized additive models in studies of species distributions:setting the scene. Ecol. Model. 157, 89–100.

Guisan, A., Zimmerman, N.E., 2000. Predictive habitat distributionmodels in ecology. Ecol. Model. 135, 147–186.

Hanley, J., McNeil, B., 1982. The meaning and use of the area undera receiver operating characteristic (ROC) curve. Radiology 143,29–36.

Hanley, J., McNeil, B., 1983. A method of comparing the areas underreceiver operating characteristic curves derived from the samecases. Radiology 148, 839–843.

Hershkovitz, P., 1998. Report on some sigmodontine rodents col-lected in southeastern Brazil with descriptions of a new genusand six new species. Bonner zoologische Beitrage 47, 193– 256.

Hirzel, A.H., Hausser, J., Chessel, D., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability mapswithout absence data? Ecology 87, 2027–2036.

Holdridge, L., Grenke, W., Hatheway, W., Liang, T., Tosi Jr., J., 1971.Forest Environments In Tropical Life Zones: A Pilot Study. Perg-amon Press, New York.

Hutchinson, G.E., 1957. Concluding remarks. In: Cold Spring Har-bor Symposia on Quantitative Biology 22, 415–427.

Jaynes, E.T., 1957. Information theory and statistical mechanics.Phys. Rev. 106, 620–630.

Jaynes, E.T., 1990. Notes on present status and future prospects. In:Grandy Jr., W.T., Schick, L.H. (Eds.), Maximum Entropy andBayesian Methods. Kluwer, Dordrecht, The Netherlands, 1–13.

Jensen, J.R., 1996. Introductory Digital Image Processing, A RemoteSensing Perspective. Prentice Hall, Upper Saddle River, NJ.

Joseph, L., Stockwell, D.R.B., 2002. Climatic modeling of the dis-tribution of somePyrrhura parakeets of northwestern SouthAmerica with notes on their systematics and special referenceto Pyrrhura caeruleiceps Todd, 1947. Ornitologıa Neotropical

K ott,bitat-

son,.B.y and

M ical. J.

M enc-273.

M ege,

N ieth-lop-tol-pcc-

N tiveayes.

Nix, H., 1986. A biogeographic analysis of Australian elapid snakes.Atlas of Elapid Snakes of Australia. Australian Government Pub-lishing Service, Canberra, Australia, 4–15.

Patterson, B.D., Ceballos, G., Sechrest, W., Tognelli, M.F., Brooks,T., Luna, L., Ortega, P., Salazar, I., Young, B.E., 2003. DigitalDistribution Maps of the Mammals of the Western Hemisphere,Version 1.0. NatureServe, Arlington, VA.

Paynter Jr., R.A., 1982. Ornithological Gazetteer of Venezuela. Mu-seum of Comparative Zoology, Harvard University, Cambridge,MA.

Pearson, R.G., Dawson, T.P., Lin, C., 2004. Modelling species dis-tributions in Britain: a hierarchical integration of climate andland-cover data. Ecography 27, 285–298.

Perez-Hernandez, R., Colomine, G., Villarroel, G., 1997. Los museosde historia natural vinculados con la universidad venezolana y susperspectivas hacia el siglo XXI. Acta Cientıfica Venezolana 48,177–181.

Peterson, A.T., Cohoon, K.P., 1999. Sensitivity of distributionalprediction algorithms to geographic data completeness. Ecol.Model. 117, 154–164.

Peterson, A.T., Holt, R.D., 2003. Niche differentiation in Mexicanbirds: using point occurrences to detect ecological innovation.Ecol. Lett. 6, 774–782.

Peterson, A.T., Kluza, D.A., 2003. New distributional model-ing approaches for gap analysis. Anim. Conserv. 6, 47–54.

Peterson, A.T., Robins, C.R., 2003. Using ecological-niche modelingto predict barred owl invasions with implications for spotted owlconservation. Conserv. Biol. 17, 1161–1165.

Peterson, A.T., Shaw, J., 2003.Lutzomyia vectors for cutaneous leish-maniasis in southern Brazil: ecological niche models, predictedgeographic distribution, and climate change effects. Int. J. Para-sitol. 33, 919–931.

Peterson, A.T., Soberon, J., Sanchez-Cordero, V., 1999. Conser-vatism of ecological niches in evolutionary time. Science 285,

P pyof theress,

P Eval-ess-

P assi-st dis-ss,

P istri-

R lan-enn-

R d itsgr.

R istri-

13, 1–8.arl, J.W., Svancara, L.K., Heglund, P.J., Wright, N.M., Sc

J.M., 2002. Species commonness and the accuracy of harelationship models. In: Scott, J.M., Heglund, P.J., MorriM.L., Haufler, J.B., Raphael, M.G., Wall, W.A., Samson, F(Eds.), Predicting Species Occurrences: Issues of AccuracScale. Island Press, Washington, DC, 573–580.

ackey, B.G., Lindenmayer, D.B., 2001. Towards a hierarchframework for modelling the spatial distribution of animalsBiogeogr. 28, 1147–1166.

cPherson, A., 1985. A biogeographical analysis of factors influing the distribution of Costa Rican rodents. Brenesia 23, 97–

initab, 1998. Minitab, Release 12.1. Minitab, Inc., State CollPA.

ew, M., Hulme, M., Jones, P., 1999. Representing twentcentury space-time climate variability. Part 1. Devement of a 1961–90 mean monthly terrestrial climaogy. J. Climate 12, 829–856. Data available at http://iddc.cru.uea.ac.uk/crudata/examine/cruclimate.html.

g, A.Y., Jordan, M.I., 2001. On discriminative versus generaclassifiers: a comparison of logistic regression and naive BAdv. Neural Inform. Process. Syst. 14, 605–610.

1265–1267.hillips, S.J., Dudık, M., Schapire, R.E., 2004. A maximum entro

approach to species distribution modeling. In: Proceedings21st International Conference on Machine Learning, ACM PNew York, pp. 655–662.

onder, W.F., Carter, G.A., Flemons, P., Chapman, R.R., 2001.uation of museum collection data for use in biodiversity assment. Conserv. Biol. 15, 648–657.

rovost, F.J., Fawcett, T., 1997. Analysis and visualization of clfier performance: comparison under imprecise class and cotributions. Knowledge Discovery and Data Mining. ACM PreNew York, 43–48.

ulliam, H.R., 2000. On the relationship between niche and dbution. Ecol. Lett. 3, 349–361.

atnaparkhi, A., 1998. Maximum entropy models for naturalguage ambiguity resolution. Ph.D. Thesis, University of Psylvania, Philadelphia, PA.

eddy, S., Davalos, L.M., 2003. Geographical sampling bias animplications for conservation priorities in Africa. J. Biogeo30, 1719–1727.

oot, T., 1988. Environmental factors associated with avian dbutional boundaries. J. Biogeogr. 15, 489–505.

Page 29: Maximum entropy modeling of species geographic distributions - Robert Schapire's …rob.schapire.net/papers/ecolmod.pdf ·  · 2015-07-13Maximum entropy modeling of species geographic

S.J. Phillips et al. / Ecological Modelling 190 (2006) 231–259 259

Schneider, E., Kay, J., 1994. Life as a manifestation of the secondlaw of thermodynamics. Math. Comput. Model. 19 (6–8), 25–48.

Scott, J.M., Heglund, P.J., Morrison, M.L., Haufler, J.B., Raphael,M.G., Wall, W.A., Samson, F.B. (Eds.), 2002. Predicting SpeciesOccurrences: Issues of Accuracy and Scale. Island Press, Wash-ington, DC.

Shannon, C.E., 1948. A mathematical theory of communication. BellSyst. Tech. J. 27, 379–423, 623–656.

Soberon, J., 1999. Linking biodiversity information sources. TrendsEcol. Evol. 14, 291.

Soberon, J., Peterson, A.T., 2004. Biodiversity informatics: manag-ing and applying primary biodiversity data. Philos. Trans. RoyalSoc. Lond. B 359, 689–698.

Stein, B.R., Wieczorek, J., 2004. Mammals of the world: MaNIS as anexample of data integration in a distributed network environment.Biodivers. Inform. 1 (1), 14–22.

Stockwell, D., Peters, D., 1999. The GARP modeling system: prob-lems and solutions to automated spatial prediction. Int. J. Geo-graph. Inform. Sci. 13, 143–158.

Stockwell, D.R.B., Noble, I.R., 1992. Induction of sets of rules fromanimal distribution data: a robust and informative method of dataanalysis. Math. Comput. Simul. 33, 385–390.

Stockwell, D.R.B., Peterson, A.T., 2002a. Controlling bias in bio-diversity data. In: Scott, J.M., Heglund, P.J., Morrison, M.L.,Haufler, J.B., Raphael, M.G., Wall, W.A., Samson, F.B. (Eds.),Predicting Species Occurrences: Issues of Accuracy and Scale.Island Press, Washington, DC 537–546.

Stockwell, D.R.B., Peterson, A.T., 2002b. Effects of sample size onaccuracy of species distribution models. Ecol. Model. 148, 1–13.

Suarez, A.V., Tsutsui, N.D., 2004. The value of museum collectionsfor research and society. BioScience 54 (1), 66–74.

Tate, G.H.H., 1939. The mammals of the Guiana region. Bull. Am.Museum Nat. History 76, 151–229.

Thomas, C.D., Cameron, A., Green, R.E., Bakkenes, M., Beau-mont, L.J., Collingham, Y.C., Erasmus, B.F.N., de Siqueira, M.F.,Grainger, A., Hannah, L., Hughes, L., Huntley, B., van Jaarsveld,A.S., Midgley, G.F., Miles, L., Ortega-Huerta, M.A., Peterson,A.T., Phillips, O.L., Williams, S.E., 2004. Extinction risk fromclimate change. Nature 427, 145–148.

USGS, 2001. HYDRO 1k, elevation derivative database. UnitedStates Geological Survey, Sioux Falls, South Dakota. Availableathttp://www.edcdaac.usgs.gov/gtopo30/hydro.

Vida, S., 1993. A computer program for non-parametric receiveroperating characteristic analysis. Comput. Meth. Prog. Biomed.40, 95–101.

Welk, E., Schubert, K., Hoffmann, M.H., 2002. Present and poten-tial distribution of invasive mustard (Alliara petiolata) in NorthAmerica. Divers. Distributions 8, 219–233.

Wiley, E.O., McNyset, K.M., Peterson, A.T., Robins, C.R., Stewart,A.M., 2003. Niche modeling and geographic range predictionsin the marine environment using a machine-learning algorithm.Oceanography 16 (3), 120–127.

Williams, P.M., 1995. Bayesian regularization and pruning using aLaplace prior. Neural Comput. 7 (1), 117–143.

Yee, T.W., Mitchell, N.D., 1991. Generalized additive models in plantecology. J. Veg. Sci. 2, 587–602.

Yom-Tov, Y., Kadmon, R., 1998. Analysis of the distribution of in-sectivorous bats in Israel. Divers. Distributions 4, 63–70.

Zadrozny, B., 2004. Learning and evaluating classifiers under sampleselection bias. In: Proceedings of the 21st International Confer-ence on Machine Learning, pp. 903–910.

Zaniewski, A.E., Lehmann, A., Overton, J.M., 2002. Predictingspecies spatial distributions using presence-only data: a casestudy of native New Zealand ferns. Ecol. Model. 157, 261–280.

Zweig, M.H., Campbell, G., 1993. Receiver-operating characteristic(ROC) plots: a fundamental evaluation tool in clinical medicine.Clin. Chem. 39 (4), 561–577.