
Ecologie et Dynamique des Systèmes Anthropisés FRE 3498 CNRS-UPJV www.u-picardie.fr/edysan

Photo: Jonathan Lenoir

5th IRSAE Summer School – Bø – 04-08/08/2014

Data analysis in R: linking species distribution data to climatic data

Niche & Distribution

Outline: Niche & Distribution, Data, Presence-Absence, Presence-Only

The niche concept

Key papers (Pulliam, 2000; Soberon, 2007; Colwell & Rangel, 2009)

Hutchinson’s “fundamental niche” (Pulliam, 2000)

[Figure: focal species distribution along climatic gradient 1; legend: presence, absence]

Hutchinson’s “realized niche” (Pulliam, 2000)

[Figure: climatic gradient 1; legend: presence, absence, dominant competitor]

The dispersal limitation effect (Pulliam, 2000)

[Figure: climatic gradient 1; legend: presence, absence, dominant competitor]

The source-sink dynamic effect (Pulliam, 2000)

[Figure: legend: presence, absence, dominant competitor, source]

[Figure modified from Soberon (2007) & Anderson (2013); label: assisted migration]

Lessons from biological invasions (Guisan et al., 2014)

The n-dimensions of the niche

The Grinnellian niche & the Eltonian niche (Soberon, 2007)

scenopoetic factors (i.e., broad-scale conditions): Grinnell
bionomic factors (i.e., resource-consumer dynamics): Elton

Variables to consider to capture the niche (Austin & Smith, 1989)

indirect variables (e.g., elevation, slope & aspect)
direct variables (e.g., air temperature & soil pH)
resource variables (e.g., nutrients, soil or air water & light)

Towards modelling species distribution

Example for plant distribution (Guisan & Zimmermann, 2000)

Species distribution models (SDMs)

The modelling trade-off (Guisan & Zimmermann, 2000)

Four main types of SDMs regarding the algorithm

profile methods
regression methods
machine learning methods
geographic methods

Two main types of SDMs regarding the response variable

presence-only models
presence-absence models

NB: profile methods are always presence-only models, but machine learning & regression methods can be either presence-absence or presence-only models, depending on whether survey-absence data or pseudo-absence/background data are used


If you also have survey-absence data from a well-designed survey

use regression or machine learning methods
do not use profile methods

If you only have occurrences, you can still substitute absences with

background data (do not depend on where occurrences are)
pseudo-absence data (depend on where occurrences are)

NB: survey-absence data can be biased due to detectability issues

Conclusion: be careful not to confuse presence-absence (survey-absence) & presence-only (pseudo-absence/background) models when using regression or machine learning methods

A few profile (P), regression (R) & machine learning (ML) methods

The BIOCLIM Algorithm: BIOCLIM (P)
The Domain Algorithm: Domain (P)
Generalized Linear Models: GLMs (R)
Generalized Linear Mixed Models: GLMMs (R)
Generalized Additive Models: GAMs (R)
Generalized Additive Mixed Models: GAMMs (R)
Structural Equation Models: SEMs (R)
Random Forests: RFs (ML)
Boosted Regression Trees: BRTs (ML)
Artificial Neural Networks: ANNs (ML)
The Maximum Entropy Approach: Maxent (ML)
Auto-Logistic Models: SAR (R- and ML-compatible)
Residual Auto-Covariate Models: RAC (R- and ML-compatible)

During this course, we will specifically focus on

GLMs, typically used with presence-absence (survey-absence) data
Maxent, classically used with presence-only (background) data

The dependent or response variable (Y)

Distribution of Sycamore maple (Acer pseudoplatanus L.)

presence-only data (1) @ GBIF (http://www.gbif.org/)
presence-absence data (0/1) @ IGN (http://www.ign.fr/)

The independent or predictor variables (Xi)

Climatic variables @ WorldClim (http://www.worldclim.org/)

Annual Mean Temperature (BIO1)
Max Temperature of Warmest Month (BIO5)
Min Temperature of Coldest Month (BIO6)
Temperature Annual Range (BIO7 = BIO5-BIO6)
Annual Precipitation (BIO12)
Precipitation of Wettest Month (BIO13)
Precipitation of Driest Month (BIO14)
Precipitation Seasonality (Coefficient of Variation) (BIO15)
Water Balance (sum of monthly prec. minus monthly PET) (WBAL)

Quick look at the data in R

Import data into R

> setwd("C:/Users/admin2/Documents/Enseignements/IRSAE-2014")

> ap <- read.table("Data/FR/apPAfr.txt", header=TRUE, sep="\t")

> str(ap)

'data.frame': 46589 obs. of 15 variables:

$ x : num 5.75 7.15 4.99 -0.76 1.16 ...

$ y : num 48.3 48.8 47.8 43 47.2 ...

$ day : int 29 24 13 26 14 ...

$ month: int 6 3 9 7 3 ...

$ year : int 2005 2005 2005 2005 2005 ...

$ pa : int 1 0 0 0 0 ...

$ wbal : int 88 -27 67 239 -164 ...

$ bio1 : num 8.7 9.1 9.1 8.5 11 ...

$ bio5 : num 22.2 23.1 22.7 21.9 24.6 ...

$ bio6 : num -2.6 -2.3 -2.2 -2.4 0.2 ...

$ bio7 : num 24.8 25.4 24.9 24.3 24.4 ...

$ bio12: int 852 750 849 1068 697 ...

$ bio13: int 87 78 85 114 67 ...

$ bio14: int 55 50 55 60 50 ...

$ bio15: int 14 15 15 15 9 ...

Plot the empirical distribution of Sycamore maple on a map

> library(raster)

> fr <- getData("GADM", country="FRA", level=0, path="Data/FR")

> projection(fr)

[1] "+proj=longlat +ellps=WGS84 +datum=WGS84 +towgs84=0,0,0"

> plot(fr)

> occ <- which(ap$pa==1)

> length(occ)

[1] 6455

> abs <- which(ap$pa==0)

> length(abs)

[1] 40134

> points(ap$x[abs], ap$y[abs], pch=4, col="red", cex=0.1)

> points(ap$x[occ], ap$y[occ], pch=3, col="green", cex=0.1)

> leg <- c("Presence", "Absence")

> legend("bottomleft", leg, pch=c(3, 4), col=c("green", "red"))

Import raster layers of all predictor variables

> TMEAN <- raster("Data/FR/bio1.tif")

> TMAX <- raster("Data/FR/bio5.tif")

> TMIN <- raster("Data/FR/bio6.tif")

> TSEA <- raster("Data/FR/bio7.tif")

> PANN <- raster("Data/FR/bio12.tif")

> PMAX <- raster("Data/FR/bio13.tif")

> PMIN <- raster("Data/FR/bio14.tif")

> PSEA <- raster("Data/FR/bio15.tif")

> WBAL <- raster("Data/FR/wbal.tif")

Plot predictor variables

> library(fBasics)

> colT <- rev(divPalette(n=100, name="RdYlBu"))

> colP <- seqPalette(n=100, name="YlGnBu")

> colWBAL <- divPalette(n=100, name="RdYlGn")

> par(mfrow=c(3, 3), mai=c(bottom=0.1, 0, top=0.2, 0))

> plot(TMEAN, col=colT, axes=FALSE, main="tmean")

> plot(TMIN, col=colT, axes=FALSE, main="tmin")

> plot(TMAX, col=colT, axes=FALSE, main="tmax")

> plot(PANN, col=colP, axes=FALSE, main="pann")

> plot(PMIN, col=colP, axes=FALSE, main="pmin")

> plot(PMAX, col=colP, axes=FALSE, main="pmax")

> plot(TSEA, col=colT, axes=FALSE, main="tsea")

> plot(PSEA, col=colP, axes=FALSE, main="psea")

> plot(WBAL, col=colWBAL, axes=FALSE, main="wbal")

Data preparation

Remove NA values & check correlations among predictors

> dim(ap)

[1] 46589 15

> ap <- na.omit(ap)

> dim(ap)

[1] 46549 15

> round(cor(ap[, 7:15]), 2)

       wbal   bio1   bio5   bio6   bio7  bio12  bio13  bio14  bio15
wbal   1.00
bio1  -0.70   1.00
bio5  -0.74   0.86   1.00
bio6  -0.58   0.91   0.59   1.00
bio7  -0.13  -0.13   0.39  -0.51   1.00
bio12  0.91  -0.46  -0.43  -0.46   0.07   1.00
bio13  0.76  -0.21  -0.29  -0.16  -0.12   0.88   1.00
bio14  0.75  -0.61  -0.47  -0.63   0.21   0.77   0.43   1.00
bio15 -0.12   0.37   0.15   0.46  -0.36  -0.13   0.31  -0.66   1.00
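The slides do not state which cutoff separates "highly correlated" predictors from the rest. The sketch below assumes a hypothetical cutoff of |r| > 0.7 and uses simulated data (the variables a, b and c are made up for illustration) to show one way of listing the offending pairs from a correlation matrix.

```r
# Simulated predictors: b is built from a, c is independent (illustration only).
set.seed(1)
n <- 200
a <- rnorm(n)
d <- data.frame(a = a, b = a + rnorm(n, sd = 0.2), c = rnorm(n))

r <- cor(d)
r[upper.tri(r, diag = TRUE)] <- NA       # keep each pair once (lower triangle)

cutoff <- 0.7                            # assumed cutoff, not taken from the slides
high <- which(abs(r) > cutoff, arr.ind = TRUE)

# One row per pair of predictors whose |r| exceeds the cutoff.
pairs <- data.frame(var1 = rownames(r)[high[, "row"]],
                    var2 = colnames(r)[high[, "col"]],
                    r    = round(r[high], 2))
print(pairs)
```

Which member of a pair to drop remains an ecological judgement; the code only flags candidates.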

Data preparation

Remove highly correlated variables to avoid multicollinearity issues

> ap <- ap[, c(-8, -9, -12, -13, -14)]

> names(ap)[8:10] <- c("tmin", "tsea", "psea")

> str(ap)

'data.frame': 46549 obs. of 10 variables:

$ x : num 5.75 7.15 4.99 -0.76 1.16 ...

$ y : num 48.3 48.8 47.8 43 47.2 ...

$ day : int 29 24 13 26 14 ...

$ month: int 6 3 9 7 3 ...

$ year : int 2005 2005 2005 2005 2005 ...

$ pa : int 1 0 0 0 0 ...

$ wbal : int 88 -27 67 239 -164 ...

$ tmin : num -2.6 -2.3 -2.2 -2.4 0.2 ...

$ tsea : num 24.8 25.4 24.9 24.3 24.4 ...

$ psea : int 14 15 15 15 9 ...

Presence-Absence

The case of presence-absence data (0/1) or logistic regression

Generalized Linear Models (GLMs)

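Y: the response variable, with values p ∈ [0, 1]
Xi: explanatory variables, either qualitative or quantitative
βi: coefficient parameters
ε: the error term (mean 0 & variance dependent on the Xi)

$$\operatorname{logit}(Y) = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon)}} \in [0, 1]$$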
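As a quick numerical check of the two equations above, the sketch below writes the logit and its inverse by hand and compares them with base R's qlogis() and plogis(). The coefficients b0 and b1 and the values of x are hypothetical, chosen only for illustration.

```r
# Logit link and its inverse, written out to mirror the equations.
logit <- function(p) log(p / (1 - p))
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical coefficients and predictor values (not from the course data).
b0 <- -2
b1 <- -0.4
x <- c(-5, 0, 5)

eta <- b0 + b1 * x     # linear predictor
p <- inv_logit(eta)    # probabilities, always strictly between 0 and 1

# The two transformations undo each other and match base R.
print(all.equal(logit(p), eta))
print(all.equal(p, plogis(eta)))
print(all.equal(logit(p), qlogis(p)))
```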

GLMs’ properties

the family distribution (gaussian, poisson, binomial, gamma)
the linear predictor (η = β0 + β1X1 + β2X2 + ... + βkXk)
the link function (identity, log, logit, inverse)

R syntax

glm(formula, family, data, …)

family=gaussian(link="identity")
family=poisson(link="log")
family=binomial(link="logit")

formula=y~x
formula=y~poly(x, 2)
formula=y~x1+x2
formula=y~x1+x2+x1:x2

Split the original dataset into train (4/5) & test (1/5) datasets

> library(dismo)

> fold <- kfold(ap, k=5)

> calib <- ap[which(fold!=1), ]

> dim(calib)

[1] 37239 10

> write.table(calib, "Data/FR/calib.txt", sep="\t")

> valid <- ap[which(fold==1), ]

> dim(valid)

[1] 9310 10

> write.table(valid, "Data/FR/valid.txt", sep="\t")

Divide the initial dataset into five groups of similar sizes
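The split above is random, so the row counts change from run to run unless a seed is set. The sketch below is a base-R approximation of the idea (it is not dismo's implementation of kfold()); the row count n and the seed are arbitrary assumptions.

```r
set.seed(42)          # arbitrary seed so that the split is reproducible
n <- 23               # hypothetical number of rows
k <- 5                # number of groups

# Repeat the labels 1..k until n labels exist, then shuffle them.
fold <- sample(rep(1:k, length.out = n))
print(table(fold))    # group sizes differ by at most one

train <- which(fold != 1)   # 4/5 of the rows
test  <- which(fold == 1)   # 1/5 of the rows
```

Calling set.seed() before kfold() in the course script should likewise make the train and test sets repeatable.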

Model fitting: Akaike Information Criterion (AIC) & goodness of fit

> M1 <- glm(pa~tmin, family=binomial, data=calib)

> summary(M1)

Estimate Std. Error z value Pr(>|z|)

(Intercept) -2.411640 0.021762 -110.82 <2e-16 ***

tmin -0.379508 0.007279 -52.14 <2e-16 ***

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 29952 on 37238 degrees of freedom

Residual deviance: 26924 on 37237 degrees of freedom

AIC: 26928

> k <- length(M1$coefficients)  # number of parameters

> aic <- (2*k)-(2*logLik(M1)[[1]])  # formula for AIC

> round(aic)

[1] 26928

> gof <- (M1$null.deviance-M1$deviance)/M1$null.deviance  # formula for goodness of fit

> gof

[1] 0.1010928
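For 0/1 data the residual deviance equals -2 times the log-likelihood, which is why the AIC reported for M1 (26928) is the residual deviance (26924) plus 2k = 4. The sketch below checks both formulas on simulated data; the sample size and coefficients are assumptions for illustration, not the course data.

```r
# Simulated presence-absence data (illustration only).
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, size = 1, prob = plogis(-1 + 0.8 * x))
m <- glm(y ~ x, family = binomial)

k <- length(coef(m))                         # number of parameters
aic <- 2 * k - 2 * as.numeric(logLik(m))     # AIC by hand
d2 <- 1 - deviance(m) / m$null.deviance      # proportion of deviance explained

print(all.equal(aic, AIC(m)))
# For ungrouped 0/1 data, deviance = -2 * log-likelihood.
print(all.equal(deviance(m), -2 * as.numeric(logLik(m))))
print(round(d2, 3))
```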

Model prediction: linear or response predictions?

> tmin <- calib$tmin[1:5]

> tmin <- data.frame(tmin)

> tmin

tmin

1 -2.6

2 -2.3

3 -2.4

4 0.2

5 -1.6

> predict(M1, newdata=tmin)

1 2 3 4 5

-1.424919 -1.538772 -1.500821 -2.487542 -1.804427

> predict(M1, newdata=tmin, type="response")

1 2 3 4 5

0.19389155 0.17671390 0.18230312 0.07673617 0.14131298

Raw predictions from the linear predictor (η)

Probabilities p ∈ [0, 1]
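The two sets of predictions are linked by the inverse logit. As a check, the sketch below applies plogis() to the five linear-predictor values printed above and compares the result with the five response-scale values printed above.

```r
# Linear-predictor values copied from the predict(M1, newdata=tmin) output.
eta <- c(-1.424919, -1.538772, -1.500821, -2.487542, -1.804427)

# Response-scale values copied from the type="response" output.
p_slide <- c(0.19389155, 0.17671390, 0.18230312, 0.07673617, 0.14131298)

p <- plogis(eta)       # inverse logit: 1 / (1 + exp(-eta))
print(round(p, 4))
print(all.equal(p, p_slide, tolerance = 1e-5))
```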

Model prediction: plot response curve

> xmax <- max(calib$tmin)

> xmin <- min(calib$tmin)

> tmin <- data.frame(tmin=seq(xmin, xmax, length.out=30))

> p <- predict(M1, newdata=tmin, type="response")

> plot(tmin$tmin, p, type="l", lwd=2, cex.lab=1.5)

Model prediction: plot response curve & empirical data

> occ <- which(calib$pa==1)

> abs <- which(calib$pa==0)

> hall <- hist(calib$tmin, breaks=tmin$tmin, plot=FALSE)

> hocc <- hist(calib$tmin[occ], breaks=tmin$tmin, plot=FALSE)

> mat <- matrix(c(2, 1), 2, 1, byrow=TRUE)

> nf <- layout(mat, widths=4, heights=c(1, 3), respect=TRUE)

> layout.show(nf)

> par(mar=c(5, 4, 1, 1))

> plot(calib$tmin, calib$pa, type="n", cex.lab=1.5)

> points(calib$tmin[occ], calib$pa[occ], col="green", pch=3)

> points(calib$tmin[abs], calib$pa[abs], col="red", pch=4)

> lines(tmin$tmin, p, lwd=2)

> par(mar=c(0, 4, 1, 1))

> barplot(hall$counts, axes=FALSE, col="red")

> barplot(hocc$counts, axes=FALSE, col="green", add=TRUE)

Model prediction: map probabilities of occurrence

> tmin <- data.frame(tmin=getValues(TMIN))

> p <- predict(M1, newdata=tmin, type="response")

> p <- setValues(TMIN, p)

> colAP <- seqPalette(n=100, name="Greens")

> plot(p, col=colAP)

Model prediction: map predicted occurrences & absences

> p <- getValues(p)

> n <- rep(1, length(p))

> size <- rep(1, length(p))

> p <- rbinom(n=n, size=size, prob=p)

> p <- setValues(TMIN, p)

> plot(p, col=c("gray", "green"), legend=FALSE)

Model evaluation: Area Under Curve (AUC), sensitivity & specificity

> p <- predict(M1, newdata=valid, type="response")

> occ <- which(valid$pa==1)

> abs <- which(valid$pa==0)

> e <- evaluate(p=p[occ], a=p[abs])

> class(e)

[1] "ModelEvaluation"

attr(,"package")

[1] "dismo"

> e

class : ModelEvaluation

n presences : 1293

n absences : 8017

AUC : 0.7226648

cor : 0.2876606

max TPR+TNR at : 0.1183897
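The AUC can be read as the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence. The sketch below computes it from ranks (the Mann-Whitney statistic) on a small made-up set of predictions; the function name auc_rank is hypothetical and this is not necessarily how dismo computes it internally.

```r
# AUC from ranks: proportion of presence/absence pairs ranked correctly,
# with ties counted as one half (midranks).
auc_rank <- function(p_pres, p_abs) {
  r <- rank(c(p_pres, p_abs))
  n1 <- length(p_pres)
  n0 <- length(p_abs)
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up predicted probabilities (illustration only).
p_pres <- c(0.9, 0.6, 0.2)
p_abs  <- c(0.7, 0.4, 0.3, 0.1, 0.1)

auc <- auc_rank(p_pres, p_abs)
print(auc)   # 11 of the 15 presence/absence pairs are ordered correctly
```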

Model evaluation: confusion matrix

true positive (tp), true negative (tn), false positive (fp), false negative (fn)

                 Observations
                 1     0
Predictions  1   tp    fp
             0   fn    tn

Sensitivity & specificity

sensitivity or True Positive Rate (TPR)
specificity or True Negative Rate (TNR)

$$TPR = \frac{tp}{tp + fn} \in [0, 1]$$

$$TNR = \frac{tn}{fp + tn} \in [0, 1]$$

$$FPR = \frac{fp}{fp + tn} = 1 - TNR \in [0, 1]$$
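A minimal sketch of how the four counts and the three rates follow from a vector of observations, a vector of predicted probabilities and one threshold. The helper name confusion, the toy data and the convention that a prediction at or above the threshold counts as a presence are all assumptions made for illustration.

```r
# Count tp, fp, fn and tn for a given threshold.
confusion <- function(obs, prob, threshold) {
  pred <- as.integer(prob >= threshold)   # assumed convention: >= means presence
  c(tp = sum(pred == 1 & obs == 1),
    fp = sum(pred == 1 & obs == 0),
    fn = sum(pred == 0 & obs == 1),
    tn = sum(pred == 0 & obs == 0))
}

# Toy observations and predicted probabilities (illustration only).
obs  <- c(1, 1, 1, 0, 0, 0, 0, 0)
prob <- c(0.9, 0.6, 0.2, 0.7, 0.4, 0.3, 0.1, 0.1)

cm  <- confusion(obs, prob, threshold = 0.5)
tpr <- cm[["tp"]] / (cm[["tp"]] + cm[["fn"]])   # sensitivity
tnr <- cm[["tn"]] / (cm[["fp"]] + cm[["tn"]])   # specificity
fpr <- 1 - tnr

print(cm)
print(c(TPR = tpr, TNR = tnr, FPR = fpr))
```

Repeating the call over a sequence of thresholds gives one confusion matrix per threshold, which is the information the ROC curve summarises.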

Model evaluation: confusion matrix for several threshold values

> head(e@t, 4)

[1] 0.00569990 0.00690331 0.00868921 0.00972660

> head(e@confusion, 4)

tp fp fn tn

[1,] 1293 8017 0 0

[2,] 1293 8009 0 8

[3,] 1293 8001 0 16

[4,] 1293 7997 0 20

> tail(e@t, 4)

[1] 0.6928991 0.7672997 0.7673997 0.7674997

> tail(e@confusion, 4)

tp fp fn tn

[235,] 1 8 1292 8009

[236,] 0 2 1293 8015

[237,] 0 2 1293 8015

[238,] 0 0 1293 8017

e@t: vector of threshold values

e@confusion: each row is the confusion matrix of a given threshold

Model evaluation: Receiver Operating Characteristic (ROC) curve

> plot(e, "ROC", cex.lab=1.5, col="blue", type="l", lwd=2)

Model evaluation: threshold value maximizing TPR + TNR

> plot(e@t, e@TPR+e@TNR, type="l", lwd=2, cex.lab=1.5)

> max(e@TPR+e@TNR)

[1] 1.350668

> e@t[which(e@TPR+e@TNR==max(e@TPR+e@TNR))]

[1] 0.1183897

> mat <- e@confusion[which(e@TPR+e@TNR==max(e@TPR+e@TNR)), ]

> mat

tp fp fn tn

980 3265 313 4752

> num <- (mat[1]*mat[4])-(mat[3]*mat[2])  # tp*tn - fn*fp

> den <- (mat[1]+mat[3])*(mat[2]+mat[4])  # (tp+fn)*(fp+tn)

> tss <- unname(num/den)  # equals max(TPR+TNR)-1

> tss

[1] 0.3506677

Model evaluation: True Skill Statistic (TSS) at max[TPR + TNR]

                 Observations
                 1     0
Predictions  1   tp    fp
             0   fn    tn

$$TSS = \frac{tp \cdot tn - fn \cdot fp}{(tp + fn)(fp + tn)} \in [-1, 1]$$
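The TSS formula is algebraically the same as TPR + TNR - 1. The sketch below checks this with the confusion matrix reported at the threshold maximizing TPR + TNR (tp = 980, fp = 3265, fn = 313, tn = 4752); the variable names are illustrative.

```r
# Counts copied from the confusion matrix at the threshold maximizing TPR + TNR.
cm <- c(tp = 980, fp = 3265, fn = 313, tn = 4752)

tpr <- cm[["tp"]] / (cm[["tp"]] + cm[["fn"]])
tnr <- cm[["tn"]] / (cm[["fp"]] + cm[["tn"]])

# TSS written as in the formula above.
tss <- (cm[["tp"]] * cm[["tn"]] - cm[["fn"]] * cm[["fp"]]) /
  ((cm[["tp"]] + cm[["fn"]]) * (cm[["fp"]] + cm[["tn"]]))

print(all.equal(tss, tpr + tnr - 1))
print(round(c(TPR = tpr, TNR = tnr, TSS = tss), 4))
```

The result agrees with the reported maximum of TPR + TNR (1.350668) minus one.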

Model fitting: test for second-order polynomial relationship

> M2 <- glm(pa~poly(tmin, 2), family=binomial, data=calib)

> summary(M2)

(Intercept) -2.11902 0.01983 -106.844 < 2e-16 ***

poly(tmin, 2)1 -183.17107 4.28577 -42.739 < 2e-16 ***

poly(tmin, 2)2 -30.51618 3.77913 -8.075 6.75e-16 ***

AIC: 26862

> AIC(M1)

[1] 26927.64

Drop in AIC from M1 to M2, but is it significant?

Model fitting: is it worth complicating the initial model?

> anova(M1, M2, test="Chisq")

Analysis of Deviance Table

Model 1: pa ~ tmin

Model 2: pa ~ poly(tmin, 2)

Resid. Df Resid. Dev Df Deviance Pr(>Chi)

1 37237 26924

2 37236 26856 1 68.009 < 2.2e-16 ***
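The test in the table is a likelihood-ratio test: the drop in residual deviance is compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. The sketch below redoes the arithmetic from the values printed in the table; the variable names are illustrative.

```r
# Values copied from the analysis of deviance table.
dev_diff <- 68.009            # drop in residual deviance from M1 to M2
df_diff  <- 37237 - 37236     # one extra parameter in M2

# Upper-tail probability of a chi-squared variable with df_diff degrees of freedom.
p_value <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
print(p_value < 2.2e-16)

# With one extra parameter, the AIC changes by 2 minus the deviance drop.
aic_change <- 2 * df_diff - dev_diff
print(aic_change)
```

The AIC change of about -66 matches the difference between the two reported AIC values (26861.63 - 26927.64).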

Model prediction: compare response curves from M1 & M2

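> tmin <- data.frame(tmin=seq(xmin, xmax, length.out=30))  # tmin was overwritten when mapping

> p1 <- predict(M1, newdata=tmin, type="response")

> p2 <- predict(M2, newdata=tmin, type="response")

> plot(tmin$tmin, p1, type="l", lty=2, lwd=2, ylim=range(p1, p2), cex.lab=1.5)

> lines(tmin$tmin, p2, lwd=2)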

> leg <- c("M1", "M2")

> legend(x="topright", leg, lwd=c(2, 2), lty=c(2, 1))

Model evaluation: is M2 performing better than M1?

n presences : 1293

n absences : 8017

AUC : 0.7226648

cor : 0.2900685

Model fitting: test the full model

> f <- ~.+poly(tsea, 2)+poly(psea, 2)+ poly(wbal, 2)

> M3 <- update(M2, f)

> summary(M3)

(Intercept) -2.32488 0.02461 -94.487 < 2e-16 ***

poly(tmin, 2)1 -267.56293 9.11505 -29.354 < 2e-16 ***

poly(tmin, 2)2 -117.68537 6.38682 -18.426 < 2e-16 ***

poly(tsea, 2)1 -137.51657 6.16349 -22.311 < 2e-16 ***

poly(tsea, 2)2 69.54516 4.09620 16.978 < 2e-16 ***

poly(psea, 2)1 -63.20803 7.18893 -8.792 < 2e-16 ***

poly(psea, 2)2 52.39951 6.42492 8.156 3.47e-16 ***

poly(wbal, 2)1 2.17989 5.74146 0.380 0.704

poly(wbal, 2)2 19.19854 3.64584 5.266 1.40e-07 ***

AIC: 25746

Model prediction: compare response curves from M1, M2 & M3

> tmin <- seq(xmin, xmax, length.out=30)

> tsea <- rep(mean(calib$tsea), 30)

> psea <- rep(mean(calib$psea), 30)

> wbal <- rep(mean(calib$wbal), 30)

> predictors <- data.frame(tmin, tsea, psea, wbal)

> p1 <- predict(M1, newdata=predictors, type="response")

> p2 <- predict(M2, newdata=predictors, type="response")

> p3 <- predict(M3, newdata=predictors, type="response")

> plot(predictors$tmin, p3, type="l", lwd=2, ylim=range(p1, p2, p3), cex.lab=1.5)

> lines(predictors$tmin, p2, lty=2, lwd=2)

> lines(predictors$tmin, p1, lty=3, lwd=2)

> leg <- c("M1", "M2", "M3")

> legend(x="topright", leg, lwd=c(2, 2, 2), lty=c(3, 2, 1))

Model prediction: compare probability maps from M1, M2 & M3

> predictors <- stack(TMIN, TSEA, PSEA, WBAL)

> names(predictors)

[1] "bio6" "bio7" "bio15" "wbal"

> names(predictors) <- c("tmin", "tsea", "psea", "wbal")

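> p1 <- predict(predictors, M1, type="response")

> p2 <- predict(predictors, M2, type="response")

> p3 <- predict(predictors, M3, type="response")

> windows(12, 4)  # Windows only; dev.new(width=12, height=4) is portable

> par(mfrow=c(1, 3), mar=c(1, 1, 2, 4))

> plot(p1, col=colAP, axes=FALSE, main="M1")

> plot(p2, col=colAP, axes=FALSE, main="M2")

> plot(p3, col=colAP, axes=FALSE, main="M3")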

Model evaluation: is M3 performing better than M1 & M2?

n presences : 1293

n absences : 8017

AUC : 0.7651777

cor : 0.3129263

Yes: AUC rises from 0.72 (M1 & M2) to 0.77 (M3)

Model selection: which model shall I use between M1, M2 & M3?

> library(AICcmodavg)

> Mcands <- list(M1, M2, M3)

> Mnames <- c("M1", "M2", "M3")

> aictab(cand.set=Mcands, modnames=Mnames, second.ord=FALSE)

Model selection based on AIC :

K AIC Delta_AIC AICWt Cum.Wt LL

M3 9 25745.85 0.00 1 1 -12863.92

M2 3 26861.63 1115.78 0 1 -13427.81

M1 2 26927.64 1181.79 0 1 -13461.82

$$AICWt_i = \frac{e^{-\frac{1}{2}\Delta AIC_i}}{\sum_{k=1}^{N} e^{-\frac{1}{2}\Delta AIC_k}}$$

where N is the total number of candidate models

NB: if ΔAICi is close to 0, the ith model has some support and can also be considered in addition to the best model (cf. model averaging)

M3 is the best model
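The Delta_AIC and AICWt columns can be reproduced by hand from the three AIC values in the table. A short sketch, with illustrative variable names:

```r
# AIC values copied from the aictab() output.
aic <- c(M3 = 25745.85, M2 = 26861.63, M1 = 26927.64)

delta <- aic - min(aic)                         # Delta_AIC
w <- exp(-delta / 2) / sum(exp(-delta / 2))     # Akaike weights (AICWt)

print(round(delta, 2))
print(round(w, 3))
```

With differences above 1000 AIC units, the weights of M2 and M1 are numerically indistinguishable from zero, which is why the table shows a weight of 1 for M3.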

Check residuals from the best model (M3): raw data

> plot(predict(M3), residuals(M3), type="n", cex.lab=1.5)

> pres <- which(calib$pa==1)  # occ still indexes the test dataset at this point

> abs <- which(calib$pa==0)

> points(predict(M3)[pres], residuals(M3)[pres], col="green")

> points(predict(M3)[abs], residuals(M3)[abs], col="red")

> abline(h=0, lty=2, lwd=2)

Not meaningful for residual diagnosis

Check residuals from the best model (M3): using a spline function

> library(splines)

> resVSfit <- lm(residuals(M3)~bs(predict(M3), degree=8))

> Y <- predict(resVSfit, se=TRUE)

> cisup <- Y$fit+2*Y$se.fit

> ciinf <- Y$fit-2*Y$se.fit

> ord <- order(predict(M3))

> x.coord <- c(predict(M3)[rev(ord)], predict(M3)[ord])

> y.coord <- c(ciinf[rev(ord)], cisup[ord])

> plot(predict(M3), residuals(M3), type="n", cex.lab=1.5)

> polygon(x.coord, y.coord, col="grey", border="grey")

> lines(predict(M3)[ord], ciinf[ord], lty=2, lwd=2, col="red")

> lines(predict(M3)[ord], cisup[ord], lty=2, lwd=2, col="red")

> abline(h=0, lty=2, lwd=2)

Check residuals from the best model (M3): spatial autocorrelation

> library(spdep)

> sel <- sample(c(1:nrow(calib)), 1000, replace=FALSE)

> xy <- as.matrix(cbind(calib$x[sel], calib$y[sel]))

> nb <- dnearneigh(xy, d1=0, d2=25, longlat=TRUE)  # d1 & d2 in km

> nb

Neighbour list object:

Number of regions: 1000

Number of nonzero links: 4578

Percentage nonzero weights: 0.4578

Average number of links: 4.578

49 regions with no links:

4 26 40 41 44 62 77 107 132 153 ...

> plot(nb, xy, cex=0.7, pch=20)  # neighbourhood relationships

> z <- residuals(M3)[sel]

> correlog.res <- sp.correlogram(nb, z, order=15, method="I", zero.policy=TRUE)

> plot(correlog.res)

Spatial autocorrelation in the residuals of M3 remains significant within 0-75 km

Solutions

look at relationships between the residuals & the predictors
look for additional predictors able to capture the spatial signal
incorporate an autocovariate in the model’s predictors (SAR/RAC)
incorporate spatial eigenvectors (PCNMs) in the model’s predictors

NB: if you incorporate an autocovariate in the model’s predictors, compute it so that it captures spatial autocorrelation in the residuals (cf. 0-75 km)

Presence-Only

Maximum entropy modelling (Maxent)

Key papers (Phillips et al., 2006; Elith et al., 2011)

Main concepts & terms used in Maxent (Elith et al., 2011)

occurrences: geographic coordinates where a species is present
background: a random sample of coordinates from the mask
mask: a grid of locations to be sampled for the background
covariates: set of predictor variables at each sampled location
features: transformations (linear, quadratic, product, hinge, etc.)
regularization: smoothing of the model to limit complexity
prevalence: 0.5 by default (the strongest assumption)

NB: background points can be sampled at locations where species occurrences have been recorded, so the background sample is independent of the occurrence records, which is not the case for pseudo-absences
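The difference described in the NB above can be sketched with cell indices alone. The example assumes a hypothetical mask of 1000 cells and uses the simplest possible pseudo-absence rule (never sample a cell that holds an occurrence); other rules, such as distance-based exclusion, exist and are not shown.

```r
set.seed(7)                                # arbitrary seed
cells <- 1:1000                            # hypothetical mask cells
occ_cells <- sample(cells, 50)             # hypothetical occurrence cells

# Background: drawn from the whole mask, regardless of where occurrences are.
background <- sample(cells, 200)

# Pseudo-absences (one simple rule): drawn only from cells without occurrences.
pseudo_abs <- sample(setdiff(cells, occ_cells), 200)

print(length(intersect(background, occ_cells)))   # may be greater than 0
print(length(intersect(pseudo_abs, occ_cells)))   # 0 by construction
```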

Background matters (VanDerWal et al., 2009)

Some golden rules for Maxent modelling (Elith et al. 2011)

make a mask including the full environmental range of the species
use the mask to exclude areas which were probably not surveyed
exclude also areas where the species cannot have dispersed
use a reduced set of covariates based on meaningful assumptions
project covariate grids using equal area projections (area effect)
use product features only if you are interested in interactions
increase regularization parameters to get more diffused outputs
decrease regularization parameters to get more localized outputs
tune the prevalence parameter for cross-species comparisons

1. Download Maxent

2. Copy/paste the file “maxent.jar” into C:/…/R/library/dismo/java

3. Use the maxent() function from the R-package “dismo”

Application: distribution of Acer pseudoplatanus L. in France

we assume that we only have occurrence records across France
we use the French territory as a mask for the background sample

Prepare the occurrence records to be used for training the model

> library(rgdal)

> EPSG <- make_EPSG()

> laea <- EPSG[which(EPSG$code==3035), "prj4"]

> pres <- which(calib$pa==1)  # training presences; occ indexes the test dataset here

> occtrain <- data.frame(cbind(calib$x[pres], calib$y[pres]))

> names(occtrain) <- c("lon", "lat")

> occtrain <- SpatialPoints(occtrain, fr@proj4string)

> occtrain <- spTransform(occtrain, CRS(laea))

Prepare the mask & covariates

> mask <- rasterize(fr, TMIN, field=1)

> crs <- CRS(laea)

> res <- c(1000, 1000)

> mask <- projectRaster(mask, method="ngb", res=res, crs=crs)

> predictors <- stack(TMIN, TSEA, PSEA, WBAL)

> predictors <- projectRaster(predictors, mask)

> predictors <- predictors*mask

> names(predictors)

[1] "layer.1" "layer.2" "layer.3" "layer.4"

> names(predictors) <- c("tmin", "tsea", "psea", "wbal")

Fit the model with Maxent’s default parameters

> arg <- c("responsecurves=TRUE")

> M4 <- maxent(x=predictors, p=occtrain, args=arg)

> class(M4)

[1] "MaxEnt"

attr(,"package")

[1] "dismo"

Display all of Maxent’s outputs on an HTML page

Maxent’s outputs

Variable contributions

> plot(M4, pch=20, cex.lab=1.5, col="blue")

Response curves

> response(M4, cex.lab=1.5, col="black")

Predictions of the geographic distribution of Sycamore maple

> p4 <- predict(M4, predictors)

> p4

class : RasterLayer

dimensions : 1182, 1206, 1425492 (nrow, ncol, ncell)

resolution : 1000, 1000 (x, y)

extent : 3086567, 4292567, 2025783, 3207783

coord. ref. : +proj=laea +lat_0=52 +lon_0=10 ...

data source : in memory

names : layer

values : 0.007738914, 0.8186277 (min, max)

Predictions of the geographic distribution of Sycamore maple

> plot(p4, col=colAP, axes=FALSE, box=FALSE)

Occurrence records from the train dataset overlaid on the map

> points(occtrain, pch=3, cex=0.5)

Predictions from GLM & Maxent with the same set of predictors

> windows(16, 8)

> par(mfrow=c(1, 2), mar=c(1, 1, 2, 4))

> p3 <- projectRaster(p3, mask)

> p3 <- p3*mask

> plot(p3, col=colAP, axes=FALSE, main="GLM")

> plot(p4, col=colAP, axes=FALSE, main="Maxent")

Evaluation of Maxent’s performance

> occtest <- data.frame(cbind(valid$x[occ], valid$y[occ]))

> names(occtest) <- c("lon", "lat")

> occtest <- SpatialPoints(occtest, fr@proj4string)

> occtest <- spTransform(occtest, CRS(laea))

> bg <- randomPoints(mask=mask, n=nrow(valid)-length(occ))

> e <- evaluate(model=M4, p=occtest, a=bg, x=predictors)

> e

n presences : 1354

n absences : 7940

AUC : 0.8145358

cor : 0.3862409

Performs better than M3 (GLM)

Change Maxent’s default parameters to fit a smoother model

> a1 <- c("product=FALSE")

> a2 <- c("threshold=FALSE")

> a3 <- c("hinge=FALSE")

> a4 <- c("responsecurves=TRUE")

> M5 <- maxent(predictors, occtrain, args=c(a1, a2, a3, a4))

Response curves

> response(M5, cex.lab=1.5, col="black")

Compare predictions between M4 & M5

> windows(16, 8)

> par(mfrow=c(1, 2), mar=c(1, 1, 2, 4))

> p5 <- predict(M5, predictors)

> plot(p4, col=colAP, axes=FALSE, main="M4 (Maxent)")

> plot(p5, col=colAP, axes=FALSE, main="M5 (Maxent)")

Model evaluation

> e <- evaluate(model=M5, p=occtest, a=bg, x=predictors)

> e

n presences : 1354

n absences : 7940

AUC : 0.8001492

cor : 0.3670693
