All for one or One for All? Mapping many species individually vs. simultaneously with random forest. Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald August 10, 2012 Ecological Society of America Annual Meeting Portland, Oregon


Page 1: Emilie Henderson, Janet  Ohmann , Matthew Gregory, Heather Roberts and Harold  Zald

All for one or One for All?

Mapping many species individually vs. simultaneously with random forest.

Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald

August 10, 2012

Ecological Society of America Annual Meeting

Portland, Oregon


Species Distribution Modeling

• Been around for a long time, and has exploded over the last decade.

With the rise of new powerful statistical techniques and GIS tools, the development of predictive habitat distribution models has rapidly increased in ecology.

– Guisan and Zimmermann 2000

• Generalized Linear/Additive Models
• Neural networks
• Bayesian models
• Ordination
• Classification methods

• Web of Knowledge: ‘species distribution’
  – 2000–2001: 556 articles
  – 2011–2012: 1,389 articles


SDM Uses

From Guisan and Thuiller 2005


Strategies for community-level modeling

• ‘assemble first, predict later’

• ‘predict first, assemble later’

• ‘assemble and predict together’

--Ferrier & Guisan 2006

Objective: Compare two strategies for community-level predictive mapping.


You Are Here

• Pacific silver fir (Abies amabilis)
• Grand fir / White fir (Abies grandis / concolor)
• Subalpine fir (Abies lasiocarpa)
• Noble fir / Shasta red fir (Abies procera / shastensis)
• Bigleaf maple (Acer macrophyllum)
• Red alder (Alnus rubra)
• Madrone (Arbutus menziesii)
• Incense cedar (Calocedrus decurrens)
• Mountain mahogany (Cercocarpus ledifolius)
• Giant chinkapin (Chrysolepis chrysophylla)
• Pacific dogwood (Cornus nuttallii)
• Oregon ash (Fraxinus latifolia)
• Western juniper (Juniperus occidentalis)
• No Trees Present
• Lodgepole pine (Pinus contorta)
• Engelmann spruce (Picea engelmannii)
• Jeffrey pine (Pinus jeffreyi)
• Sugar pine (Pinus lambertiana)
• Western white pine (Pinus monticola)
• Ponderosa pine (Pinus ponderosa)
• Black cottonwood (Populus balsamifera ssp. trichocarpa)
• Bitter cherry (Prunus emarginata)
• Douglas-fir (Pseudotsuga menziesii)
• Oregon white oak (Quercus garryana)
• California black oak (Quercus kelloggii)
• Pacific yew (Taxus brevifolia)
• Western red cedar (Thuja plicata)
• Western hemlock (Tsuga heterophylla)
• Mountain hemlock (Tsuga mertensiana)

Plot Data
• Forest Inventory and Analysis Annual Plots: 1,948 plots

Techniques – Random Forest Based (Breiman 2001, Cutler et al. 2007)
• Binary prediction (R package: randomForest, Liaw & Wiener 2002)
• Continuous prediction
• Nearest Neighbor Imputation (R package: yaImpute, Crookston & Finley 2008)

Spatial Data Layers
• Climate (from PRISM climate data)
• Soil Parent Material (from SSURGO/Soil Resources Inventory)
• Topography (from National Elevation Dataset)
• Spectral reflectance (Landsat)


[Figure: six example classification trees from a random forest. Splits use the predictors SMRTP, SMRTMP, ANNTMP, DEM, TC1 and TC3 (e.g., SMRTP < 228.5, ANNTMP < 606, TC3 < -1433.5); each terminal node predicts TRUE or FALSE.]

# True / # Trees = 4/6 = 0.67

For RF regression, the predicted value for a pixel is the average of the predictions from the terminal nodes the pixel reaches, one per tree.
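The vote fraction and the regression average described here can be sketched in a few lines. This is a minimal illustration, not the randomForest package's implementation; the per-tree basal-area values are hypothetical and only the 4-of-6 vote count comes from the slide.

```python
from statistics import mean


def vote_fraction(tree_votes):
    """Share of trees that predict presence (TRUE) for one pixel."""
    return sum(1 for vote in tree_votes if vote) / len(tree_votes)


def regression_prediction(tree_values):
    """Forest regression prediction: the mean of the per-tree predictions."""
    return mean(tree_values)


# Six trees, four of which vote TRUE, as in the slide's example.
votes = [True, True, True, True, False, False]
fraction = vote_fraction(votes)
print(round(fraction, 2))

# Hypothetical per-tree predictions (one terminal-node value per tree).
tree_values = [12.0, 15.5, 9.0, 14.0, 11.5, 10.0]
print(regression_prediction(tree_values))
```

For binary prediction, the vote fraction is typically compared with a threshold (0.5 under simple majority voting) to decide presence or absence.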


Random forest -- Nearest-Neighbor imputation

Imputation = Filling in missing values from existing values.


Methods: k-NN

“Assemble and Predict Together”

(1) Place plots within feature space

(2) Place new pixel within feature space

(3) Find nearest-neighbor plot within feature space

(4) Impute nearest neighbor’s Plot ID # to pixel

[Figure: feature space (axes: Elevation, Rainfall) shown beside geographic space (study area)]
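The four k-NN steps on this slide can be sketched as follows for k = 1. This is a minimal illustration under stated assumptions: the plot IDs, the two rescaled feature values (elevation, rainfall) and the use of Euclidean distance are all hypothetical choices, not details given in the slides.

```python
import math


def impute_nearest_plot(pixel, plots):
    """Return the ID of the plot closest to `pixel` in feature space.

    `plots` maps plot ID -> feature vector; `pixel` is a feature vector
    with the same variables in the same order.
    """
    return min(plots, key=lambda plot_id: math.dist(pixel, plots[plot_id]))


# Step 1: hypothetical plots placed in a two-variable feature space
# (elevation, rainfall), both rescaled to 0-1 so neither dominates.
plots = {
    101: (0.10, 0.80),
    102: (0.55, 0.40),
    103: (0.90, 0.15),
}

# Step 2: a new pixel with its own (elevation, rainfall) values.
pixel = (0.60, 0.35)

# Steps 3 and 4: find the nearest plot and impute its ID to the pixel.
nearest_id = impute_nearest_plot(pixel, plots)
print(nearest_id)
```

Because the pixel receives a plot ID rather than a single value, every attribute measured on that plot (the full species list and abundances) travels with it.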


Methods: GNN (Ohmann and Gregory 2002)

(1) Conduct gradient analysis of plot data

(2) Calculate axis scores of pixel from mapped data layers

(3) Find nearest-neighbor plot in gradient space

(4) Impute nearest neighbor’s Plot ID # to pixel

[Figure: gradient space (CCA Axis 1, e.g., rainfall, local topography; CCA Axis 2, e.g., temperature, elevation) shown beside geographic space (study area)]


Methods: Random Forest Nearest Neighbor Imputation

[Figure: Random Forest space shown beside geographic space (study area)]


[Figure: the six example trees repeated, annotated with plot ID numbers (1 to 11)]

Nearest Neighbor Plot: #3

Second Nearest Neighbor: #5
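One way to read this slide is that plots are ranked by how often they land in the same terminal node as the target pixel across the trees, with distance taken as one minus that proportion. The sketch below illustrates that idea only; the terminal-node assignments are hypothetical and are not read off the slide's trees, and it is not the yaImpute code.

```python
def rank_neighbors(target_nodes, plot_nodes):
    """Rank reference plots by shared terminal nodes with the target pixel.

    `target_nodes[t]` is the terminal node the target reaches in tree t;
    `plot_nodes[plot_id][t]` is the same for each reference plot.
    Returns (plot_id, distance) pairs, nearest first.
    """
    n_trees = len(target_nodes)
    shared = {
        plot_id: sum(a == b for a, b in zip(target_nodes, nodes))
        for plot_id, nodes in plot_nodes.items()
    }
    # Most shared nodes first; ties broken by plot ID for determinism.
    ranked = sorted(shared, key=lambda pid: (-shared[pid], pid))
    # Distance = 1 - proportion of trees with a shared terminal node.
    return [(pid, 1 - shared[pid] / n_trees) for pid in ranked]


# Hypothetical terminal-node IDs for one target pixel across six trees.
target_nodes = [4, 2, 7, 3, 5, 6]

# Hypothetical terminal-node IDs for three reference plots.
plot_nodes = {
    3: [4, 2, 7, 3, 1, 6],  # shares a node with the target in 5 trees
    5: [4, 2, 1, 3, 5, 2],  # shares a node in 4 trees
    7: [1, 2, 3, 1, 1, 1],  # shares a node in 1 tree
}

ranking = rank_neighbors(target_nodes, plot_nodes)
print(ranking[0][0], ranking[1][0])
```

With these made-up assignments the nearest neighbor is plot 3 and the second nearest is plot 5, mirroring the slide's labels.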


Strategies for community-level modeling

• ‘assemble first, predict later’

• ‘predict first, assemble later’
  – Random forest – classification (binary prediction)
  – Random forest – regression (continuous prediction)

• ‘assemble and predict together’
  – Random forest – imputation (continuous prediction)

--Ferrier & Guisan 2006
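The difference between the two strategies compared in this talk can be illustrated with a small sketch. Everything here is hypothetical: the lambda functions stand in for fitted single-species models, and the plot records and basal-area values are invented for illustration.

```python
def predict_first_assemble_later(species_models, pixel):
    """Run one model per species, then stack the outputs into a community."""
    return {species: model(pixel) for species, model in species_models.items()}


def assemble_and_predict_together(plot_records, nearest_plot_id):
    """Copy the whole species record of the imputed plot to the pixel."""
    return dict(plot_records[nearest_plot_id])


# Hypothetical stand-ins for fitted single-species models.
species_models = {
    "Pseudotsuga menziesii": lambda px: 20.0 if px["elevation"] < 1200 else 5.0,
    "Tsuga mertensiana": lambda px: 0.0 if px["elevation"] < 1200 else 12.0,
}

# Hypothetical plot records (basal area by species), kept whole.
plot_records = {
    3: {"Pseudotsuga menziesii": 18.0, "Tsuga mertensiana": 0.0},
    5: {"Pseudotsuga menziesii": 4.0, "Tsuga mertensiana": 15.0},
}

pixel = {"elevation": 900}
stacked = predict_first_assemble_later(species_models, pixel)
imputed = assemble_and_predict_together(plot_records, nearest_plot_id=3)
print(stacked)
print(imputed)
```

In the first function each species is predicted independently, so co-occurrence patterns are only as good as the stacked models; in the second, the pixel inherits a community that was actually observed on a plot.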


Dimensions of Map Accuracy

• Single-species metrics
  – Range – presence/absence
  – Abundance – How much basal area?
  – Is the distribution of values predicted realistic?

• Community-level metrics
  – Diversity
  – Composition


[Figure: three panels, each showing Sensitivity, Specificity and TSS on an axis from 0.0 to 1.0]

Sensitivity: True Positives / (True Positives + False Negatives)

Specificity: True Negatives / (True Negatives + False Positives)

True Skill Statistic (TSS): Sensitivity + Specificity - 1
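The three definitions above translate directly into code. The following is a minimal sketch; the observed and predicted presence/absence vectors are hypothetical.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for presence/absence data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    return tp, fn, tn, fp


def accuracy_metrics(observed, predicted):
    """Return (sensitivity, specificity, TSS) as defined on the slide."""
    tp, fn, tn, fp = confusion_counts(observed, predicted)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1


# Hypothetical presence/absence at ten plots for one species.
observed = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

sensitivity, specificity, tss = accuracy_metrics(observed, predicted)
print(sensitivity, round(specificity, 3), round(tss, 3))
```

TSS ranges from -1 to 1, with values near 0 indicating predictions no better than chance. The sketch assumes at least one observed presence and one observed absence; otherwise a denominator would be zero.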


[Figure: empirical cumulative distribution functions. x-axis: Value (0 to 150); y-axis: Cumulative % of dataset (0.4 to 1.0); series: RF_C, RFNN, Plot Data]

Root Mean Square Difference: 17.72, 18.46
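The two abundance checks on this slide, root mean square difference and the empirical cumulative distribution of values, can be sketched as below. The basal-area values are hypothetical and unrelated to the figures reported on the slide.

```python
import math


def rmsd(observed, predicted):
    """Root mean square difference between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def ecdf(values):
    """Return sorted (value, cumulative fraction of dataset) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


# Hypothetical basal area for one species at four plots.
observed = [0.0, 0.0, 10.0, 40.0]
predicted = [3.0, 4.0, 10.0, 28.0]

print(rmsd(observed, predicted))
print(ecdf(observed))
print(ecdf(predicted))
```

Comparing the ECDF of predictions with that of the plot data shows whether the predicted values are distributed realistically, for example whether the share of zeros is preserved, which RMSD alone does not reveal.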


[Figure: three panels, each showing Sensitivity, Specificity and TSS on an axis from 0.0 to 1.0]


[Figure: empirical cumulative distribution functions. x-axis: Value (0 to 200); y-axis: Cumulative % of dataset (0.4 to 1.0); series: RF_C, RFNN, Plot Data]

Root Mean Square Difference: 21.34, 18.73


Single Species Models

• Range
  – Random Forest – Binary: best
  – Random Forest – Nearest Neighbor: acceptable
  – Random Forest – Continuous: fail

• Abundance (Basal Area)
  – RMSD
    • Random Forest – Continuous: best
    • Random Forest – Nearest Neighbor: acceptable
    • Random Forest – Binary: NA
  – Empirical Cumulative Distribution Functions (predicted value distributions)
    • Random Forest – Nearest Neighbor: best
    • Random Forest – Continuous: fail
    • Random Forest – Binary: NA


Diversity: Species Richness and Evenness

[Figure: two panels. Alpha diversity (axis 0 to 20) for Observations, RF_B, RF_C and RFNN_C; Shannon diversity (axis 0.0 to 1.5) for Observations, RF_C and RFNN_C]
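The two diversity measures on this slide can be computed from a plot's (or pixel's) species abundances as sketched below. This assumes that alpha diversity here means species richness (the count of species present) and that Shannon diversity uses the natural logarithm of proportional abundance; the basal-area values are hypothetical.

```python
import math


def species_richness(abundances):
    """Alpha diversity as the number of species with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over species that are present."""
    total = sum(abundances)
    if total == 0:
        return 0.0  # no trees present
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical basal area by species for one plot (four candidate species).
basal_area = [10.0, 10.0, 0.0, 20.0]

richness = species_richness(basal_area)
shannon = shannon_diversity(basal_area)
print(richness, round(shannon, 4))
```

Richness needs only presence/absence, so it can be computed for binary predictions as well, while the Shannon index needs abundances, which is consistent with RF_B appearing only in the alpha diversity panel.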


Beta Diversity

[Figure: beta diversity (axis 0 to 12) for Observations, RF_B, RF_C and RFNN_C]


1 1 3 5 6

1 1 3 5 4

2 2 3 5 4

2 2 3 4 4

2 2 3 4 4

Average Alpha Diversity for Blue Pixel: 3.04
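The reported 3.04 is the mean of the 25 values in the grid, which suggests the grid holds per-pixel alpha diversity in a 5 × 5 window around the blue pixel. That reading is an assumption, since the slide image is not available; the sketch below simply reproduces the arithmetic.

```python
# The 5 x 5 grid of values from the slide, row by row.
window = [
    [1, 1, 3, 5, 6],
    [1, 1, 3, 5, 4],
    [2, 2, 3, 5, 4],
    [2, 2, 3, 4, 4],
    [2, 2, 3, 4, 4],
]


def window_mean(grid):
    """Mean of all cell values in a rectangular window."""
    values = [value for row in grid for value in row]
    return sum(values) / len(values)


average_alpha = window_mean(window)
print(round(average_alpha, 2))
```

The values sum to 76 across 25 cells, giving 76 / 25 = 3.04.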




Results – Composition

[Figure: two panels. Bray-Curtis, Binary (axis 0.0 to 0.6) for RF_B, RF_C and RFNN_B; Bray-Curtis, Continuous (axis 0.0 to 0.4) for RF_C and RFNN_C]

What is the Bray-Curtis distance between our observed and predicted communities?
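The Bray-Curtis distance between an observed and a predicted community can be sketched as follows, using the standard form (sum of absolute differences divided by the sum of all abundances). The basal-area vectors are hypothetical; the binary variant assumes abundances are first reduced to presence/absence.

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis distance: sum|o_i - p_i| / sum(o_i + p_i)."""
    numerator = sum(abs(o - p) for o, p in zip(observed, predicted))
    denominator = sum(o + p for o, p in zip(observed, predicted))
    return numerator / denominator if denominator else 0.0


def to_binary(abundances):
    """Reduce abundances to presence (1) / absence (0)."""
    return [1 if a > 0 else 0 for a in abundances]


# Hypothetical basal area by species at one plot: observed vs. predicted.
observed = [10.0, 0.0, 5.0, 5.0]
predicted = [6.0, 2.0, 5.0, 5.0]

continuous_distance = bray_curtis(observed, predicted)
binary_distance = bray_curtis(to_binary(observed), to_binary(predicted))
print(round(continuous_distance, 3), round(binary_distance, 3))
```

A distance of 0 means the predicted community matches the observed one exactly, and 1 means they share no species. Note that a species predicted at low abundance where it is absent still counts fully against the binary distance.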


Discussion

• Species absences are an important dimension of composition
  – Disturbance?
  – Succession?
  – Competition/Facilitation?
  – Dispersal limitations?

• Community assembly rules can be used to help refine mapped species lists (e.g., Guisan and Rahbek, 2011).

• But… imputation avoids the pitfalls and complications of re-assembling communities after mapping because they are never taken apart.


Conclusions

• Practical Considerations:
  – Models of individual species may be
    • Strongest in one dimension
    • Useful for understanding species’ ecology
    • The best option for some types of available data (e.g., presence-only data from museum specimens)
  – Nearest Neighbor mapping is a useful tool for building multipurpose maps.
    • Ranges and abundances
    • Composition
    • Diversity


Acknowledgements

• Nationwide Forest Imputation Study

• Landscape Ecology, Modeling, Mapping and Analysis (LEMMA) team in Corvallis.