Best Model Dylan Loudon. Linear Regression Results Erin Alvey

Who will you trust?

• Field technicians?

• Software programmers?

• Statisticians?

• Instructors?

• GIS technicians?

• Other researchers?

• Yourself?

Regression (Correlation) Modeling

• Creates a model in N-dimensional “hyper-space”

• Defined by:
– Covariates
– Response variables
– Mathematics used to create the model
– Statistics used to optimize parameters
– Options for model evaluation
– Predictor variables

Multiple Linear Regression

Linear Regression: 2 Predictors

Mathworks.com

Non-Linear Regression

Regression Methods

• Continuous Regression:
– Linear Regression
– Generalized Linear Models (GLMs)
– Generalized Additive Models (GAMs)

• Categorical Regression (trees):
– Regression Trees
– Classification and regression trees (CART)

• Machine Learning:
– Maximum Entropy (Maxent)
– NPMR, HEMI, BRTs, etc.

Brown Shrimp Size

• Add graph from work

Terminology

• Plant uses:
– Measured value and response variable
– Explanatory variable

• I prefer:
– Response variable
– I’ll use “measured value” to identify measured values in field data
– Covariate: Explanatory variable used to build the model
– Predictor: Explanatory variable used to predict

Douglas Fir Habitat Model

[Figure: Habitat Quality (y-axis, up to 1) vs. Precipitation (mm) (x-axis, 0 to 10000)]

[Diagram, built up over three slides. Labels: Field or Sample Data; Covariate; Model Selection and Parameter Estimation; Predictor; Model; Prediction; Model Validation]

Douglas-Fir sample data

Lat        Lon          F3  MeanTemp  Precip
40.893634  -121.802272  41  69        1070
40.987702  -122.117088  45  96        1406
40.987702  -122.117088  40  96        1406
40.987702  -122.117088  43  96        1406
40.987702  -122.117088  42  96        1406
40.987702  -122.117088  46  96        1406

Create the Model

Model “Parameters”

Precip

To Points

Extract

Text File

To Raster

X         Y         MeanTemp  Precip  Predict
-123.677  41.61906  71        1548    193.6
-123.344  41.61906  55        1212    150.4
-123.011  41.61906  79        887     187.5667
-122.677  41.61906  68        584     155.4667
-122.344  41.61906  102       513     221.1
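As a hedged illustration of parameter estimation with two predictors, the sketch below fits Predict = b0 + b1 × MeanTemp + b2 × Precip by ordinary least squares to the five rows of the table above. The slides do not state the model; on these rows the fit comes out at an intercept near 0 and coefficients near 2 and 1/30, which is an inference from the numbers shown and nothing more. The function names (`solve`, `fit_linear`, `predict`) are made up for this example.

```python
# Rows copied from the prediction table: (MeanTemp, Precip, Predict).
ROWS = [
    (71, 1548, 193.6),
    (55, 1212, 150.4),
    (79, 887, 187.5667),
    (68, 584, 155.4667),
    (102, 513, 221.1),
]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear(rows):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    design = [(1.0, x1, x2) for x1, x2, _ in rows]
    y = [row[2] for row in rows]
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(3)]
    return solve(xtx, xty)


def predict(coefs, mean_temp, precip):
    """Apply the fitted parameters to one pair of predictor values."""
    b0, b1, b2 = coefs
    return b0 + b1 * mean_temp + b2 * precip


coefs = fit_linear(ROWS)
print("intercept, MeanTemp coefficient, Precip coefficient:", coefs)
print("prediction for MeanTemp=71, Precip=1548:", predict(coefs, 71, 1548))
```

The normal equations are used here only because they keep the example short; a real workflow would more likely rely on a statistics package.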

Prediction

Attributes

Data

• Response Variable
– From the field data (sample data)

• Covariates
– From the field or remotely sensed

• Predictors
– Typically remotely sensed
– Sample as covariates for training
– Can be different for predicting to new scenarios

Response Variable

• What is the:
– Spatial uncertainty?
– Temporal uncertainty?
– Measurement uncertainty?

• Will it answer your question?

Covariate Variables

• What is the:
– Spatial uncertainty?
– Temporal uncertainty?
– Measurement uncertainty?

• How well does the collection time of the covariates match the field data?

• Do they co-vary with the phenomena?

• Do the covariates “correlate”?

Types of uncertainty

• Accuracy (bias)

• Precision (repeatability)

• Reliability (consistency of a set of measurements)

• Resolution (fineness of detail)

• Logical consistency
– Adherence to structural rules, attributes, and relationships

• Completeness
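To make the first two terms concrete, the sketch below treats accuracy as bias (mean reading minus a known reference) and precision as repeatability (sample standard deviation of repeated readings). The readings and the 100.0 m benchmark are hypothetical values invented for illustration, and the function names are not from the slides.

```python
from statistics import mean, stdev


def bias(measurements, reference):
    """Accuracy expressed as bias: how far the mean reading sits from the reference."""
    return mean(measurements) - reference


def precision(measurements):
    """Precision expressed as repeatability: sample standard deviation of the readings."""
    return stdev(measurements)


# Hypothetical repeated readings (m) of a benchmark assumed to be exactly 100.0 m.
readings = [101.9, 102.1, 102.0, 101.8, 102.2]

print("bias (m):", bias(readings, 100.0))
print("precision (m):", precision(readings))
```

In this made-up case the readings agree closely with each other but sit about 2 m above the benchmark, so the instrument would be precise yet inaccurate.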

Types of Errors

• Gross errors
– Transcription
– Sinks in DEMs

• Random
– Estimated using probability theory

• Systematic errors
– “Drift” in instruments
– Dropped lines in Landsat

Gross Errors

• Lat/Lon:
– Reversed
– 0, names, dates, etc.

• Dates:
– Extended in databases

• Measurements:
– Inconsistent units
– Inconsistent protocols
– What can you expect from a field team?
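Some of the Lat/Lon problems listed above can be screened automatically. The following is a minimal sketch, assuming coordinates in decimal degrees; the function name `coordinate_flags` and the flag wording are hypothetical.

```python
import math


def coordinate_flags(lat, lon):
    """Return a list of gross-error flags for one latitude/longitude pair."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        # Names, dates, or other text stored in a coordinate field.
        return ["not numeric"]
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return ["not numeric"]
    flags = []
    if lat == 0 and lon == 0:
        flags.append("zero coordinates")
    if abs(lat) > 90 or abs(lon) > 180:
        # If swapping the two values would give a valid pair, suspect a reversal.
        if abs(lat) <= 180 and abs(lon) <= 90:
            flags.append("out of range, possibly reversed")
        else:
            flags.append("out of range")
    return flags


print(coordinate_flags(40.893634, -121.802272))  # a row from the sample data
print(coordinate_flags(-121.802272, 40.893634))  # the same row with values swapped
print(coordinate_flags(0, 0))
print(coordinate_flags("Smith", "2011-05-01"))
```

A range check only catches a reversal when the longitude lies outside ±90; swapped values that both fall within ±90 still need a map to spot.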

Occurrences of Polar Bears

From The Global Biodiversity Information Facility (www.gbif.org, 2011)

Systematic Errors

Landsat Scan line Error

Response Variable Qualification Tools

• Maps (various resolutions)

• Examine the data values:
– How many digits?
– Repeating patterns, gross errors?

• “Documentation”

• Measurements:
– Occurrences?
– Binary: Histogram
– Categorical: Histogram
– Continuous: Histogram

What’s the Impact on Models?

Significant Digits

• How many digits to represent 1 meter?
– Geographic: Lat/Lon?
– UTM: Eastings/Northings?

Significant Digits

• Geographic:
– 1 digit = 1 degree
– 1 degree ~ 110 km
– 0.00001 ~ 1.1 meters

• UTM:
– 1 digit = 1 meter
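The arithmetic behind the geographic figures can be sketched as follows, using the slide's rough value of 110 km per degree. The cosine scaling for longitude is the standard spherical approximation and is an addition here, as are the function names.

```python
import math

METERS_PER_DEGREE = 110_000  # rough figure from the slide: 1 degree ~ 110 km


def ground_resolution_m(decimal_places, latitude_deg=None):
    """Approximate ground distance represented by the last decimal place of a degree.

    If latitude_deg is given, the result is for longitude at that latitude,
    which shrinks by cos(latitude) on a spherical Earth.
    """
    size = METERS_PER_DEGREE * 10.0 ** -decimal_places
    if latitude_deg is not None:
        size *= math.cos(math.radians(latitude_deg))
    return size


def decimals_needed(resolution_m):
    """Smallest number of decimal places whose last digit is no coarser than resolution_m."""
    return max(0, math.ceil(math.log10(METERS_PER_DEGREE / resolution_m)))


print(ground_resolution_m(5))      # about 1.1 m, as on the slide
print(ground_resolution_m(5, 60))  # longitude at 60 degrees latitude, about half that
print(decimals_needed(1.0))        # places needed to resolve 1 m or better
```

Five decimal places give roughly 1.1 m, so resolving a full meter or better in latitude takes a sixth.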

Covariate Qualification

• Maps

• Documentation

• Examine the data:
– How many digits? (Integer or floating point?)
– Repeating patterns?

• Histograms

CONUS Annual Precip.

Covariate Uncertainty

[Histogram: Min Temp of Coldest Month. X-axis: Degrees C Times 10, from -231 to 102. Y-axis: Number of Pixels, Scaled to 1.]

[Histogram: Min Temp of Coldest Month. X-axis: Degrees C Times 10, from -230 to 103. Y-axis: Number of Occurrences, Scaled to 1.]

Min Temp: Environment

Histograms

hist(Temp, breaks = 400)  # R: histogram of Temp with about 400 bins

Covariate Correlation

• Correlation Plots

• Pearson product-moment correlation coefficient

• Spearman’s rho – nonparametric correlation coefficient
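Both coefficients can be computed in a few lines. The sketch below implements Pearson's r directly and Spearman's rho as Pearson's r on ranks, with tied values given their average rank. It is applied to the MeanTemp and Precip columns of the five-row prediction table earlier in these slides; the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranked data."""
    return pearson(ranks(x), ranks(y))


# Columns copied from the prediction table.
mean_temp = [71, 55, 79, 68, 102]
precip = [1548, 1212, 887, 584, 513]

print("Pearson r:", pearson(mean_temp, precip))
print("Spearman rho:", spearman(mean_temp, precip))
```

With only five rows the values say little about the real covariates; the point is the mechanics, and that the two coefficients need not agree.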

Correlation plots

California Correlations

California Predictors

Response vs. Covariates

• For Occurrences:
– Histogram covariates at occurrences vs. overall covariates

• For Binary Data:
– Histogram covariates for each value

• For Categorical Data:
– Histogram covariates for each value
– Or scatter plots

• For Continuous Data:
– Scatter plots
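For occurrence data, the comparison amounts to binning the covariate twice with the same bins: once over the whole study area and once at the occurrence locations. The sketch below assumes that “scaled to 1” means dividing by the tallest bin, which the slides do not state. The Precip values are taken from the two tables earlier in these slides, and the bin limits are arbitrary choices for illustration.

```python
def scaled_histogram(values, lower, upper, bins):
    """Count values into equal-width bins and scale so the tallest bin is 1."""
    width = (upper - lower) / bins
    counts = [0] * bins
    for v in values:
        if lower <= v <= upper:
            # Values equal to the upper limit go into the last bin.
            index = min(int((v - lower) / width), bins - 1)
            counts[index] += 1
    peak = max(counts)
    return [c / peak if peak else 0.0 for c in counts]


# Precip at the five prediction-grid rows (standing in for "overall") and
# at the six Douglas-Fir sample rows (standing in for "occurrences").
overall = [1548, 1212, 887, 584, 513]
occurrences = [1070, 1406, 1406, 1406, 1406, 1406]

print("overall:    ", scaled_histogram(overall, 0, 2000, 4))
print("occurrences:", scaled_histogram(occurrences, 0, 2000, 4))
```

Using identical bin limits for both histograms is what makes them comparable bin by bin.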

Covariate Occurrence Histograms

Precipitation with Douglas-Fir Occurrences

Douglas Fir Model In HEMI 2

Green: Histogram of all of California
Red: Histogram of Douglas-Fir Occurrences

Doug-Fir Height vs. Precip.

Douglas Fir Height

Terrestrial Predictors

• Elevation:
– Slope
– Aspect
– Absolute Aspect

• Distance to:
– Roads
– Streams (streamline)

• Climate
– Precip
– Temp

• Soil Type

• RS:
– Landsat
– MODIS
– NDVI, etc.

Marine Predictors

• Temp

• DO2

• Salinity

• Depth

• Rugosity (roughness)

• Current (at depths)

• Wind

More Complicated

• Associated species

• Trophic levels

• Temporal

• Cyclical

Predictor Layers

• Means, mins, maxes

• Range of values

• Heterogeneity

• Spatial layers:
– Distance to…
– Topography: elevation, slope, aspect

Field Data and Predictors

• As close to field measurements as possible

• Clean and aggregate data as needed
– Documenting as you go

• Estimate overall uncertainty

• Answer the question:
– What spatial, temporal, and measurement scales are appropriate to model at given the data?

Temporal Issues

• Divide data into months, seasons, years, decades
– Consistent between predictors and response

• Extract predictors as close to sample location and dates as possible

• Use the “best” predictor layers

Additional Slides

Dimensions of uncertainty

• Space

• Time

• Attribute

• Scale

• Relationships

Basic Tools

• Histograms: What is the distribution of occurrences of values (range and shape)?

• Scattergrams: What is the relationship between response and predictor variables, and between predictor variables?

• QQPlots: Are the residuals normally distributed?
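A QQ plot pairs each sorted residual with the value a normal distribution would be expected to produce at the same rank; points close to a straight line suggest approximately normal residuals. The sketch below computes those pairs without plotting them. The residuals are hypothetical, and the plotting positions (i − 0.5)/n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair sorted residuals with the normal quantiles expected at the same ranks."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    # Plotting positions (i - 0.5) / n avoid probabilities of exactly 0 or 1.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))


# Hypothetical residuals from a fitted model, invented for illustration.
residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.5, 0.6]

for expected, observed in qq_points(residuals):
    print(f"{expected:8.3f} {observed:8.3f}")
```

Plotting the first column against the second gives the QQ plot; a strong curve or S-shape would point away from normality.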

Types of Data

• “God does not play dice”
– Einstein

• “The end of certainty”
– Prigogine, 1977 Nobel Prize

• What remains is:
– Quantifiable probability with uncertainty

Uncertainty Factors

• Inherent uncertainty in the world

• Limitation of human cognition

• Limitation of measurement

• Uncertainty in processing and analysis
