Finding Meaning in Points, Areas and Surfaces: Spatial Analysis in R
Revolution Analytics, Wednesday 13th June, 1300 EST




Everything happens somewhere, and spatial analysis attempts to use location as an explanatory variable. Such analysis is made complex by the very many ways we habitually record spatial location, the complexity of spatial data structures, and the wide variety of possible domain-driven questions we might ask. One option is to develop and use software for specific types of spatial data; another is to use a purpose-built geographical information system (GIS). However, determined work by R enthusiasts has resulted in a multiplicity of packages in the R environment that can also be used.





The instructor

• Dave Unwin
• Retired Geography professor
• University of London, UK
• Spatial analysis & GIS in environmental sciences


Geography is everywhere?

• Everything happens somewhere
• Interest is in geospatial data at scales from a few meters to the planet Earth


Spatial analysis is the name given to a variety of methods of analysis in which we use LOCATION as an explanatory variable.

NB: Not all spatial analysis is spatial statistical analysis, and not all spatial analysis is geospatial.


Typical Questions

• Is there an unusual clustering of point objects such as crimes/cases of a disease/trees/whatever here that we need to worry about? If so, does the point pattern help explain why?

• Does this phenomenon in these areas (counties, states, countries) show spatial variation I need to know about? Does the pattern help explain why?

• What is the most probable value for a continuous variable at this location?


Characteristics of spatial data?

• Almost always given: typically the analyst has no choice in their acquisition, and sometimes not even in their formatting;

• They have additional structure that defines their geometry (point, line/network, area/lattice, surface/field/geostatistical)


Types of spatial data

Objects can be points, lines/networks or areas/lattices, with dimensions of length L0, L1 and L2 respectively

Fields are self-defining and spatially continuous: everywhere has a value (e.g. temperature, mean annual rainfall, …)


Locating things on Planet Earth

• There are many ways by which we measure our location (place name, address, ZIP/postcode, latitude/longitude, grid reference, etc.)

• How we locate depends on context and scale
• The spatial resolution of location measurements varies
• For analysis we (usually) need (x, y) co-ordinates in a projected system
• We need keys to provide these data, often added after the data have been collected
• GPS and GPS-enabled devices are changing this, and location-based services (LBS) are a massive and growing industry that is changing our spatial behaviour
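Here is why projected (x, y) co-ordinates matter. The following minimal Python sketch (not part of the original R material) implements the crudest local projection, an equirectangular approximation about a chosen origin. It assumes a spherical Earth of radius 6,371 km and is only reasonable over small areas. In R, the proper route is to declare a CRS and reproject with sp's spTransform().

```python
import math

# Assumption: a spherical Earth with this mean radius (metres).
EARTH_RADIUS_M = 6_371_000.0


def to_local_xy(lon, lat, lon0, lat0):
    """Approximate x/y in metres of (lon, lat) relative to an origin (lon0, lat0).

    Equirectangular approximation: east-west distances shrink with cos(lat0).
    Only reasonable over small areas; not a substitute for a proper projection.
    """
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


# One degree of latitude is roughly 111 km; one degree of longitude at 52°N is much less.
print(to_local_xy(0.0, 53.0, 0.0, 52.0))
print(to_local_xy(1.0, 52.0, 0.0, 52.0))
```

Treating raw degrees as Cartesian units would make east-west distances at British latitudes look roughly 1.6 times too long.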


Why R?

• A consistent environment for statistical computing and graphics

• Relative proximity to the data
• Easy links to code in numerous languages and to DBMS
• Easier development of new methods
• Packages available to perform most analyses
• Immensely supportive community


The sp Spatial Class and its subclasses


> library(sp)
> getClass("Spatial")
Class "Spatial" [package "sp"]

Slots:

Name:         bbox proj4string
Class:      matrix         CRS

Known Subclasses:
Class "SpatialPoints", directly
Class "SpatialLines", directly
Class "SpatialPolygons", directly
Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
Class "SpatialPixels", by class "SpatialPoints", distance 2
Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
Class "SpatialGrid", by class "SpatialPoints", distance 3
Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
Class "SpatialGridDataFrame", by class "SpatialPoints", distance 4
Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2


What extra?

• A data matrix called turbines:

> turbine_df
         lon      lat
1 -0.8716027 52.39353
2 -0.8781694 52.39340
3 -0.8656111 52.39398
4 -0.8795611 52.39626
5 -0.8804666 52.39913
6 -0.8726833 52.39631
7 -0.8643472 52.39723

• A spatial data frame called turbines_spdf that adds three bits of ‘geography’:

1. lon/lat become spatial coordinates

2. A coordinate reference system (CRS) to which these relate, and

3. A bounding box (for display)
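The bounding box that sp stores is simply the coordinate-wise minimum and maximum. The minimal Python sketch below uses the seven turbine coordinates from the turbine_df listing. Its output is laid out like sp's bbox matrix, with one row per coordinate and columns min and max. The function name bbox here is illustrative, not sp's own code.

```python
# Turbine coordinates (lon, lat) transcribed from the turbine_df listing.
turbines = [
    (-0.8716027, 52.39353),
    (-0.8781694, 52.39340),
    (-0.8656111, 52.39398),
    (-0.8795611, 52.39626),
    (-0.8804666, 52.39913),
    (-0.8726833, 52.39631),
    (-0.8643472, 52.39723),
]


def bbox(points):
    """Return [[xmin, xmax], [ymin, ymax]] for a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [[min(xs), max(xs)], [min(ys), max(ys)]]


print(bbox(turbines))
```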


Why bother?

You can do a lot of spatial analysis using a simple Cartesian co-ordinate system such as a unit square, but what happens when you want to merge with other geographic data? Here is a simple example in which turbines_spdf has been written out in KML and then ‘mashed’ onto Google Earth to create a ‘pin’ map.


Packages for spatial data

Contributed packages with spatial statistics applications:

• Utilities: rgdal, sp, maptools
• Point patterns: spatstat, VR:spatial, splancs
• Geostatistics: gstat, geoR, geoRglm, fields, spBayes, RandomFields, VR:spatial, sgeostat, vardiag
• Lattice/area data: spdep, DCluster, spgwr, ade4


Making sense of it all …

• This is the standard work, written by the authors of sp and some of the packages

• It contains just about all you might want to know about spatial analysis in R circa 2008

• Useful new packages have emerged since then


For spatial and spatial statistical analysis?


Three use case examples

• Each illustrates the analysis of a particular class of spatial data: points (L0), areas (L2) and surfaces (L3)


Patterns in drumlins?

Panels: a ‘drumlin’; a ‘swarm’ of them in NI; our bit


Adding an ‘edge’ ….

Is the pattern consistent with complete spatial randomness (CSR), as predicted by Smalley and Unwin (1968) over forty years ago?


Visualizing the pattern using kernel density estimation
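Kernel density estimation places a small "bump" on every event and sums them to give an intensity surface. The minimal Python sketch below assumes an isotropic Gaussian kernel with bandwidth sigma and applies no edge correction. spatstat's density() for point patterns also defaults to a Gaussian kernel but does correct for edge effects. The points are hypothetical.

```python
import math


def gaussian_intensity(u, points, sigma):
    """Kernel estimate of intensity (events per unit area) at location u.

    Sums an isotropic bivariate Gaussian kernel centred on each event.
    No edge correction is applied, so estimates near the window edge are biased low.
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    total = 0.0
    for x, y in points:
        d2 = (u[0] - x) ** 2 + (u[1] - y) ** 2
        total += norm * math.exp(-d2 / (2.0 * sigma ** 2))
    return total


# Hypothetical events in a unit square: two close together, one isolated.
pts = [(0.2, 0.3), (0.25, 0.35), (0.8, 0.7)]
print(gaussian_intensity((0.22, 0.32), pts, sigma=0.1))
print(gaussian_intensity((0.8, 0.2), pts, sigma=0.1))
```

Evaluating this on a fine grid of locations u gives the kind of surface shown on the slide. The bandwidth sigma plays much the same smoothing role as the exponent in IDW.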


Simple tests against CSR ….

Using Baddeley’s spatstat package ….

> library(spatstat)
> # nearest neighbor tests for comparison
> clarkevans(drumlin_ppp)
   naive Donnelly      cdf
1.249917 1.215380 1.233599
> clarkevans(drumlin_rr)
   naive Donnelly      cdf
1.238626       NA 1.215134
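The "naive" column is the Clark-Evans ratio with no edge correction. It is the observed mean nearest-neighbour distance divided by its expectation under CSR, 0.5/√λ with λ = n/area. The minimal Python sketch below implements only that naive version; the Donnelly and cdf corrections are not reproduced. The test pattern is a hypothetical, perfectly regular square lattice, which gives R = 2. Values near 1 are consistent with CSR, and values below 1 suggest clustering.

```python
import math


def clark_evans_naive(points, area):
    """Clark-Evans ratio R with no edge correction.

    R = (observed mean nearest-neighbour distance) / (0.5 / sqrt(n / area)).
    """
    n = len(points)
    nn_sum = 0.0
    for i, (xi, yi) in enumerate(points):
        nn_sum += min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i)
    observed = nn_sum / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical 4 x 4 square lattice with unit spacing in a 4 x 4 window.
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
print(clark_evans_naive(grid, area=16.0))  # 2.0: maximally regular for a square lattice
```

The drumlin values of about 1.2 to 1.25 therefore point towards regularity rather than clustering.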


Ripley’s K(d) function …

NB: Modification to L(est) on RHS due to Mark Rosenstein
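Ripley's K(d) measures the expected number of further events within distance d of a typical event, divided by the intensity. Under CSR, K(d) = πd², so the transform L(d) = √(K(d)/π) equals d. The minimal Python sketch below applies no edge correction and assumes the common normalisation area/(n(n-1)). It does not reproduce the modification to L mentioned on the slide. spatstat's Kest() applies edge corrections and is what should be used on real data.

```python
import math


def ripley_k_naive(points, area, d):
    """Ripley's K(d) without edge correction.

    K(d) = area / (n (n - 1)) * (number of ordered pairs i != j within distance d).
    """
    n = len(points)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                count += 1
    return area * count / (n * (n - 1))


def besag_l(k):
    """L(d) = sqrt(K(d) / pi); under CSR this equals d."""
    return math.sqrt(k / math.pi)


# Hypothetical regular lattice: K stays at zero until d reaches the spacing.
grid = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
for d in (0.5, 1.0, 1.5):
    k = ripley_k_naive(grid, 16.0, d)
    print(d, k, besag_l(k))
```

For a regular pattern, L(d) falls below d at short range because neighbours are "missing" there. That is the signature behind the conclusion drawn from the drumlin plot.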


The generic question is: Is there an unusual clustering of point objects such as crimes/cases of a disease/trees/whatever here that we need to worry about? If so, does the point pattern help explain why?

In this case we conclude that the pattern is more regular than random at short range, but then we have no evidence that it is other than CSR at longer ranges.


Patterns in disease incidence

• Where does this disease occur?
• Although disease affects individuals, almost always the available information will be aggregated into some areal unit such as a postal code, electoral district, county, state or country

• Such data are called lattice data and they are visualized using choropleth (‘area-value’) maps

• Our questions are essentially the same as before


Lip cancer incidence in the Districts and Islands of Scotland (Clayton and Kaldor, 1987)

library(maptools)  # provides readShapePoly()
lips <- readShapePoly("C:\\scotlip", IDvar = "RECORD_ID")
plot(lips)

Note that this is an ESRI ‘shapefile’, a de facto standard for such lattice data.


Plotting the raw numbers?

library(sp)
spplot(lips, "CANCER")  # plain ASCII quotes: curly quotes are a syntax error in R

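This is a complete NO NO: the raw counts largely reflect the size of the population at risk in each district, not the risk of the disease.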


Plotting the rates?

The data are basically Poisson and the numbers are low, which means that these rates are unstable: quite small changes in the counts produce large changes in the rates.


Two alternatives

Panels: probabilities; Bayesian weighting
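One common way to build a probability map asks how surprising each district's count would be if cases followed a Poisson distribution whose mean is the expected count. This is assumed here to be what the "Probabilities" panel shows. The minimal Python sketch below computes that upper-tail probability, P(X ≥ observed). The district figures are hypothetical, and the Bayesian weighting alternative is not sketched.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), via 1 - P(X <= observed - 1).

    Sums the pmf term by term; adequate for the small counts typical of such maps.
    """
    cdf = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for k in range(observed):
        cdf += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - cdf


# Hypothetical district: 7 cases observed where 2.0 were expected.
print(poisson_upper_tail(7, 2.0))
```

A very small probability flags a count that is unlikely to be chance alone, whatever the size of the district's population.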


Chi-square mapping using ‘Pearsonian’ Residuals

> sum(lips$CANCER)
[1] 536
> sum(lips$POP)
[1] 14979894
> pop_exp <- 536 * (lips$POP / 14979894)
> chisq <- (lips$CANCER - pop_exp) / sqrt(pop_exp)
> lips_chi <- spCbind(lips, chisq)
> spplot(lips_chi, "chisq")
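The Pearson-residual calculation can be written as a minimal Python sketch that derives the totals rather than hard-coding 536 and 14979894. Each district's expected count is its population share of all cases. The three-district data are hypothetical.

```python
import math


def pearson_residuals(cases, pops):
    """(O - E) / sqrt(E) per area, with E_i = total cases * pop_i / total population."""
    total_cases = sum(cases)
    total_pop = sum(pops)
    residuals = []
    for observed, pop in zip(cases, pops):
        expected = total_cases * pop / total_pop
        residuals.append((observed - expected) / math.sqrt(expected))
    return residuals


# Hypothetical districts: (cases, population)
print(pearson_residuals([10, 5, 5], [1000, 1000, 2000]))
```

Positive residuals mark more cases than the population share would predict, and negative residuals mark fewer.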


But does it have a ‘geography’?

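$$
I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad W = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}
$$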

Moran’s I is used globally
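The formula translates directly into code. The minimal Python sketch below computes global Moran's I for a given weights matrix, together with its expected value -1/(n-1) under the null hypothesis of no spatial autocorrelation. The four-area chain and its binary contiguity weights are hypothetical; spdep builds such weights from real polygons.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a weights matrix w[i][j] (with w[i][i] = 0).

    I = n * sum_ij w_ij (x_i - xbar)(x_j - xbar) / (W * sum_i (x_i - xbar)^2),
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    total_w = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(zi * zi for zi in z)
    return n * cross / (total_w * denom)


def expected_i(n):
    """Expected value of Moran's I with no spatial autocorrelation."""
    return -1.0 / (n - 1)


# Hypothetical: four areas in a row, each contiguous with its immediate neighbours.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], chain))  # smooth trend: positive autocorrelation
print(morans_i([1, 4, 1, 4], chain))  # alternating values: negative autocorrelation
print(expected_i(4))
```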


Geographic structure scheme   Moran’s I     Expected value        Variance of I   z-score
Simple contiguity             0.363263693   -0.019230769 (n=52)   0.006769752     4.6488
Delaunay                      0.519599336   -0.018181818          0.005068704     7.5537
Distance k=3                  0.543587908   -0.018181818          0.008287442     6.1709
Sphere of influence           0.483547126   -0.018181818          0.006087487     6.4306
Gabriel graph                 0.371846634   -0.022222222 (n=45)   0.007022745     4.7024
Relative neighbors            0.38126027    -0.02500000 (n=40)    0.01206414      3.6988
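Each z-score in the table is the observed I standardised by its expected value and variance, z = (I - E[I]) / √Var(I). The short Python check below uses the table's own figures and reproduces the z-score column to within rounding.

```python
import math

# (scheme, Moran's I, expected value, variance of I), copied from the table
rows = [
    ("Simple contiguity", 0.363263693, -0.019230769, 0.006769752),
    ("Delaunay", 0.519599336, -0.018181818, 0.005068704),
    ("Distance k=3", 0.543587908, -0.018181818, 0.008287442),
    ("Sphere of influence", 0.483547126, -0.018181818, 0.006087487),
    ("Gabriel graph", 0.371846634, -0.022222222, 0.007022745),
    ("Relative neighbors", 0.38126027, -0.02500000, 0.01206414),
]


def z_score(i_obs, expected, variance):
    """Standardise Moran's I: z = (I - E[I]) / sqrt(Var[I])."""
    return (i_obs - expected) / math.sqrt(variance)


for name, i_obs, e, v in rows:
    print(f"{name:20s} z = {z_score(i_obs, e, v):.4f}")
```

Every scheme gives z well above about 2, which is why the choice of neighbourhood definition does not change the conclusion.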

We conclude that we are not fooling ourselves!


The generic question is: Does this phenomenon in these areas (counties, states, countries) show spatial variation I need to know about? Does the pattern help explain why?

We conclude that the pattern is ‘real’, the disease has a geography of interest


Spatial interpolation of a continuous field

In effect we take a sample of ‘heights’ and use these to estimate the value EVERYWHERE across the surface


Spatial interpolation

• The key property of the variable is that it is spatially continuous (everywhere has a value and the gradient is likewise a continuous vector field)

• Given a scatter of sample measurements of the ‘height’ of some continuous variable, what is the value of this field variable at this location?

• There are domain-dependent sub-questions such as: what is the gradient of the field at this point? Or: how much of the variable lies below the surface (e.g. rainfall totals)?

• Examples might be air temperature, rainfall over some period, values of some mineral resource, ground height etc., etc.

• Sometimes results can be verified by further sampling, but equally often there is no external way to test the results

• The process is called spatial interpolation and there are a great many ways of doing it automatically


Interpolation by Inverse Distance Weighting (IDW)

• Estimate each and every location on a very fine grid using an inverse distance weighted sum of the height values of neighboring control points

• Uses the gstat package
• A parameter ‘e’ (the inverse distance power, idp in gstat’s idw()) controls the degree of smoothing
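The minimal Python sketch below computes IDW at a single location, assuming weights of 1/d^e. gstat's idw() exposes the same exponent as its idp argument, with a default of 2. The two control points are hypothetical. A location that coincides with a control point simply returns that point's value, which is one source of the "bull's eyes" discussed on the next slide.

```python
import math


def idw(target, controls, e=2.0):
    """Inverse distance weighted estimate at target from (x, y, z) control points.

    Weights are 1 / d**e. A target that coincides with a control point
    returns that point's value exactly.
    """
    num = 0.0
    den = 0.0
    for x, y, z in controls:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return z
        w = 1.0 / d ** e
        num += w * z
        den += w
    return num / den


# Hypothetical control points (x, y, height)
controls = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
for e in (1.0, 2.0, 3.0):
    print(e, idw((0.5, 0.0), controls, e))
```

Raising e gives the nearest control point more influence, so the surface becomes less smooth. This matches the three renderings.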


Rendering

Panels: IDW e = 2.0; IDW e = 1.0; IDW e = 3.0


Issues in IDW

• Produces ring contours or bull’s eyes
• No way of assessing the likely errors involved
• No theoretical reason for the choice of the distance exponent to be used
• Undesirable side effects if the control data are clustered
• But it corresponds fairly well to what a human might draw


Geostatistics: making use of spatial dependence in interpolation

• For points and areas, spatial dependence can complicate any statistical analysis that uses standard methods

• Can we characterise the spatial dependence across a field and use it to produce better interpolations?


Variography: the semi-variogram ‘cloud’


Summary semi-variogram

We fit one or other of the plausible models to these data to derive a function that describes the spatial dependence
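The minimal Python sketch below covers two pieces. The first is the binned ("summary") semi-variogram, γ(h) = Σ(z_i - z_j)² / (2N(h)), taken over the N(h) pairs in each distance bin. The second is a spherical model of the kind one might fit to it. In R, gstat's variogram() and fit.variogram() with a vgm() model (for example "Sph") do this. The bin scheme and the parameter names here are assumptions.

```python
import math


def empirical_semivariogram(samples, lag_width, n_lags):
    """Binned semivariance from (x, y, z) samples.

    Bin k holds pairs with k * lag_width <= d < (k + 1) * lag_width.
    Returns (mean distance, gamma, number of pairs) for each non-empty bin.
    """
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            d = math.hypot(xi - xj, yi - yj)
            k = int(d // lag_width)
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                dists[k] += d
                counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]


def spherical(h, nugget, partial_sill, range_):
    """Spherical model: rises to nugget + partial_sill at h = range_, flat beyond."""
    if h == 0:
        return 0.0
    if h >= range_:
        return nugget + partial_sill
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Hypothetical transect whose height equals its x position
line = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
print(empirical_semivariogram(line, lag_width=1.5, n_lags=3))
```

Fitting means choosing the nugget, partial sill and range so that the model curve follows the binned points. gstat does this by weighted least squares.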


Interpolation by Kriging

Error of the estimates can also be mapped:
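Given a fitted semi-variogram, ordinary kriging chooses weights that sum to one and minimise the estimation variance. It does this by solving a small linear system, and the same solution yields the kriging variance that is mapped as the error surface. The minimal Python sketch below assumes a known semi-variogram function and uses a tiny dense solver. In R, gstat's krige() does this at every grid node. The linear semi-variogram γ(h) = h and the two samples in the usage example are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def ordinary_kriging(target, samples, gamma):
    """Ordinary kriging estimate and variance at target from (x, y, z) samples.

    Solves [Gamma 1; 1^T 0][lambda; mu] = [gamma_0; 1], where Gamma holds
    semivariances between samples and gamma_0 those between samples and target.
    """
    n = len(samples)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for i, (xi, yi, _) in enumerate(samples):
        for j, (xj, yj, _) in enumerate(samples):
            a[i][j] = gamma(math.hypot(xi - xj, yi - yj))
        a[i][n] = 1.0  # Lagrange multiplier column
        a[n][i] = 1.0  # weights must sum to one
        b[i] = gamma(math.hypot(target[0] - xi, target[1] - yi))
    b[n] = 1.0
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance


samples = [(0.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
print(ordinary_kriging((1.0, 0.0), samples, lambda h: h))
print(ordinary_kriging((0.0, 0.0), samples, lambda h: h))
```

Unlike IDW, the variance falls to zero at the data points and grows away from them. That growth is what the error map shows.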


The generic question is:

What is the most probable value for a continuous variable at this location?

We have our estimates over the entire area


Some R fun (1): using dismo

> library(XML)    # needs this
> library(rgdal)  # and this
> library(dismo)
> place <- geocode("Maidwell, Northamptonshire, UK")  # the address needs enough detail to be recognized
> place  # a one-row data frame that includes a bounding box
  ID        lon      lat    lonmin     lonmax   latmin   latmax                       location
1  1 -0.9030642 52.38524 -0.938073 -0.8710494 52.37016 52.40107 Maidwell, Northamptonshire, UK

size <- extent(unlist(place[4:7]))  # an Extent built from lonmin, lonmax, latmin, latmax
map <- gmap(size, type = "satellite")
plot(map)
map <- gmap(size, type = "roadmap")
plot(map)

To find places and plot them using Google Earth and Maps™


Where I live …

Panels: aerial photography; Google Maps™


Or (slightly) better known?

place <- geocode("The White House, Washington, USA")
size <- extent(unlist(place[4:7]))
map <- gmap(size, type = "satellite")
plot(map)


Some R Fun (2): exporting KML

• Due to James Cheshire, UCL

• The London Bicycle Hire system

library(maptools)
library(rgdal)
cycle <- read.csv("London_cycle_hire_locs.csv", header = TRUE)
plot(cycle$X, cycle$Y)


Some R Fun (2): exporting KML (continued)

coordinates(cycle) <- c("X", "Y")  # promote to a SpatialPointsDataFrame
BNG <- CRS("+init=epsg:27700")     # British National Grid
proj4string(cycle) <- BNG
p4s <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")
cycle_wgs84 <- spTransform(cycle, p4s)  # KML expects WGS84 longitude/latitude
writeOGR(cycle_wgs84, dsn = "london_cycle_docks.kml", layer = "cycle_wgs84",
         driver = "KML", dataset_options = c("NameField=name"))
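writeOGR() with the KML driver produces a file of Placemarks whose coordinates are WGS84 longitude,latitude pairs, which is why the data are reprojected first. The minimal Python sketch below builds the same basic structure using only the standard library. The two dock names and positions are hypothetical and assumed to be in WGS84 already. The real driver also writes attribute data that this sketch omits.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def points_to_kml(points):
    """Build a minimal KML document with one Placemark per (name, lon, lat).

    KML coordinates are written as 'lon,lat' in WGS84.
    """
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in points:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        pt = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        ET.SubElement(pt, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical docking stations, already in WGS84
docks = [("Dock A", -0.1276, 51.5072), ("Dock B", -0.1200, 51.5100)]
print(points_to_kml(docks))
```

Saving this string to a .kml file is enough for Google Earth to show one pin per dock.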


The End

• Taking it further:
• Applied Spatial Data Analysis with R (Bivand, Pebesma and Gomez-Rubio, 2008)
• Spatial Statistics with R commences 14th December 2012 at Statistics.com™

QUESTIONS ARE WELCOME