Spatial analysis: a roadmap David O’Sullivan University of Auckland School of Geography and Environmental Science [email protected]


GIS and Population Science - Penn State - June 12, 2006 - David O'Sullivan

Overview

• A definition of spatial analysis
– Data types and basic questions
• The bad news: classic problems of spatial analysis
– Spatial dependence
• The good news: potential of spatial analysis
– Some generally useful concepts in the analysis of spatial data


One definition of spatial analysis

• According to O’Sullivan and Unwin (2003) in Geographic Information Analysis:

“concerned with investigating the patterns that arise as a result of processes that may be operating in space. Techniques and methods to enable the representation, description, measurement, comparison and generation of spatial patterns are central to the study of geographic information analysis”

O'Sullivan, D. and Unwin, D. J. 2003. Geographic Information Analysis. Wiley: Hoboken, NJ.


What else is ‘spatial analysis’?

• It depends on your point of view:
1. Spatial data manipulation, often in a GIS, is frequently referred to as ‘spatial analysis’
2. Spatial data analysis is descriptive and exploratory
3. Spatial statistical analysis employs statistical methods to investigate data with respect to some statistical model
4. Spatial modeling is about constructing models to predict or to better understand spatial phenomena


Auckland schools


Two key aspects in spatial analysis

• Spatial data
– What types of spatial data exist?
• Applying standard statistical ideas to spatial data
– What problems does this introduce?
– Why is spatial data special?


Spatial data

• There are, broadly speaking, two main ways of representing the world:

– Vector objects
– Raster fields


Vector object types

• Points: cities, people, houses, schools, crimes, disease incidence and mortality…
• Lines: drainage, road, communications, and power networks, commutes…
• Areas: census districts, cities, boroughs, townships, school enrolment zones, police precincts, land cover units…


Fields

• Field data are useful when a phenomenon is measurable at all locations
– Field data are common for natural phenomena (air pressure, wind speed, etc.)
– In social/human/population geography, field data are sometimes used to represent estimated densities, since a density may be considered to be measurable everywhere (e.g., crime mapping)


Attribute data types

• At each location, we have measurements of various attributes

• There are two broad classes (familiar from statistics):
– Numerical data, which are either ratio or interval; and
– Categorical data, which are either ordinal or nominal


The entity-attribute model

[Figure: entity types (points, lines, areas, fields) crossed with attribute measurement levels (nominal, ordinal, interval, ratio). Labeled examples: state highway, spot height, electoral districts.]

Some possible combinations

Source: O'Sullivan and Unwin. 2003.


Some reservations … or … not so fast!

• This model is reductive
– In particular, it is a statistician’s and GIS person’s view
– How do you represent complex ethnographic data in this framework? ‘Home’? Photographs? Sound-clips? Hyperlinks?
– In addition, scale is often a complicating factor


OK, but…why the interest in spatial data?

• We assume that location makes a difference, so…
– statistical distributions remain relevant…
– … but now spatial patterns in the data are also of interest…
– … and the possible relationships between the two are really what matter


Source: O'Sullivan and Unwin. 2003.

Statistics and spatial analysis

• Statistics is about
– Describing observed data
– Comparing observed data to expectations based on a statistical model
– Inferring from the comparison whether the observed data are compatible with the assumptions


Simplification in statistics

• To make the mathematics work, assumptions about observed data are made. In particular, that
– Observations are random samples from a population
– Observations are independent of one another
• But… these assumptions are never true of spatial data


The problem with spatial data
or: why “spatial [data] is special”

• In a nutshell: “Everything is related to everything else, but near things are more related than distant things”
Tobler, W. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46, 234–40.
– This is sometimes called the First Law of Geography … because it is generally true!
– It follows that spatial data cannot be considered independent random samples from a population


Why “spatial is special” more specifically

• Some commonly identified ‘problems’ with spatial data are:
– Autocorrelation
– The modifiable areal unit problem (or MAUP)
– Scale effects
– Non-uniformity of space and ‘edge effects’


Spatial autocorrelation

• This follows directly from the observation that “… near things are more related than distant things”
– Spatial data are self-correlated
– There is redundancy in spatial data because observations made at locations near one another tend to be similar


• Percent Pakeha by meshblock
• Positive autocorrelation is much more common in socio-economic data


Problem, or opportunity?

• Autocorrelation is only a problem if we choose to see it as one. Equally, we can
– Describe or measure the autocorrelation structure of spatial data, in order to characterize it
– Potentially use the description to improve subsequent analysis (e.g., simple interpolation becomes kriging in geostatistics)


Describing autocorrelation

• Two broad effects can be considered:
– First order variation
  • Large scale variation in the mean value: a trend or background effect
– Second order variation
  • Local variation, perhaps due to interaction effects between observations
  • May be isotropic (no directional effects) or anisotropic (with a directional component)


First or second order?

• In practice, 1st and 2nd order effects are hard to distinguish

Source: O’Sullivan and Unwin. 2003.


Autocorrelation statistics

• A number of formal measures exist:
– Joins count statistics for binary or classed data
– Moran’s I and Geary’s c for numeric data, at points or aggregated to areas
– Semivariogram and covariogram functions for point data


Results

• For the Pakeha (European) population in Auckland City, using Moran’s I, we get:


The modifiable areal unit problem (MAUP)

• Areas are ‘arbitrary’: they are designed for convenience of data collection, not with respect to underlying patterns
– Standard statistical techniques are sensitive to the choice of units
– In one study* it was shown that the correlation between two variables can be estimated anywhere between –1 and +1, depending on the spatial units used!

*Openshaw, S. and P. J. Taylor. 1979. A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In N. Wrigley (ed.) Statistical Methods in the Spatial Sciences, Pion: London, 127-44.
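The sensitivity to the choice of units can be illustrated with a toy case. The sketch below is not the experiment from the cited study: the six unit values, the 2 x 3 layout, and both zonings are made up for illustration. It aggregates the same data into three contiguous zones in two different ways and compares the resulting correlations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def aggregate(values, zones):
    """Sum unit-level values within each zone (a tuple of unit indices)."""
    return [sum(values[i] for i in zone) for zone in zones]


# Hypothetical data for six units laid out on a 2 x 3 grid:
#   0 1 2
#   3 4 5
x = [1, 3, 2, 4, 6, 2]
y = [2, 5, 1, 3, 1, 2]

# Two ways of grouping the same units into three contiguous zones.
zoning_a = [(0, 3), (1, 4), (2, 5)]  # three vertical pairs
zoning_b = [(0, 1), (3, 4), (2, 5)]  # two horizontal pairs plus one vertical pair

r_a = pearson(aggregate(x, zoning_a), aggregate(y, zoning_a))
r_b = pearson(aggregate(x, zoning_b), aggregate(y, zoning_b))
print(f"zoning A: r = {r_a:+.2f}")
print(f"zoning B: r = {r_b:+.2f}")
```

With these made-up values the correlation is about +0.87 under zoning A and about -0.28 under zoning B, even though the underlying unit-level data are identical.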


Redistricting

• Perhaps the clearest example of MAUP in practice…

Source: The Economist


Scale

• Geographic scale effects are fundamental:
– Different data types are appropriate at different scales, so available spatial data may be dependent on scale, ruling out some types of analysis
– Scale is a factor in autocorrelation, and in the distinction between 1st and 2nd order effects
– It is also a factor in MAUP with respect to the level of aggregation used


Non-uniformity of space

• The non-uniformity of space refers to problems arising from tacit assumptions about the uniform spatial density of ‘background’ populations
– For example, ‘clusters’ of crime are expected in urban areas because more people live there
– This leads to numerous analytic complexities

Example: ISO9000 certified firms in the United States
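One common way to allow for an uneven background population is to compare rates rather than raw counts. The sketch below uses invented incident counts and populations for three hypothetical areas. It is only meant to show how the apparent 'hot spot' can change once the population at risk is taken into account.

```python
# Hypothetical incident counts and resident populations for three areas.
incidents = {"urban": 120, "suburban": 40, "rural": 6}
population = {"urban": 60_000, "suburban": 20_000, "rural": 2_000}

# Incidents per 1000 residents in each area.
rate_per_1000 = {
    area: 1000 * incidents[area] / population[area] for area in incidents
}

most_incidents = max(incidents, key=incidents.get)
highest_rate = max(rate_per_1000, key=rate_per_1000.get)

print("most incidents:", most_incidents)
print("highest rate:", highest_rate)
```

In this made-up case the urban area has by far the most incidents, but the rural area has the highest rate per resident.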


Edge effects

• Entities on the edge of a study area only have neighbors in one direction (toward the middle)
– Unless care is taken, this can distort results
– Again, coping with this leads to numerous analytic complexities
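A small sketch of the effect, assuming a regular grid of cells and edge-sharing neighbors (both assumptions chosen for simplicity; the function name is hypothetical): cells at corners and along edges have fewer neighbors than interior cells, so any statistic averaged over neighbors rests on less information there.

```python
def neighbor_count(row, col, n_rows, n_cols):
    """Count edge-sharing neighbors of a cell that fall inside the study area."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return sum(
        0 <= row + dr < n_rows and 0 <= col + dc < n_cols
        for dr, dc in steps
    )


# A 5 x 5 study area: compare a corner, an edge, and an interior cell.
corner = neighbor_count(0, 0, 5, 5)
edge = neighbor_count(0, 2, 5, 5)
interior = neighbor_count(2, 2, 5, 5)
print(corner, edge, interior)
```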


The bad news summarized

• Data are spatially dependent
– Spatial autocorrelation
• Data are also dependent on how you look at them spatially:
– Aggregation
– Scale
– Non-uniformity of space
– Edge-effects


Some good news

• Spatial data do have an intrinsic advantage, however…

• In addition to data we have a record of where the data were observed

• Making the most of this extra information lies at the heart of spatial analysis


Some useful general concepts

• A number of concepts are frequently invoked in spatial analysis
– Distance
– Adjacency
– Interaction
– Neighborhood
– Proximity polygons


Distance

• Easily calculated from two coordinates
– Use Pythagoras’s theorem:

$$d = \sqrt{x^2 + y^2}$$

where $x$ and $y$ are the differences between the two points' coordinates

– This is trickier on a sphere, but for projected data at sub-regional scales the Euclidean approximation is adequate
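A minimal sketch of both cases follows. The Euclidean function applies Pythagoras's theorem to projected coordinates. For the spherical case it uses the haversine formula, which is one standard option and an assumption here, since the slide does not name a method. The function names and the mean Earth radius of 6371 km are likewise illustrative choices.

```python
from math import asin, cos, hypot, radians, sin, sqrt


def euclidean(p, q):
    """Straight-line distance between two projected (x, y) points."""
    return hypot(q[0] - p[0], q[1] - p[1])


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two latitude/longitude points in degrees.

    Treats the Earth as a sphere of the given radius (an approximation).
    """
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


print(euclidean((0, 0), (3, 4)))          # a 3-4-5 right triangle
print(great_circle_km(0, 0, 0, 90))       # a quarter of the equator
```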


Other distance metrics

• A variety of non-Euclidean measures

• Network distance on a transport system

• Travel time

• Perceived distance


Adjacency

• This is a sort of binary distance: two spatial objects are either adjacent or not
– Often we use distance to decide: if d = 0, then two objects are adjacent
– This can get complex for some kinds of object
– The meaning is clearest for polygons


Queens and rooks

• These terms, borrowed from chess moves, refer to which types of adjacency we choose to ‘allow’: rook adjacency requires a shared edge, while queen adjacency also counts a shared corner
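On a regular grid the two rules are easy to state in code. The sketch below assumes square cells indexed by (row, column). The function name and the `rule` parameter are hypothetical, not taken from any particular GIS package.

```python
def grid_neighbors(row, col, n_rows, n_cols, rule="rook"):
    """Return the in-grid neighbors of a cell under rook or queen adjacency."""
    rook_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # shared edges
    diagonal_steps = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # shared corners
    if rule == "rook":
        steps = rook_steps
    elif rule == "queen":
        steps = rook_steps + diagonal_steps
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    return [
        (row + dr, col + dc)
        for dr, dc in steps
        if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
    ]


# The center cell of a 3 x 3 grid under each rule.
print(len(grid_neighbors(1, 1, 3, 3, rule="rook")))
print(len(grid_neighbors(1, 1, 3, 3, rule="queen")))
```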


A simple example


Adjacency applied to measuring autocorrelation

• For each pair of neighboring cases, calculate the covariance:

$$(x_i - \bar{x})(x_j - \bar{x})$$

– This produces a positive number when the two values lie on the same side of the mean (both high or both low), and a negative number when they lie on opposite sides
• Averaged over all neighboring cases, and scaled by dividing by the variance of the data, we get a number that usually lies between –1 and +1
– This is interpreted in the same way as a standard correlation coefficient
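The calculation described above can be sketched as follows, assuming binary adjacency (each neighboring pair has weight 1) and a list of undirected neighbor pairs as input. The function name and the example values are hypothetical. This is the usual form of Moran's I under those assumptions, not necessarily the exact variant used for the results in these slides.

```python
def morans_i(values, neighbor_pairs):
    """Moran's I for binary adjacency.

    neighbor_pairs lists each undirected pair (i, j) once; in the
    symmetric weights matrix every pair therefore counts twice.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 2 * sum(z[i] * z[j] for i, j in neighbor_pairs)
    total_weight = 2 * len(neighbor_pairs)
    return (n / total_weight) * cross / sum(d * d for d in z)


# Four cases in a row, each adjacent to the next.
pairs = [(0, 1), (1, 2), (2, 3)]
trend = morans_i([1, 2, 3, 4], pairs)        # similar values side by side
alternating = morans_i([1, 4, 1, 4], pairs)  # dissimilar values side by side
print(round(trend, 3), round(alternating, 3))
```

The smoothly increasing sequence gives a positive value and the alternating sequence gives a negative one, matching the interpretation as a correlation between neighbors.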


Election 2000 again

• Those county-level results, expressed in terms of the percent share for George W. Bush

The clear spatial pattern in these data is confirmed by a Moran’s I value of around 0.45


Interaction

• Interaction (often denoted $w_{ij}$) is a measure of the likely strength of relationship between two entities
– The most common form is inverse-distance:

$$w_{ij} \propto \frac{1}{d_{ij}}$$


Other measures of interaction

– Inverse distance powered (usually squared):

$$w_{ij} \propto \frac{1}{d_{ij}^{k}}$$

– Negative exponential:

$$w_{ij} \propto e^{-k d_{ij}}$$

– Weighted inverse distance:

$$w_{ij} \propto \frac{A_i A_j}{d_{ij}^{k}}$$
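These interaction measures translate directly into small functions. The sketch below takes any constant of proportionality to be 1, which is an assumption, and treats the A terms as some measure of the size of each entity. The function and parameter names are hypothetical. All three functions expect a strictly positive distance except the negative exponential, which is also defined at zero.

```python
from math import exp


def inverse_distance(d, k=1.0):
    """Inverse distance weight; k = 1 is plain inverse distance, k = 2 squared."""
    return 1.0 / d ** k


def negative_exponential(d, k=1.0):
    """Weight that decays exponentially with distance at rate k."""
    return exp(-k * d)


def weighted_inverse_distance(a_i, a_j, d, k=1.0):
    """Inverse distance weight scaled by a size attribute of each entity."""
    return a_i * a_j / d ** k


print(inverse_distance(2.0))                         # plain inverse distance
print(inverse_distance(2.0, k=2))                    # inverse distance squared
print(negative_exponential(0.0))                     # no decay at zero distance
print(weighted_inverse_distance(3.0, 4.0, 2.0, k=2))
```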


Matrices and spatial pattern

• Many of these concepts assign a value to describe the relationship between every pair of objects

• This lends itself to being recorded in a matrix:

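$$
\mathbf{D} =
\begin{bmatrix}
0 & 45 & 108 & 116 & 101 & 41 \\
45 & 0 & 60 & 91 & 99 & 24 \\
108 & 60 & 0 & 67 & 110 & 68 \\
116 & 91 & 67 & 0 & 51 & 68 \\
101 & 99 & 110 & 51 & 0 & 66 \\
36 & 24 & 68 & 68 & 66 & 0
\end{bmatrix}
$$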
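A distance matrix of this kind can be built from point coordinates in a few lines. The sketch below uses three made-up points, not the six objects behind the matrix on the slide, and Euclidean distance. The function name is hypothetical.

```python
from math import hypot


def distance_matrix(points):
    """Square matrix of Euclidean distances between all pairs of (x, y) points."""
    return [
        [hypot(xi - xj, yi - yj) for (xj, yj) in points]
        for (xi, yi) in points
    ]


points = [(0, 0), (3, 4), (6, 8)]  # hypothetical coordinates
D = distance_matrix(points)
for row in D:
    print(row)
```

Entry `D[i][j]` holds the distance from object i to object j, so the diagonal is zero and, for Euclidean distance, the matrix is symmetric.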


Spatial weights matrices

• A particularly common matrix is the spatial weights matrix, usually denoted by W

• This records the interaction between each pair of objects, and appears in
– autocorrelation, point pattern analysis, interpolation, spatial regression, geographically weighted regression, spatial interaction modeling…
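One simple way to obtain a weights matrix is to threshold a distance matrix, so that objects within some cutoff distance interact and all others do not. This distance-band rule is only one of many possible choices. The cutoff, the example distances, and the function name below are all assumptions made for illustration.

```python
def distance_band_weights(D, threshold):
    """Binary weights: 1 if two distinct objects are within threshold, else 0."""
    n = len(D)
    return [
        [1 if i != j and D[i][j] <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical distances between three objects.
D = [
    [0.0, 5.0, 10.0],
    [5.0, 0.0, 5.0],
    [10.0, 5.0, 0.0],
]
W = distance_band_weights(D, threshold=6.0)
for row in W:
    print(row)
```

By convention the diagonal is zero here, so an object does not interact with itself.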


Neighborhood

• Neighborhood is a less clear-cut concept
– It can mean the region of space around some object
– Or the set of objects considered to be neighbors of that object

• Some notion of neighborhood is implied by any given weights matrix
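In the second sense (a set of objects), the neighborhood implied by a weights matrix can be read off directly: the neighbors of an object are the objects it has a non-zero weight with. The sketch below assumes the matrix is stored as a list of rows. The function name and example weights are hypothetical.

```python
def neighbors_from_weights(W):
    """Map each object index to the set of other objects with non-zero weight."""
    return {
        i: {j for j, w in enumerate(row) if j != i and w > 0}
        for i, row in enumerate(W)
    }


# Hypothetical weights between three objects.
W = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.25],
    [0.0, 0.25, 0.0],
]
neighborhoods = neighbors_from_weights(W)
print(neighborhoods)
```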


Proximity polygons

• Proximity polygons are an increasingly important example of the neighborhood concept

• A proximity polygon is associated with each spatial object and is the region of space nearer to that object than to any other

• A good demonstration of the idea is Voroglide by: Praktische Informatik VI, FernUniversität Hagen, Christian Icking, Rolf Klein, Peter Köllner, Lihong Ma
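The definition can be applied without constructing any polygons: a location belongs to the proximity polygon of whichever object is nearest to it. The sketch below is a brute-force membership test under Euclidean distance, with made-up site coordinates and a hypothetical function name. It does not compute polygon boundaries, and ties are resolved in favor of the first site listed.

```python
from math import hypot


def nearest_site(location, sites):
    """Index of the site whose proximity polygon contains the location."""
    return min(
        range(len(sites)),
        key=lambda i: hypot(location[0] - sites[i][0], location[1] - sites[i][1]),
    )


sites = [(0, 0), (10, 0), (5, 8)]  # hypothetical object locations
for location in [(2, 1), (9, 1), (5, 6)]:
    print(location, "->", nearest_site(location, sites))
```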


Uses of proximity polygons

• Proximity polygons are commonly used in
– Interpolation
– Point pattern analysis

• Increasingly they are used throughout spatial analysis, especially in location decision making


Just to review…

• Distance

• Adjacency

• Interaction

• Neighborhood


GIS and spatial analysis

• GIS vendors often claim to offer ‘spatial analysis’
– This usually doesn’t mean statistical spatial analysis, but spatial data manipulation: buffering, overlay, etc.
– However, GIS has increased the need for spatial analysis, because more people are making maps, and asking questions about them!


If spatial analysis is so useful, why is it not integrated into GIS?!

• Different perspectives on spatial data
– GIS is built around the entity-attribute model. Spatial analysis uses these data (because that’s the way they come). Conceptually, however, spatial analysis sees data as patterns that are the outcomes of processes, which can be quite a different perspective.
• Spatial analysis is not widely understood
– It has been a specialized field, and therefore hard to justify incorporating into GIS as a standard tool
• Spatial analysis can make GIS hard to sell
– Spatial analysis is about asking difficult questions, not about easy answers


Questions?

David O’Sullivan, University of Auckland, School of Geography and Environmental Science

[email protected]
