
A Comparison of Zonal Smoothing Techniques

Prof. Chris Brunsdon

Dept. of Geography, University of Leicester

cb179@leicester.ac.uk

Background

Much social science data comes aggregated over irregular spatial zones

Census Wards

Police beat zones

Neighbourhood renewal areas

CDRP Special Areas

Typical Problems

Changing from one set of geographical units to another

Areas of special concern for crime reduction (not the aggregation units used to report crime rates)

Compare crime rates with social data (different aggregation units)

One solution

Convert to a surface, then re-aggregate to the new zones

Factors to Consider

Data Collection

Statistical Issues

Software Issues

Underlying Theory

Diagnostics

Organisational Issues

Background (1)

CAMSTATS web site

Developed at UCL as a consultancy (Muki Haklay)

Gives public access to crime data - going back to April 2000

Designed so that police officers (or civilians) can update the web page with a single button click

Has run without problems or need for advice or intervention

Background (2)

Crime rates are mapped for a number of areal units

Wards

Police Sectors

Neighbourhood Renewal Areas

Special Areas

Approaches

Roughness Penalty

Pycnophylactic Interpolation

Naive Averaging

Form of Problem

Estimate an underlying crime risk surface from zonal data

Continuous version of the model: for zone $Z_i$ with observed count $y_i$,

$y_i = \int_{Z_i} \mu(s)\,ds + \varepsilon_i$

Discrete approximation over a pixel grid:

$y = X\mu + \varepsilon$

This is an over-specified regression model.

NB - the error term $\varepsilon$ appears in some approaches only
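To make the discrete model concrete, here is a minimal R sketch (R being the package used later in the talk) that builds the indicator matrix X from a zone label per pixel; the toy numbers are illustrative, not from the talk.

    # Toy example (hypothetical numbers): 6 pixels in 3 zones.
    # X[i, j] = 1 if pixel j lies in zone i, so zonal counts are y = X mu.
    zone     <- c(1, 1, 2, 2, 3, 3)          # zone label of each pixel
    n_zones  <- max(zone)
    n_pixels <- length(zone)
    X <- matrix(0, n_zones, n_pixels)
    X[cbind(zone, seq_len(n_pixels))] <- 1   # indicator matrix

    mu <- c(2, 4, 1, 3, 0, 5)                # hypothetical pixel-level risks
    y  <- as.vector(X %*% mu)                # implied zonal counts: 6, 4, 5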

Over-Specified?

What does this mean?

More variables than observations

Solution is not unique

e.g. for a given zone, set all pixels to zero except one, which gets the whole crime count

or set all pixels to 1/n of the crime count, where n is the number of pixels in the zone - both fit the data exactly, as the sketch below checks
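Continuing the toy example, the two constructions give very different surfaces with identical zonal counts:

    # Two different pixel surfaces reproducing the same zonal counts y = (6, 4, 5)
    mu_spike   <- c(6, 0, 4, 0, 5, 0)        # whole count on one pixel per zone
    mu_uniform <- c(3, 3, 2, 2, 2.5, 2.5)    # count spread as 1/n per pixel
    all.equal(as.vector(X %*% mu_spike),
              as.vector(X %*% mu_uniform))   # TRUE: both fit the data exactly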

Roughness Penalty

In fact there are an infinite number of solutions to equation on earlier slide

Favour those with a lower roughness penalty

c.f. regularization problems

Aim to minimise: sum of squared errors + const. × roughness, i.e.

$\min_{\mu}\; \|y - X\mu\|^2 + \lambda\,\mu^{T} R \mu$

A Discrete Roughness Penalty

Roughness at each pixel is measured by squared differences between that pixel's value and those of its neighbours; summed over all pixels this gives the quadratic form $\mu^{T} R \mu$.

This can be solved by matrix algebra:

$\hat{\mu} = (X^{T} X + \lambda R)^{-1} X^{T} y$

where

$X$ is an indicator matrix showing which pixel is in which zone (it contains the info relating pixels to zones)

$R$ encapsulates 'total roughness' for all pixels

$\lambda$ controls the roughness penalty

$y$ is the vector of observed zonal counts
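A minimal sketch of the matrix-algebra solution, reusing X and y from the toy example and assuming (for brevity) a 1-D strip of pixels, so roughness is just squared differences between adjacent pixels; the value of lambda is illustrative:

    # Discrete roughness matrix R = D'D, with D the first-difference operator
    D <- diff(diag(n_pixels))                # (n-1) x n difference matrix
    R <- t(D) %*% D
    lambda <- 10                             # controls the roughness penalty

    # Each pixel lies in exactly one zone, so t(X) %*% X is the identity and
    # the penalised least-squares problem has the closed-form solution:
    mu_hat <- solve(t(X) %*% X + lambda * R, t(X) %*% y)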

Software

Techniques here are not ‘off the shelf’

Statistical/numerical as well as GIS techniques

Here the ‘R’ package is used

Statistical programming language

Good graphical support

Open Source (with lots of libraries - including GIS-type support)

Pycnophylactic Interpolation

Similar to the Roughness Penalty approach, but no errors allowed: each zone's total is reproduced exactly - cf Tobler 1979

Can be solved as a quadratic programming problem: minimise the roughness $\mu^{T} R \mu$ subject to $X\mu = y$ (a sketch of a simple iterative version follows)
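A simplified sketch of the iterative idea on the toy strip (Tobler's original scheme differs in detail): alternately smooth the surface and rescale within each zone so zone totals always match the observed counts.

    # Start from a flat surface within each zone, then smooth-and-rescale.
    mu <- (y / rowSums(X))[zone]             # constant density within zones
    for (iter in 1:200) {
      sm <- as.vector(stats::filter(mu, c(0.25, 0.5, 0.25), sides = 2))
      sm[is.na(sm)] <- mu[is.na(sm)]         # leave the end pixels unsmoothed
      zone_tot <- tapply(sm, zone, sum)
      mu <- sm * (y / zone_tot)[zone]        # restore exact zonal totals
    }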

Naive Approach

Assume that the density within each areal unit is constant
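On the toy example the naive estimator is a one-liner in R: spread each zone's count evenly over its pixels.

    mu_naive <- (y / rowSums(X))[zone]       # 1/n of the zone's count per pixel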

HOUSING DENSITY: is it sensible to assume the intensity of household burglaries is smooth?

Model Modification

Household densities can be obtained with David Martin’s SURPOP approach - this modification can be applied to all of the approaches described earlier (see the sketch below)
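A sketch of the modification applied to the naive estimator; the household counts per pixel here are hypothetical stand-ins for a SURPOP-style surface. Each zone's count is now allocated in proportion to households, so the quantity assumed constant (or smooth) is the risk per household:

    hh <- c(10, 30, 25, 15, 5, 20)           # hypothetical households per pixel
    zone_hh <- tapply(hh, zone, sum)         # households per zone
    mu_naive_hh <- hh * (y / zone_hh)[zone]  # count shared pro rata to households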

Routine Activity Theory

We now assume risk per household is smooth

Perhaps in line with Cohen & Felson’s ROUTINE ACTIVITY THEORY?

Offenders choose targets according to their usual movement patterns

Familiarity with a pixel suggests familiarity with its neighbours

But potential targets have to be there as well!

Evaluation

Camstats web site (www.met.police.uk/camden/camstats)

Monthly household burglary rates from April 2003 to March 2006

Aggregated over a number of different zones

Models are calibrated using UK census wards, on a 64×64 pixel grid

Then tested against two special interest areas

Camden Town / King’s Cross
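A sketch of how the test statistic could be computed - re-aggregate a fitted surface over a test area and take the absolute deviation from the observed count (the pixel indices and counts here are hypothetical, not the talk's actual data):

    test_area <- c(1, 2)                     # pixels making up a test zone
    observed  <- 5                           # observed burglary count there
    abs_dev   <- abs(sum(mu_hat[test_area]) - observed)
    # averaging abs_dev over the monthly series gives the MAD reported below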

Results

Method                King's Cross   Camden Town
Pycnophylactic (HH)       1.94           2.90
Pycnophylactic            1.60           3.13
Naive (HH)                1.26           3.13
Naive                     1.37           3.48
Roughness (HH)            2.05           2.86
Roughness                 1.65           3.04

Numbers are mean absolute deviations in estimated burglary counts. Lowest in each column: Naive (HH) for King's Cross, Roughness (HH) for Camden Town; runners-up: Naive and Pycnophylactic (HH) respectively.

Discussion

Is simplest best?

Further findings show that simple estimators work best in areas close to the edge of the region, while smoothing-based approaches work best further inside it

Camden Isn’t An ISLAND!

Consequences

Smoothing based approaches ‘borrow information’ from nearby places

cf. Tobler's First Law of Geography: "Everything is related to everything else, but near things are more related than distant things"

Because Camden isn’t an island, things are going on beyond the ‘edges’.

But we don’t know what they are!

So we can’t reliably borrow information

So simpler methods probably perform better near the ‘edges’

A real-world problem

In practice organisations sub-divide data geographically

But without data sharing, individual regions appear (at least mathematically) as islands!

Conclusions - Further Work ?

For Camden Town, Roughness Penalty performed best.

For King’s Cross, the Naive method worked best

In both cases, taking household density into account proved best

Edge effects?

Merging predictors?

Further work - kernel-based approaches...
