
Geovation Water Challenge: Joining Data by Location, John Murray


Page 1: Geovation Water Challenge: Joining Data by Location, John Murray

Joining Data Together

1

John Murray

@MurrayData

Page 2: Geovation Water Challenge: Joining Data by Location, John Murray

Joining Data Together

2

• The real value of data is not the data itself, but the insights derived from it.

• To achieve maximum economic benefit from data, disparate sources need to be joined:
  • Appending socio-demographic data to a customer database for marketing insights.
  • Merging crime data with benefits and deprivation data to analyse causes of crime.
  • Joining NHS mortality and prescribing data to census data to examine factors in poor health.
• Geography is a common "currency" in much Open Data which allows us to join it.

Page 3: Geovation Water Challenge: Joining Data by Location, John Murray

Geography Types in Open Data

3

• Census geography.
  • Output areas.
• Administrative geography.
  • Local authorities, regions, NHS areas, Police Forces.
• Political geography.
  • Electoral wards, Parliamentary constituencies.
• Postal geography.
  • Postcodes, sectors, areas, districts.
• Unstructured geography.
  • Spatial points.
  • Bespoke catchments, e.g. retail stores.

Page 4: Geovation Water Challenge: Joining Data by Location, John Murray

Census Geography

4

• Hierarchy of published area statistics.
  • Output Area (OA)
    • 40-250 households.
  • Lower Super Output Area (LSOA)
    • 400-1200 households.
  • Middle Super Output Area (MSOA)
    • 2000-6000 households.
• Links to administrative geography.
• Open data geography tables:
  • ONS Postcode Directory (ONSPD)
  • National Statistics Postcode Lookup (NSPL)

Page 5: Geovation Water Challenge: Joining Data by Location, John Murray

Administrative and Political Geography

5

• Local Authorities.
  • District.
  • County.
  • Metropolitan Boroughs and Unitary Councils.
  • Parish and Town Councils.
• Parliamentary Constituencies.
• Government Regions.
• NHS.
• Police Forces.
• Environment Agency Regions.
• Links provided in ONSPD and NSPL.

Page 6: Geovation Water Challenge: Joining Data by Location, John Murray

Postal Geography

6

• Based around the postcode.
• Introduced in 1959 on a trial basis.
• Current UK system in use since 1967.
• Designed for the purpose of efficient delivery of mail.
• Doesn't align exactly with Census and Administrative Geography.
• 1.8 million postcodes currently in use.
• Mean number of "delivery points" is 14.

Page 7: Geovation Water Challenge: Joining Data by Location, John Murray

Anatomy Of A Postcode

7

CH1 2HS
• CH – Postcode Area
• CH1 – Postcode District
• CH1 2 – Postcode Sector
• CH1 2HS – Postcode
• "HS" is called the walk.
• CH1 is referred to as the Out code
• 2HS is referred to as the In code
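A minimal sketch of splitting a postcode into these levels. The function name `split_postcode` is hypothetical, and the code assumes a well-formed postcode whose In code is the final three characters; it does not check that the postcode actually exists.

```python
import re

def split_postcode(postcode: str) -> dict:
    """Split a UK postcode into area, district, sector and full postcode."""
    compact = re.sub(r"\s+", "", postcode.upper())
    outward, inward = compact[:-3], compact[-3:]   # In code is the last 3 characters
    area = re.match(r"[A-Z]+", outward).group(0)   # leading letters of the Out code
    return {
        "area": area,
        "district": outward,
        "sector": f"{outward} {inward[0]}",
        "postcode": f"{outward} {inward}",
        "outward": outward,
        "inward": inward,
    }

parts = split_postcode("ch1 2hs")
print(parts["area"], parts["district"], parts["sector"], parts["postcode"])
```

Normalising case and spacing first matters in practice, because the same postcode is often stored in several formats across datasets.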

Page 8: Geovation Water Challenge: Joining Data by Location, John Murray

Postcode Facts

8

• Postcode mean 14 delivery points.
• Postcode sector mean 2530 delivery points.
• Postcode district mean 9080 delivery points.
• Postcode area mean 200,000 delivery points.
• 26 million delivery points in UK.
• Ordnance Survey Code-Point Open, ONSPD and NSPL contain grid references for postcode centroids.

Page 9: Geovation Water Challenge: Joining Data by Location, John Murray

Joining Data

9

• In most cases, use ONSPD.
  • Although approximate, good enough for most uses.
• Political and public sector, use NSPL.
  • Specifically designed for that purpose.
• Use postcode to join data.
  • Can join individual/household data.
  • Augment existing data, e.g. customer database:
    • Customer demographic profiling.
    • Store catchment analysis.
  • Join open and closed data sources.
  • Common in many open data sources.
  • Links easily to other levels of geography.
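A minimal sketch of a postcode join using plain dictionaries. Every record, area code and field name below is invented for illustration. A real lookup would be loaded from ONSPD or NSPL, whose column names are not assumed here.

```python
def normalise(postcode: str) -> str:
    """Upper-case and strip spaces so 'ch1 2hs' and 'CH12HS' match."""
    return "".join(postcode.upper().split())

# Invented customer records (the "closed" data source).
customers = [
    {"customer_id": 1, "postcode": "CH1 2HS"},
    {"customer_id": 2, "postcode": "ch1 4ba"},
    {"customer_id": 3, "postcode": "ZZ9 9ZZ"},   # deliberately missing from the lookup
]

# Invented postcode-to-area lookup and area-level statistics (the "open" data).
postcode_to_area = {normalise("CH1 2HS"): "AREA_A", normalise("CH1 4BA"): "AREA_B"}
area_stats = {"AREA_A": {"deprivation_decile": 4}, "AREA_B": {"deprivation_decile": 7}}

joined = []
for row in customers:
    area = postcode_to_area.get(normalise(row["postcode"]))   # None when unmatched
    stats = area_stats.get(area, {})                          # empty when unmatched
    joined.append({**row, "area": area, **stats})

for row in joined:
    print(row)
```

Unmatched postcodes are kept with an empty area rather than dropped, so the match rate can be checked before any analysis.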

Page 10: Geovation Water Challenge: Joining Data by Location, John Murray

Joining Data

10

Page 11: Geovation Water Challenge: Joining Data by Location, John Murray

Geospatial Data in Databases

11

• Spatial data types
  • Point (single point)
  • Line (set of joined points, e.g. road)
  • Polygon (closed set of joined points, e.g. boundary)
• Most databases support spatial data types
  • Proprietary, e.g. MS SQL Server, Oracle.
  • Open source: MySQL, MariaDB, PostgreSQL (with PostGIS)
  • NoSQL: Neo4j, MongoDB
• Spatial queries
  • Contained in (point in polygon).
  • Intersects (crosses).
  • Distance (not supported by all).
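A minimal sketch of the "contained in" query outside a database, using the ray-casting test. It is one common technique and not necessarily what any particular database uses internally. The square boundary is invented, and points lying exactly on an edge are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray running east from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]          # wrap round to close the ring
        if (y1 > y) != (y2 > y):               # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                    # crossing lies east of the point
                inside = not inside
    return inside

# Invented 10 x 10 square boundary.
boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, boundary))
print(point_in_polygon(15, 5, boundary))
```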

Page 12: Geovation Water Challenge: Joining Data by Location, John Murray

Example of Polygon Data

12

Page 13: Geovation Water Challenge: Joining Data by Location, John Murray

Distance Metrics

13

• Euclidean Distance
  • “Crow flies” linear distance
• Graph Distance
  • Road distance
• Manhattan Distance
  • Rectilinear distance
• Great Circle
  • Shortest distance between two points on the surface of a sphere
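A minimal sketch of three of these metrics. Euclidean and Manhattan distance are computed on grid coordinates in metres. Great circle distance uses the haversine formula on latitude and longitude, assuming a spherical Earth with a mean radius of 6,371 km. Graph distance needs a road network and is not shown here.

```python
import math

def euclidean(e1, n1, e2, n2):
    """Straight-line distance between two grid references, in metres."""
    return math.hypot(e2 - e1, n2 - n1)

def manhattan(e1, n1, e2, n2):
    """Rectilinear distance: east-west offset plus north-south offset."""
    return abs(e2 - e1) + abs(n2 - n1)

def great_circle(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Haversine distance in metres on a sphere (radius is an assumed mean)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Two points 3 km east and 4 km north of each other.
print(euclidean(0, 0, 3000, 4000))
print(manhattan(0, 0, 3000, 4000))
```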

Page 14: Geovation Water Challenge: Joining Data by Location, John Murray

Euclidean Distance

14

• University of Chester to Liverpool Airport.

• Euclidean distance 9.4 miles.

• Manhattan distance 11.1 miles.

• Graph distance (fastest) 24.5 miles.

• Used OS Strategy Roads Opendata and A* algorithm.
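A minimal sketch of A* on a toy road network, to show how graph distance differs from crow-flies distance. The junction names, coordinates and road lengths are invented, and this is not the code used for the figures above. The heuristic is straight-line distance to the goal, which never overestimates as long as no road is shorter than the straight line between its ends.

```python
import heapq
import math

def a_star(coords, edges, start, goal):
    """Return (distance, path) for the shortest route, or (inf, []) if none."""
    def h(node):
        (x1, y1), (x2, y2) = coords[node], coords[goal]
        return math.hypot(x2 - x1, y2 - y1)    # straight-line estimate to the goal

    best = {start: 0.0}                        # best known distance from start
    came_from = {}
    frontier = [(h(start), start)]             # priority = distance so far + estimate
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return best[goal], path[::-1]
        for neighbour, length in edges.get(node, []):
            candidate = best[node] + length
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                came_from[neighbour] = node
                heapq.heappush(frontier, (candidate + h(neighbour), neighbour))
    return math.inf, []

# Invented junctions (grid units) and two-way roads with their lengths.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}
roads = [("A", "B", 3.0), ("B", "C", 4.0), ("A", "D", 4.0), ("D", "C", 3.5)]
edges = {}
for u, v, length in roads:
    edges.setdefault(u, []).append((v, length))
    edges.setdefault(v, []).append((u, length))

distance, path = a_star(coords, edges, "A", "C")
print(distance, path)
```

In this toy network the crow-flies distance from A to C is 5 units, while the shortest route by road is longer, the same effect as in the Chester to Liverpool Airport example.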

Page 15: Geovation Water Challenge: Joining Data by Location, John Murray

Non-Formal Unstructured Geography

15

• Micro geo-centric analysis
  • Describe neighbourhood
• Point based data
  • Relate to formal geography through boundaries.
• User defined
  • Store catchments
  • Sales territories
  • Radial/drive time

Page 16: Geovation Water Challenge: Joining Data by Location, John Murray

Point Based Data

16

• The simplest type of spatial object.
• Represents a point relative to the Earth's surface.
• Has at least 2 values for coordinates.
• May optionally have an elevation z value in some systems.
• Ordnance Survey grid references are Cartesian coordinates, in metres, east and north of an origin point.

Page 17: Geovation Water Challenge: Joining Data by Location, John Murray

Converting Between Systems

17

• Use GIS software or conversion software.
• Scripts freely downloadable from Ordnance Survey and others.
• Ordnance Survey provide comprehensive guides and resources to write your own scripts.
• Unfortunately, it ISN’T as straightforward as using a formula.
• Need to take into account tectonic shifts and historic inaccuracies in surveying.
• OS provides a dataset of shifts to do this.

Page 18: Geovation Water Challenge: Joining Data by Location, John Murray

Geocentric analysis

18

• Use point as centre.
• Use Euclidean distance to aggregate metrics.
• Standardise units.
• Example – population density at postcode level:
  • Census Postcode Estimates
  • Ordnance Survey Code-Point Open
  • Join the datasets.
  • Sum the counts within specified radius.
  • Convert to standardised unit, e.g. people/hectare
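A minimal sketch of the last two steps, assuming the datasets are already joined so that each postcode centroid carries a population count. The coordinates and counts are invented. Dividing by the area of the full search circle is an assumption about how the density is standardised.

```python
import math

def density_within(centre, points, radius_m):
    """People per hectare within radius_m of centre (Euclidean, grid metres)."""
    ce, cn = centre
    total = sum(
        pop for e, n, pop in points
        if math.hypot(e - ce, n - cn) <= radius_m   # keep centroids inside the circle
    )
    hectares = math.pi * radius_m ** 2 / 10_000     # 1 hectare = 10,000 square metres
    return total / hectares

# Invented centroids: (easting, northing, population count).
centroids = [(0, 0, 50), (100, 0, 30), (0, 300, 20)]

for radius in (500, 250, 100):
    print(radius, round(density_within((0, 0), centroids, radius), 2))
```

Running the same function at several radii gives one density column per radius for each postcode.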

Page 19: Geovation Water Challenge: Joining Data by Location, John Murray

Geocentric analysis

19

Page 20: Geovation Water Challenge: Joining Data by Location, John Murray

Geocentric analysis

20

Postcode   1km    750m   500m   250m   100m   Easting  Northing
CH1 4BA    17.24  15.71  17.01  21.13  12.41  340282   367773
CH1 4BB    16.26  14.56  14.23  16.04  20.05  340178   367782
CH1 4BD    16.01  13.43  14.48  15.48  20.05  340143   367784
CH1 4BE    7.72   6.85   5.67   4.32   8.91   339532   368352
CH1 4BF    14.26  12.58  13.58  16.9   13.36  340105   367827
CH1 4BG    14.52  15.36  11.45  9.47   25.78  339790   367448
CH1 4BH    12.84  11.53  8.25   9.37   5.09   339647   367485
CH1 4BJ    19.21  24.05  23.49  23.73  8.91   340104   367217
CH1 4BL    12.84  11.53  8.25   9.37   5.09   339647   367485
CH1 4BN    15.95  17.04  20.1   23.37  36.92  339857   367238
CH1 4BP    13.64  10.3   9.51   14.26  19.73  339982   367851
CH1 4BQ    14.89  16.43  13.2   11.2   25.78  339834   367446
CH1 4BR    18.56  23.03  26.61  24.95  8.91   340059   367161
CH1 4BS    14.64  17.14  21.11  21.13  28.32  339806   367194
CH1 4BT    15.01  15.83  14.03  10.13  41.06  339791   367360
CH1 4BU    15.58  16.85  17.22  16.75  35.01  339827   367314
CH1 4BW    14.87  16.89  17.46  21.95  46.15  339813   367264
CH1 4BX    18.48  16.17  20     21.03  25.46  340323   367747
CH1 4BY    16.06  17.8   22.62  37.84  54.43  339843   367069
CH1 4BZ    15.22  17.29  21.83  33.97  41.38  339808   367083
CH1 4DA    13.37  12.59  10.42  15.53  15.59  340163   367955
CH1 4DB    12.94  9.84   6.48   6.51   18.46  339621   367567
CH1 4DD    14.59  16.39  20.99  35.54  41.38  339768   367073
CH1 4DE    16.15  18.22  27.15  53.62  9.54   339871   366882
CH1 4DF    13.31  14.62  19.21  32.08  27.69  339715   367101
CH1 4DG    14.21  15.69  23.1   41.6   49.01  339759   366931
CH1 4DH    16.34  18.96  28.95  40.64  28.96  339934   366769
CH1 4DJ    15.58  17.63  24.8   53.27  34.37  339837   366904
CH1 4DN    11.4   12.44  16.5   18.84  33.42  339585   366986
CH1 4DP    12.25  13.99  18.69  24.64  21.64  339644   367016
CH1 4DR    25.61  29.57  30.96  26.94  10.5   340393   366940

Page 21: Geovation Water Challenge: Joining Data by Location, John Murray

Wirral Population Density

21

Page 22: Geovation Water Challenge: Joining Data by Location, John Murray

Wirral Anti-Depressant Prescribing

22

Page 23: Geovation Water Challenge: Joining Data by Location, John Murray

Chester Postcode Crime Density (500m)

23

Page 24: Geovation Water Challenge: Joining Data by Location, John Murray

INSPIRE Directive

24

• INfrastructure for SPatial InfoRmation in Europe.
• EU Directive since May 2007.
• Lays down framework for spatial information.
• Aim is to ensure compatibility and usability across member states.
• Interoperability of spatial datasets.
• Metadata standards.
• Ordnance Survey Opendata.
• Land Registry Cadastral Polygons.

Page 25: Geovation Water Challenge: Joining Data by Location, John Murray

INSPIRE Example – Land Registry Cadastral Polygons

25

Page 26: Geovation Water Challenge: Joining Data by Location, John Murray

Street Level Data

26

• Use proximity to street geometry to link attributes.
  • Interrelation between features.
  • Inference of addresses.
  • Describe local neighbourhood.
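A minimal sketch of linking a point to its nearest street by distance to the street's line segments. The street names and coordinates are invented, and real street geometry (for example from OS open data) would first need loading into this vertex-list form.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                      # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                    # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def nearest_street(p, streets):
    """streets maps a name to a list of vertices; returns (name, distance)."""
    return min(
        ((name, min(distance_to_segment(p, a, b) for a, b in zip(line, line[1:])))
         for name, line in streets.items()),
        key=lambda item: item[1],
    )

# Invented street centre lines in grid metres.
streets = {
    "High Street": [(0, 0), (100, 0)],
    "Mill Lane": [(0, 50), (0, 150)],
}
name, dist = nearest_street((40, 10), streets)
print(name, dist)
```

Once a point is matched to a street, attributes attached to that street can be carried over to the point, and vice versa.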

Page 27: Geovation Water Challenge: Joining Data by Location, John Murray

Street Level Data Demo – OS OpenMap

27

Page 28: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 1 - Roads

28

Page 29: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 2 – Add railways

29

Page 30: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 3 – Add buildings

30

Page 31: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 4 – Add functional sites

31

Page 32: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 5 – Add important buildings

32

Page 33: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 6 – Add water

33

Page 34: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 7 – Add postcode centroids

34

Page 35: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 8 – Add INSPIRE polygons

35

Page 36: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 9 – Add Census output areas

36

Page 37: Geovation Water Challenge: Joining Data by Location, John Murray

Screenshot 10 – Add proportion of 65+

37

Key: Red high, yellow average, blue low