davidmwalker
Description: Introducing Netezza Spatial, the ability to analyse information in a geographic context:
  – Where is the nearest petrol station?
  – Which road am I on?
  – How many ATMs are in this area?
in conjunction with
Data Management & Warehousing http://www.datamgmt.com
What is the Spatial Module?
• It’s the ability to analyse information in a geographic context:
  – Where is the nearest petrol station?
  – Which road am I on?
  – How many ATMs are in this area?
• It’s not maps and images
  – These come later, with tools that help present the information
Wednesday, July 28, 2010 © 2010 Data Management & Warehousing 2
The three types of data & many questions
• Points
  – OS Grid
  – Latitude & Longitude
• Lines
  – Pairs of points, e.g. Road Segments
• Polygons
  – A series of points that define a boundary, e.g. Postcode Boundaries
• How close are two points?
• Does a point touch a line?
• Is a point inside or outside a polygon?
• Does a line cross a polygon?
• How many points are in a polygon?
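The last two questions are standard computational-geometry tests. As an illustration only, here is the classic ray-casting point-in-polygon test and a point count built on it, assuming planar coordinates such as OS grid eastings and northings in metres. This is not a description of how Netezza Spatial implements its own predicates, which the deck does not cover.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. OS grid easting/northing in metres


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: cast a ray from pt towards +x and count edge crossings.

    An odd number of crossings means the point is inside. The closing edge
    from the last vertex back to the first is implied. Points lying exactly
    on an edge may be classified either way.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also rules out horizontal edges, so y2 - y1 is never zero below).
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygon(points: Iterable[Point], polygon: Sequence[Point]) -> int:
    """Answer 'how many points are in this polygon?' by testing each point."""
    return sum(1 for p in points if point_in_polygon(p, polygon))


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
atms = [(1, 1), (5, 5), (11, 5), (-1, -1)]
print(count_points_in_polygon(atms, square))  # two of the four ATMs are inside
```

The test works for concave shapes too (an L-shaped boundary, for example), which matters for real postcode boundaries. Edge and vertex cases would need explicit handling in production use.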
Using Spatial Data Is Complex
• The ground distance covered by a degree of longitude varies with latitude, shrinking towards the poles
• Measurement over a curved, irregular surface
• Multiple input and output formats
• Multiple co-ordinate systems (see: A Guide to Coordinate Systems in Great Britain)
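The first point can be made concrete with the haversine formula, which gives the great-circle distance between two latitude/longitude points. This sketch assumes a spherical Earth with a mean radius of 6,371 km, so it ignores the ellipsoid (and the datums behind OS grid coordinates) and can be off by a few tenths of a percent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same one-degree step in longitude, at two different latitudes.
print(round(haversine_km(0, 0, 0, 1), 1))    # at the equator
print(round(haversine_km(60, 0, 60, 1), 1))  # at 60 degrees north
```

One degree of longitude is about 111.2 km at the equator but only about 55.6 km at 60°N, which is why treating latitude/longitude as flat x/y coordinates gives misleading distances.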
Sources of Information – GPS
• In Car Device
  – Sends frequent data sets to a processing centre
  – Point Data: Speed, Direction, Location and G-force
  – Aggregate Data: Speed and Direction
• Other Devices
  – Sat Nav Systems
  – Smart Phone Apps, e.g. ‘GPS Tracker’
  – Cameras
Sources of Information – Ordnance Survey
• Integrated Road Network: a series of 3 million ‘linestrings’ and 17 million points that describe every road in the UK
• Linestrings have between 2 and 655 points; most have fewer than 10
• [Figure: an example road described by 23 points]
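Matching a GPS position to a road amounts to finding the nearest linestring, where the distance to a linestring is the smallest distance to any of its segments. This is a brute-force sketch, assuming projected coordinates in metres and a hypothetical in-memory `roads` dictionary keyed by road name. Scanning all 3 million linestrings this way would be slow; a real system would first narrow the candidates with a spatial index or bounding boxes, which is omitted here.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # projected (easting, northing) in metres


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: both ends coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line, then clamp the projection to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_linestring_distance(p: Point, line: Sequence[Point]) -> float:
    """A linestring is a chain of segments; its distance is the closest segment's."""
    return min(point_segment_distance(p, line[i], line[i + 1])
               for i in range(len(line) - 1))


def nearest_road(p: Point, roads: Dict[str, Sequence[Point]]) -> Tuple[str, float]:
    """Return (road_name, distance) for the linestring closest to p."""
    return min(((name, point_linestring_distance(p, line)) for name, line in roads.items()),
               key=lambda item: item[1])


# Hypothetical toy network: one straight road and one road with a bend.
roads = {"A1": [(0, 0), (100, 0)], "B2": [(0, 50), (50, 50), (50, 100)]}
print(nearest_road((10, 5), roads))
print(nearest_road((55, 80), roads))
```

Matching a whole trip is then one lookup per GPS point.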
Sources of Information – Post Office/GADM
• Postal Address File: a series of c.1.75M UK postcodes
  – Postcode Boundaries
  – Over 28M complete addresses
• Global Admin Boundaries
  – National and regional boundaries for c.245 countries
  – http://www.gadm.org
Data Layers – Enriching what you have
• Data Layers are sets of information tied to a geographic point
  – Road Speed for a given road segment
  – ATM Location
  – House Price for a postcode
• Where data has location information it is known as ‘Geo-tagged’
Data Layer Sources (1)
• Ordnance Survey
  – Road Types, Limits, Closures, etc.
• Government
  – The UK Government now provides masses of geo-tagged information (http://data.gov.uk)
• Met Office / HM Nautical Almanac Office
  – Weather and daylight data, down to postcode level
Data Layer Sources (2)
• Wikipedia
  – Geo-tag Access API: what’s nearby?
• Google Maps
  – Road level photographic images
• Commercial Sources
  – Fast Food Outlets, Supermarkets, Petrol Stations, ATMs, etc.
• Massive growth in both commercial and public domain geo-tagged data
Issues with Geo-tagged data
• Geo-tagging uses different formats
  – Longitude & Latitude, OS Grid Reference, etc.
• Geo-tagging at different levels
  – Data may be for a postcode or for an entire county, which makes it difficult to compare
• Geo-tagging coverage is patchy and/or historic
  – The rate of change of fine-detail data is very high, e.g. OS issues monthly updates to the UK mapping
• Multiple standards and formats
  – XML & CSV, different file formats, etc.
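One common first step in reconciling formats is turning an OS letter-pair grid reference into numeric eastings and northings. This sketch follows the OS National Grid lettering scheme (a 5×5 grid of letters A–Z without I; the first letter picks a 500 km square, the second a 100 km square within it). It does not perform the separate datum transformation from OSGB36 to WGS84 that would be needed to compare the result with GPS latitude/longitude.

```python
from typing import Tuple


def gridref_to_easting_northing(ref: str) -> Tuple[int, int]:
    """Convert an OS grid reference such as 'TQ 30 80' to (easting, northing) in metres.

    The result is the south-west corner of the referenced square.
    """
    ref = ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not all("A" <= c <= "Z" for c in letters) or "I" in letters:
        raise ValueError(f"bad grid letters in {ref!r}")
    if len(digits) % 2 or len(digits) > 10 or (digits and not digits.isdigit()):
        raise ValueError(f"bad grid digits in {ref!r}")

    # Letter indices on the 25-letter grid (A=0 ... Z=24, with I skipped).
    l1 = ord(letters[0]) - ord("A")
    l2 = ord(letters[1]) - ord("A")
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # Offsets, in 100 km units, of the square's south-west corner from the false origin.
    e100km = ((l1 - 2) % 5) * 5 + l2 % 5
    n100km = (19 - (l1 // 5) * 5) - l2 // 5

    # Split the digits evenly; fewer digits means a coarser square (2+2 digits -> 1 km).
    half = len(digits) // 2
    scale = 10 ** (5 - half)
    easting = e100km * 100_000 + int(digits[:half] or 0) * scale
    northing = n100km * 100_000 + int(digits[half:] or 0) * scale
    return easting, northing


print(gridref_to_easting_northing("TQ 30 80"))    # central London, 1 km precision
print(gridref_to_easting_northing("NN 166 712"))  # 100 m precision
```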
Our Model For Delivering Spatial Data
[Diagram: multiple Sources → (Small) Postgres Database → Netezza → Spatial Analysis (Proximity, Contains, Excludes) → Spatial Presentation (sets of data with spatial attributes) → Query & Presentation Tools (Tableau, Google Maps, etc.)]
1. Load Multiple File Formats
2. Standardise Geo-Tagging
3. Extract & Load CSVs
4. Perform Spatial Analysis
5. Create User Access Area
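Steps 2 and 3 (standardise geo-tagging, then extract and load CSVs) can be pictured as normalising each source's coordinate fields into a single geometry column. This is a minimal sketch under stated assumptions: the sources already use latitude/longitude but name the fields differently, the field-name variants are hypothetical, and using WKT text (`POINT(x y)`) as the interchange column is an illustrative choice, not a detail taken from the deck.

```python
import csv
import io
from typing import Dict, Iterable, Sequence

# Hypothetical field-name variants used by different sources (compared in lower case).
LAT_KEYS = ("lat", "latitude")
LON_KEYS = ("lon", "lng", "long", "longitude")


def _first_float(record: Dict[str, str], keys: Sequence[str]) -> float:
    """Return the value of the first key in `keys` present in the record, as a float."""
    for key in keys:
        if key in record:
            return float(record[key])
    raise KeyError(f"none of {keys} found in {sorted(record)}")


def to_wkt_point(record: Dict[str, str]) -> str:
    """Build a WKT point; WKT lists x (longitude) before y (latitude)."""
    lat = _first_float(record, LAT_KEYS)
    lon = _first_float(record, LON_KEYS)
    return f"POINT({lon} {lat})"


def records_to_csv(records: Iterable[Dict[str, str]]) -> str:
    """Standardise each record and return an id,geom_wkt CSV as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "geom_wkt"])
    for record in records:
        norm = {key.lower(): value for key, value in record.items()}
        writer.writerow([norm["id"], to_wkt_point(norm)])
    return buf.getvalue()


sources = [
    {"id": "atm1", "lat": "51.5", "lon": "-0.12"},
    {"ID": "atm2", "Latitude": "52.48", "LNG": "-1.9"},
]
print(records_to_csv(sources))
```

Sources tagged with OS grid references or postcodes would need their own conversion before this step.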
Netezza Spatial Value Add
• Netezza Spatial is fast
  – Analysis
    • Looking up a typical 18-point trip in the 3M linestrings to find the roads the vehicle was on takes less than 1 second
    • An overnight batch process matches 300,000 points to road names in under 30 minutes
  – Presentation
    • Tools rely on fast query access to render any queried map with sub-second response times
• Netezza Spatial is easy
  – Distance and proximity calculations are simple
  – ‘Touches’, ‘Overlaps’ & ‘Contains’ queries allow instant value add
• Netezza Spatial integrates
  – Works well with Tableau
  – Easy to generate KML for use with Google Earth and Google Maps
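Generating KML needs nothing beyond an XML library. This sketch uses Python's standard `xml.etree.ElementTree` to write one trip as a KML `LineString` placemark; the function name and input shape are hypothetical. KML coordinates are written as longitude,latitude, so the input is assumed to be WGS84 longitude/latitude pairs rather than OS grid values.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialise the KML namespace as the default namespace (no prefix).
ET.register_namespace("", KML_NS)


def trip_to_kml(name: str, points: Sequence[Tuple[float, float]]) -> str:
    """Render a trip, given as (longitude, latitude) pairs, as a KML document string."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    line = ET.SubElement(placemark, f"{{{KML_NS}}}LineString")
    # KML coordinate tuples are "lon,lat[,alt]", separated by whitespace.
    ET.SubElement(line, f"{{{KML_NS}}}coordinates").text = " ".join(
        f"{lon},{lat}" for lon, lat in points
    )
    return ET.tostring(kml, encoding="unicode")


print(trip_to_kml("Trip 1", [(-0.1, 51.5), (-0.2, 51.6)]))
```

Saved to a `.kml` file, the output can be opened in Google Earth. Matched road segments or query results could be emitted the same way, one placemark each.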
Netezza Spatial Limitations
• Fails the Slartibartfast Test:
  – Polygons for very detailed maps are too big to load, because Netezza limits the maximum block size to 64,000 characters
  – Named after the Hitchhiker’s Guide to the Galaxy coastline designer responsible for the twiddly bits around the Norwegian fjords
• Work-around:
  – Use regional boundaries (e.g. UK Counties, US States, etc.) and then aggregate them into national boundaries
  – If a point is in Berkshire then by definition it is also in England
[Images: Norway; Slartibartfast]
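The work-around can be sketched as a two-stage lookup: test a point against the small regional polygons, each of which fits within the load limit, then map the matching region to its nation. The region-to-nation table, the pluggable `contains` test and the function names are hypothetical. The size check applies the slide's 64,000-character limit to each polygon's WKT text, which is an assumption about how the limit is measured.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]

MAX_WKT_CHARS = 64000  # block-size limit quoted on the slide

# Hypothetical lookup table: every small region belongs to exactly one nation.
REGION_TO_NATION: Dict[str, str] = {"Berkshire": "England", "Powys": "Wales"}


def polygon_to_wkt(polygon: Polygon) -> str:
    """Serialise a ring as WKT, repeating the first vertex to close it."""
    ring = list(polygon) + [polygon[0]]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def loadable(polygon: Polygon) -> bool:
    """True if the polygon's WKT text fits within the character limit."""
    return len(polygon_to_wkt(polygon)) <= MAX_WKT_CHARS


def locate(pt: Point, regions: Dict[str, Polygon],
           contains: Callable[[Point, Polygon], bool]) -> Tuple[Optional[str], Optional[str]]:
    """Find the small region containing pt, then roll it up to its nation."""
    for name, polygon in regions.items():
        if contains(pt, polygon):
            return name, REGION_TO_NATION.get(name)
    return None, None
```

Any point-in-polygon test (ray casting, for example) can be passed as `contains`. The national answer never requires loading the national outline itself, so the detailed coastline only has to fit region by region.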
Current Uses …
• M/A/B road driving profiles
• Time of day driving profiles
• Speed Limits vs. Driven Speed
• Matching GPS positions to road names
• Out of bounds driving
• Customer Demographic Profiles
… all achieved in a very short time, and this is only the start