GeoSpatial “Unstructured Data” Dan Rickman GeoSpatial SG


GeoSpatial “Unstructured Data”

Dan Rickman

GeoSpatial SG

Agenda

What is geospatial data

What does “structured” geospatial data look like?

General data modelling issues regarding geospatial data

In search of the BLPU

A brief history of OS maps – how structured are they (then and now)

Raster map data

EDRM

Geo-parsers/gazetteers/metadata

Web-based systems

Future directions?

What is Geospatial Information? - 1

Spatial data which relates to the surface of the Earth

Geodetic reference system as base e.g. WGS84 used for Global Positioning System (Earth as an ellipsoid), Latitude and Longitude (Earth as a sphere)

Ordnance Survey (GB) define National Grid – projection onto flat surface – NB: OS(NI) use Irish grid

Spatial relationships – defined around concept of neighbourhood – relates to two “laws” of geography:

• Most things influence most other things in some way

• Nearby things are usually more similar than things which are far apart
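The "Earth as a sphere" view of latitude and longitude gives a simple way to measure how near two things are. The following is a minimal sketch of the haversine great-circle distance, assuming a spherical Earth with a mean radius of 6371 km (a value not given in the slides); the function name is illustrative, and an ellipsoidal model such as WGS84 would give slightly different results.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; a sphere, not the WGS84 ellipsoid


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

On this model one degree of longitude at the equator is roughly 111 km, which is why "nearby" has to be measured in ground distance rather than in raw degrees.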

What is Geospatial Information? - 2

Unstructured – spaghetti data

Topology – information structured as networks, polygons

GeoSpatial information requires metadata – e.g. minimal information such as map projection used

GeoSpatial information may also require temporal modelling – e.g. farm subsidies vary as utilisation and legislation change

Field-based model versus object-based model of space, e.g. rainfall versus buildings on which rain falls

GeoSpatial information requires ontology

– What is the “real world”, and how is it classified?

– Relates to semantics
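The difference between spaghetti data and topology can be sketched in a few lines. The example below is an illustration only, assuming that line segments which meet share exactly the same end coordinates (real data usually needs a snapping tolerance); `build_network` and its return values are hypothetical names.

```python
from collections import defaultdict


def build_network(segments):
    """Turn unstructured line segments into a node/edge network.

    segments: iterable of ((x1, y1), (x2, y2)) coordinate pairs.
    Returns (nodes, edges, adjacency): nodes maps coordinate -> node id,
    edges is a list of (node_id, node_id), adjacency maps id -> set of ids.
    """
    nodes = {}
    edges = []
    adjacency = defaultdict(set)
    for start, end in segments:
        # Identical coordinates are treated as the same node (exact match).
        a = nodes.setdefault(start, len(nodes))
        b = nodes.setdefault(end, len(nodes))
        edges.append((a, b))
        adjacency[a].add(b)
        adjacency[b].add(a)
    return nodes, edges, adjacency


# "Spaghetti": three lines that happen to meet at (1, 0), with no stored relationship.
spaghetti = [((0, 0), (1, 0)), ((1, 0), (1, 1)), ((1, 0), (2, 0))]
nodes, edges, adjacency = build_network(spaghetti)
junction = nodes[(1, 0)]
print(sorted(adjacency[junction]))  # ids of the nodes connected to the junction
```

Once the network exists, questions such as "what connects to this junction?" become lookups rather than geometric searches.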

What are GeoSpatial Systems?

Known as Geographic Information Systems, Spatial Information Systems

Enables capture, modelling, storage, retrieval, sharing, manipulation and analysis of geographically referenced data

Database is at the heart – as is “attribute” data

Model developing – perhaps GeoSpatial data better seen as “attribute” of alphanumeric business information

Presentation does not have to be map-based in all cases

Key element is spatial indexing – uses different techniques to alphanumeric indexing
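To illustrate why spatial indexing differs from alphanumeric indexing, here is a minimal sketch of a fixed-grid index over points. It is a simple stand-in, chosen for brevity, for the R-tree or quadtree structures that spatial databases typically use; the class and method names are assumptions for this example.

```python
from collections import defaultdict


class GridIndex:
    """Fixed-size grid index over 2D points."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Two-dimensional key: neither x nor y alone orders the data.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, key, x, y):
        self.cells[self._cell(x, y)].append((key, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return keys of points inside the rectangle, visiting only overlapping cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for key, x, y in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(key)
        return hits


index = GridIndex(cell_size=100)
index.insert("a", 10, 10)
index.insert("b", 150, 150)
index.insert("c", 950, 20)
print(sorted(index.query(0, 0, 200, 200)))
```

A sorted alphanumeric index orders keys along one dimension; the grid groups records by two-dimensional neighbourhood, so a window query touches only the cells it overlaps.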

Where used? Examples

Central government – DEFRA, ODPM, Land Registry, ONS

Local government – planning, highways authorities

Utilities – physical and logical network

Insurance – flood plains

Health – epidemiology

Travel, multi-modal route planning

More widespread use – addresses, postcode based data against regional boundaries, infrastructure (“geographies” used to divide country, catchment area)

Fiat boundaries versus “bona fide” boundaries – what is the “real world” and how do we structure it?

Structured geo-database: paradigm shift?

[Diagram labels: Relational Database (attribute data); Spatial Data (proprietary format); ERP; CRM; Real Time/Engineering Systems; GIS]

Spatially extended RDBMS:

– Complex data types for spatial data

– Computational geometry

– Spatial indexing

– DDL and DML extensions

ROMANSE - Hampshire CC

Roadwork Information

Geospatial data modelling

Field-based model versus object-based model

Geographic Information Systems are object-based in practice

Most common field-based information, e.g. a Digital Elevation Model (line-of-sight applications), is attached to objects

Objects rely on field-based model, i.e. spatial co-ordinates

Initiatives such as Digital National Framework encourage organisations to structure data on references to objects, not re-capture and duplicate data

GeoSpatial equivalent of “referential integrity”

Nevertheless, duplication and lack of (referential) integrity are commonplace and hard to eradicate
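The geospatial version of a referential integrity check can be sketched as follows. This is an assumption-laden illustration: the record layout, the `object_id` field and the identifiers are all hypothetical, standing in for business data that references objects in a shared reference data set rather than re-capturing their geometry.

```python
def find_dangling_references(business_records, reference_objects):
    """Return records whose object reference is missing from the reference data set."""
    known_ids = set(reference_objects)
    return [r for r in business_records if r["object_id"] not in known_ids]


# Hypothetical reference data set: object id -> description.
reference_objects = {"obj-001": "football stadium", "obj-002": "detached house"}

# Hypothetical business records that point at reference objects.
business_records = [
    {"record": "licence-17", "object_id": "obj-001"},
    {"record": "planning-42", "object_id": "obj-999"},  # object no longer exists
]

for record in find_dangling_references(business_records, reference_objects):
    print("Dangling reference:", record["record"], "->", record["object_id"])
```

A relational database enforces this with foreign keys inside one schema; across organisations and data sets it usually has to be run as a periodic check like this one.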

In search of the BLPU

Basic Land and Property Unit

“Holy grail” of industry – no Da Vinci code produced yet!

Example of Ordnance Survey Master Map (OSMM):

– "St Mary's football stadium, Southampton" is one object

– Typical detached house and its plot of land, likewise

– Complex entities such as "Southampton railway station" are defined in terms of multiple objects: one for the main building, several for the platforms, one more for the pedestrian bridge over the tracks. (NB: See Wikipedia article on TOID)

Defining the candidate BLPU, their lifecycles and their attribute data, and verifying that these are meaningful/practicable for the wide variety of business processes which apply to the BLPU and the aggregate entities which are created from them

Dependencies so that data sets are based on the BLPU wherever possible, limited by business use, e.g. field use change is quite different from a tenant/owner perspective

Evolution of geographic information

[Timeline figure, 1950 to 2010, with marks at 1970 and 1990. Labels: paper records, digital records, database records; paper mapping, digital mapping, geographic information]

Raster map data

Scanned ortho-rectified map or map-based data – metadata is co-ordinates, projection, extent

For example Google Maps/Google Earth, Microsoft Virtual Earth

Traditionally stored outside the database as external files; can now be held in the database, analogous to vector data storage, e.g. Oracle 10g GeoRaster

Data stored as BLOBs, metadata required regarding number of bytes per pixel, compression algorithms and so on

Benefits limited as “intelligence” in map requires interpretation

Still limited progress on map-based pattern recognition – there are semi-automated solutions from companies such as Laser-Scan
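The raster metadata described above (extent, pixel dimensions, bytes per pixel) is what turns a block of bytes into a map. The sketch below is illustrative only: the `RasterMetadata` class and its fields are assumptions for this example, not the schema of any particular product, and it assumes a north-up image with no rotation.

```python
from dataclasses import dataclass


@dataclass
class RasterMetadata:
    xmin: float   # extent in map units (e.g. metres on a projected grid)
    ymin: float
    xmax: float
    ymax: float
    width: int    # image size in pixels
    height: int
    bytes_per_pixel: int

    def pixel_to_map(self, col, row):
        """Map coordinates of the centre of pixel (col, row); row 0 is the top row."""
        px = (self.xmax - self.xmin) / self.width
        py = (self.ymax - self.ymin) / self.height
        return (self.xmin + (col + 0.5) * px, self.ymax - (row + 0.5) * py)

    def uncompressed_size(self):
        """Size in bytes of the raw pixel data before any compression."""
        return self.width * self.height * self.bytes_per_pixel


meta = RasterMetadata(xmin=0, ymin=0, xmax=1000, ymax=500,
                      width=100, height=50, bytes_per_pixel=3)
print(meta.pixel_to_map(0, 0))
print(meta.uncompressed_size())
```

Without the extent and projection, the same BLOB is just a picture: the pixels can be displayed but not related to anything else by location.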

EDRM

Electronic document and records management

Increased usage in local/central government due to the Freedom of Information Act

Contain potentially significant geospatial data

Most common example is address

Requires capture of appropriate metadata or appropriate pattern recognition to identify addresses

Requires gazetteers to provide reference to spatial co-ordinates

NB: most familiar gazetteer – list of streets in A to Z maps
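The two requirements above, pattern recognition to find addresses and a gazetteer to turn them into co-ordinates, can be sketched together. This is a minimal illustration under stated assumptions: the postcode pattern is deliberately simplified and does not cover every valid UK form, and the gazetteer entries (postcodes and co-ordinates) are invented placeholders.

```python
import re

# Simplified UK postcode pattern (an assumption; it does not cover every valid form).
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")

# Hypothetical gazetteer: postcode -> (easting, northing). Values are invented.
GAZETTEER = {
    "ZZ1 1AA": (100000, 200000),
    "ZZ2 9XY": (150000, 250000),
}


def geocode_document(text, gazetteer=GAZETTEER):
    """Find postcodes in free text and look each one up in the gazetteer."""
    results = []
    for match in POSTCODE_RE.finditer(text.upper()):
        # Normalise to "outward inward" form before the lookup.
        postcode = f"{match.group(1)} {match.group(2)}"
        results.append((postcode, gazetteer.get(postcode)))
    return results


document = "Planning application for 3 High Street, zz1 1aa. Objection from ZZ29XY and QQ7 7QQ."
for postcode, location in geocode_document(document):
    print(postcode, location)
```

A postcode that matches the pattern but is missing from the gazetteer comes back with no location, which is the data quality problem in miniature.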

Geo-parsers/gazetteers/metadata

Geo-parsers: identify spatial tags (geo-tags) in data

Context sensitivity and patterns of usage required

E.g. Jordan (country) != Jordan (Katie Price)

Can see an example at:

http://edina.ac.uk/projects/geoxwalk/geoparser.html

Relies on and populates gazetteer of associated names

Emerging standards for geo-parsing, e.g. Open GIS Consortium looking at:

– Gazetteer service

– Geo-coder service

– Web services (WMS/WFS)
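The context sensitivity point (Jordan the country versus Jordan the person) can be shown with a toy geo-parser. This is a sketch under loud assumptions: the cue-word lists are invented for the example, and real geo-parsers rely on much richer language processing and usage statistics than counting nearby words.

```python
import re

# Hypothetical gazetteer entry with context cue words; the cue lists are invented.
PLACE_CUES = {
    "Jordan": {"amman", "border", "kingdom", "river", "country"},
}
PERSON_CUES = {"she", "her", "he", "his", "model", "singer", "said", "married"}


def geoparse(text, window=5):
    """Tag gazetteer names as 'place' or 'not place' from nearby context words."""
    tokens = re.findall(r"[A-Za-z']+", text)
    tags = []
    for i, token in enumerate(tokens):
        if token in PLACE_CUES:
            context = {t.lower() for t in tokens[max(0, i - window): i + window + 1]}
            place_score = len(context & PLACE_CUES[token])
            person_score = len(context & PERSON_CUES)
            tags.append((token, "place" if place_score > person_score else "not place"))
    return tags


print(geoparse("The delegation crossed the border into Jordan near Amman."))
print(geoparse("Jordan said her new book is out."))
```

The first sentence scores on "border" and "Amman" and is tagged as a place; the second scores on "said" and "her" and is not.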

Web-based systems: Google Earth meets Flickr

Web-based systems (MetaCarta, KML, mashup)

Web-based systems

World wide wild west of unstructured data

Increasing use of systems to control, coordinate and make this accessible

Geo-enabled semantic web – raises issues of ontology

www.metacarta.com – provides web-based Geographic Text Search (GTS): it can confine searches by geography, retrieve the information it detects using the keywords, and display that information geographically on a map interface (working now with Google Earth).
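The idea of confining a keyword search by geography can be sketched as a filter on documents that already carry a latitude and longitude. This is not MetaCarta's API or method; the function name, document layout and sample data are invented for illustration, and the bounding box is assumed not to cross the 180° meridian.

```python
def geographic_text_search(documents, keyword, south, west, north, east):
    """Return titles of documents that mention keyword and fall inside the bounding box."""
    keyword = keyword.lower()
    return [
        d["title"]
        for d in documents
        if keyword in d["text"].lower()
        and south <= d["lat"] <= north
        and west <= d["lon"] <= east
    ]


# Invented sample documents, each already geo-tagged with one location.
SAMPLE_DOCS = [
    {"title": "Roadworks notice", "text": "Roadworks on the high street", "lat": 50.9, "lon": -1.4},
    {"title": "Flood report", "text": "Flood risk near the river", "lat": 53.5, "lon": -2.2},
    {"title": "Roadworks update", "text": "Roadworks completed", "lat": 55.9, "lon": -3.2},
]

print(geographic_text_search(SAMPLE_DOCS, "roadworks", 50.0, -2.0, 52.0, 0.0))
```

The hard part in practice is the step this sketch takes for granted: deciding which location, if any, each piece of unstructured text refers to.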

They know where you live

MetaCarta(R), Inc., a leading provider of geographic intelligence, announced today that it had won a one-year contract with … the Department of Homeland Security [which] identifies and assesses current and future threats to the homeland, maps those threats against the nation's vulnerabilities, issues timely warnings and takes preventative and protective action… The product automatically identifies geographic references using advanced natural language processing (NLP) from any type of unstructured content in a customer's archives such as email, web pages, newswires or cables. It assigns a latitude and longitude to these references so that users can analyze their text archives using geographic maps, keywords and time as filters. The results of a query are displayed on a map with icons representing the locations found in the natural language text of the documents and as a text results list. Both the icons and text summaries are hyperlinked to the documents they represent. (Source: http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=109&STORY=/www/story/03-14-2005/0003193909&EDATE=)

The future (and summary)

Structured environment – will contain more “unstructured” data

Web will continue to provide unstructured distributed data

Success of semantic-based approach yet to be determined; experience with geospatial data indicates there are significant complexities based around our representations of the “real world”

One issue is clear – less and less privacy: location is already accessible through mobile phones, and linking this to other data can provide significant intelligence information

Also clear – data quality issues will persist

They will still get it wrong!