
Spatial Data Mining

Introduction

• Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets
  – E.g. co-location patterns of water pumps and cholera cases
  – Determining hotspots: locations where events are unusually concentrated
• Spatial Data Mining Tasks
  – Classification/Prediction
  – Co-location Mining
  – Clustering
• Recap of special properties of Spatial Data
  – Spatial autocorrelation
  – Spatial heterogeneity
  – Implicit spatial relations

Spatial Relations

• Spatial databases do not store spatial relations explicitly
  – Additional functionality is required to compute them
• Three types of spatial relations are specified by the OGC reference model
  – Distance relations
    • Euclidean distance between two spatial features
  – Direction relations
    • Ordering of spatial features in space
  – Topological relations
    • Characterise the type of intersection between spatial features

Distance relations

• If dist is a distance function and c is some real number, the distance relation between two features A and B is one of
  1. dist(A,B) > c
  2. dist(A,B) < c
  3. dist(A,B) = c

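A minimal sketch of the three distance relations, assuming point features stored as (x, y) tuples and Euclidean distance. For lines or polygons, dist would be the minimum distance between the geometries, which this sketch does not compute. The function name distance_relation is hypothetical.

```python
import math


def distance_relation(a, b, c, tol=1e-9):
    """Return '>', '<' or '=' comparing dist(A,B) with the real number c.

    Assumes A and B are point features given as (x, y) tuples and that dist
    is Euclidean. Equality is tested within a tolerance because exact float
    comparison is unreliable.
    """
    d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
    if math.isclose(d, c, abs_tol=tol):
        return "="
    return ">" if d > c else "<"


print(distance_relation((0, 0), (3, 4), 5))  # distance is exactly 5 -> =
```

The tolerance is a design choice: without it, the relation dist(A,B) = c would almost never hold for computed coordinates.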

Direction relations

• Suppose the directions of B and C are required with respect to A
• Define a representative point, rep(A)
• rep(A) defines the origin of a virtual coordinate system
• The quadrants and half planes of this coordinate system define the direction relations
• B can have two values, {northeast, east}
• The exact direction relation is northeast

[Figure: features A, B and C with rep(A) as the origin; C north A, B northeast A]
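The quadrant/half-plane idea can be sketched as follows. This is one possible reading of the slide, under two assumptions: rep(A) is taken as the mean of A's vertex coordinates (a hypothetical choice, not necessarily the polygon centroid), and an extended feature such as B receives the set of directions of its vertices, which is how it can end up with more than one value. The sketch does not decide which value is the "exact" relation.

```python
def rep_point(vertices):
    """Hypothetical rep(A): the mean of A's (x, y) vertex coordinates."""
    xs, ys = zip(*vertices)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def direction(origin, p):
    """Direction of point p in the virtual coordinate system centred at origin.

    Points strictly inside a quadrant get 'northeast', 'southwest', etc.;
    points on an axis get a single direction such as 'north' or 'east'.
    """
    dx, dy = p[0] - origin[0], p[1] - origin[1]
    ns = "north" if dy > 0 else "south" if dy < 0 else ""
    ew = "east" if dx > 0 else "west" if dx < 0 else ""
    return (ns + ew) or "same"


def direction_values(origin, vertices):
    """Set of directions taken by the vertices of an extended feature."""
    return {direction(origin, v) for v in vertices}


rep_a = rep_point([(0, 0), (2, 0), (2, 2), (0, 2)])  # hypothetical square A
print(direction(rep_a, (1, 5)))                       # C -> north
print(direction_values(rep_a, [(4, 1), (6, 3)]))      # B -> {'east', 'northeast'}
```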

Topological Relations

• Topological relations describe how geometries intersect spatially
• Simple geometry types
  – Point, 0-dimensional
  – Line, 1-dimensional
  – Polygon, 2-dimensional
• Each geometry is represented in terms of its
  – boundary (B): geometry of the lower dimension
  – interior (I): points of the geometry when the boundary is removed
  – exterior (E): points not in the interior or boundary
• Examples for simple geometries
  – For a point, I = {point}, B = {} and E = {points not in I or B}
  – For a line, I = {points except boundary points}, B = {two end points} and E = {points not in I or B}
  – For a polygon, I = {points within the boundary}, B = {the boundary} and E = {points not in I or B}
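For a polygon, deciding whether a given point lies in I, B or E can be sketched with an on-edge test followed by even-odd ray casting. This is a standard technique used here for illustration, not something the slides prescribe; it assumes a simple polygon (no holes, no self-intersections) given as a list of (x, y) vertices. The helper names are hypothetical. The test polygon is polygon A from the example later in these slides.

```python
def on_segment(p, a, b, eps=1e-9):
    """True if point p lies on the segment from a to b (within eps)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False  # p is not collinear with a and b
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps
            and min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def locate(p, ring):
    """Classify p as 'I' (interior), 'B' (boundary) or 'E' (exterior).

    ring lists the polygon's vertices; the closing vertex may be repeated
    or omitted.
    """
    edges = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]
    if any(on_segment(p, a, b) for a, b in edges):
        return "B"
    # Even-odd rule: count edges crossed by a horizontal ray to the right of p.
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in edges:
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return "I" if inside else "E"


polygon_a = [(10, 10), (15, 0), (25, 0), (30, 10), (25, 20), (15, 20)]
print(locate((20, 10), polygon_a), locate((20, 0), polygon_a), locate((50, 50), polygon_a))
```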

DE-9IM

• Topological relations are defined using one of the following models
  – 4IM, the four-intersection model (only B and I considered)
  – 9IM, the nine-intersection model (B, I and E)
  – DE-9IM, the dimensionally extended nine-intersection model
• DE-9IM is an OGC-compliant model
• Dim is the dimension function: it gives the dimension of an intersection (0, 1 or 2), or -1 if the intersection is empty
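For reference, the DE-9IM of two geometries a and b is the 3×3 matrix of the dimensions of the pairwise intersections of their interiors, boundaries and exteriors. Read row by row, it is usually written as a 9-character string, with F standing for an empty intersection.

```latex
\mathrm{DE9IM}(a,b) =
\begin{bmatrix}
\dim(I(a)\cap I(b)) & \dim(I(a)\cap B(b)) & \dim(I(a)\cap E(b)) \\
\dim(B(a)\cap I(b)) & \dim(B(a)\cap B(b)) & \dim(B(a)\cap E(b)) \\
\dim(E(a)\cap I(b)) & \dim(E(a)\cap B(b)) & \dim(E(a)\cap E(b))
\end{bmatrix},
\qquad \dim(\emptyset) = -1
```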

Example

• Consider two polygons
  – A: POLYGON ((10 10, 15 0, 25 0, 30 10, 25 20, 15 20, 10 10))
  – B: POLYGON ((20 10, 30 0, 40 10, 30 20, 20 10))

[Figure: 9-Intersection Matrix of example geometries, with rows I(A), B(A), E(A) and columns I(B), B(B), E(B)]

DE-9IM for the example geometries

        I(B)  B(B)  E(B)
I(A)     2     1     2
B(A)     1     0     1
E(A)     2     1     2

Relationships using DE-9IM

• Different pairs of geometries may give rise to different numbers in the DE-9IM
• For a specific type of relationship we are only interested in certain values in certain positions
  – That is, we are interested in patterns in the matrix rather than actual values
• Actual values are replaced by wild cards
  – T: value is "true": non-empty, any dimension >= 0
  – F: value is "false": empty, dimension < 0
  – *: don't care what the value is
  – 0: value is exactly zero
  – 1: value is exactly one
  – 2: value is exactly two

A overlaps B

        I(B)  B(B)  E(B)
I(A)     T     *     T
B(A)     *     *     *
E(A)     T     *     *

Topological Relations

• x.Disjoint(y)
  – FF*FF****
• x.Touches(y)
  – FT******* or F**T***** or F***T**** (Area/Area, Line/Line, Line/Area, Point/Area; not Point/Point)
• x.Crosses(y)
  – T*T****** for Point/Line, Point/Area, Line/Area
  – 0******** for Line/Line
• x.Within(y)
  – TF*F*****
• x.Overlaps(y)
  – T*T***T** for Point/Point, Area/Area
  – 1*T***T** for Line/Line
• The DE-9IM string for the example geometries was ‘212101212’ (from an earlier slide)
  – A overlaps B
  – The string also matches T*T******, but Crosses is not defined for Area/Area, so A does not cross B
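Because the valid pattern depends on the dimensions of the two geometries, a predicate needs both the matrix string and the dimensions. A minimal sketch of the predicates listed above, assuming dimensions 0 = point, 1 = line, 2 = area; function names are hypothetical. Mirrored cases such as Line/Point crossings are not listed on the slide, so this sketch simply returns False for them. Under these rules the example string yields overlaps but not crosses for the two polygons.

```python
def matches(matrix, pattern):
    """DE-9IM wildcard match: '*' any, 'T' non-empty, otherwise exact character."""
    return all(p == "*" or (p == "T" and v != "F") or p == v
               for v, p in zip(matrix, pattern))


# m is a 9-character DE-9IM string; dim_a, dim_b: 0 = point, 1 = line, 2 = area
def disjoint(m, dim_a, dim_b):
    return matches(m, "FF*FF****")


def touches(m, dim_a, dim_b):
    if dim_a == 0 and dim_b == 0:
        return False  # not defined for Point/Point
    return any(matches(m, p) for p in ("FT*******", "F**T*****", "F***T****"))


def crosses(m, dim_a, dim_b):
    if dim_a == 1 and dim_b == 1:
        return matches(m, "0********")  # Line/Line
    if dim_a < dim_b:
        return matches(m, "T*T******")  # Point/Line, Point/Area, Line/Area
    return False  # combinations not listed on the slide


def within(m, dim_a, dim_b):
    return matches(m, "TF*F*****")


def overlaps(m, dim_a, dim_b):
    if dim_a != dim_b:
        return False
    if dim_a == 1:
        return matches(m, "1*T***T**")  # Line/Line
    return matches(m, "T*T***T**")      # Point/Point, Area/Area


example = "212101212"  # polygons A and B from the example
print(overlaps(example, 2, 2), crosses(example, 2, 2))  # True False
```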

Approaches to Spatial Data Mining

• Materialize spatial features and use Weka
  – Required spatial features are added as additional attributes of the main feature
  – This creates a flat file of data
• Use special data mining techniques that take spatial dependency into account
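Materialization can be sketched as follows. The themes, attribute names and data are hypothetical: houses are the main feature, and two spatial attributes are added from a second theme (water pumps), namely the distance to the nearest pump and the number of pumps within a radius. The result is written as ARFF, Weka's text input format. In the architecture described in these slides, the spatial DBMS would compute these relations; the pure-Python distance calls stand in for that step.

```python
import math

# Hypothetical themes: main feature = houses (name, location, cholera cases),
# second theme = water pumps (name, location).
houses = [("h1", (0.0, 0.0), 3), ("h2", (10.0, 0.0), 0)]
pumps = [("p1", (1.0, 0.0)), ("p2", (30.0, 0.0))]


def materialize(houses, pumps, radius):
    """Add spatial attributes to each house, giving one flat row per house."""
    rows = []
    for name, loc, cases in houses:
        dists = [math.dist(loc, p) for _, p in pumps]
        rows.append({
            "house": name,
            "cases": cases,
            "dist_nearest_pump": min(dists),
            "pumps_within_radius": sum(d <= radius for d in dists),
        })
    return rows


def to_arff(rows, relation):
    """Serialize flat rows as a minimal ARFF file (STRING or NUMERIC attributes)."""
    names = list(rows[0])
    lines = [f"@RELATION {relation}", ""]
    for n in names:
        kind = "STRING" if isinstance(rows[0][n], str) else "NUMERIC"
        lines.append(f"@ATTRIBUTE {n} {kind}")
    lines += ["", "@DATA"]
    for r in rows:
        lines.append(",".join(f"'{r[n]}'" if isinstance(r[n], str) else str(r[n])
                              for n in names))
    return "\n".join(lines)


rows = materialize(houses, pumps, radius=5.0)
arff = to_arff(rows, "cholera_houses")
print(arff)
```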

Materializing features: Example

Materializing features: Example (2)

Spatial Data Mining Architecture

• Retrieve data belonging to multiple themes
• Preprocess spatial data to materialize spatial features
  – Select the required features
  – Use the methods that compute spatial relations to create a flat file of data
• Use a Weka-like tool to perform data mining

[Figure: architecture diagram with components Multiple Themes, OGC-compliant Spatial DBMS, Feature Selection & OGC-compliant methods to compute relations, Flat File and Weka]

Spatial Clustering

• Also called spatial segmentation
• Input
  – A table of area names and their corresponding attributes, such as population density, number of adult illiterates, etc.
  – Information about the neighbourhood relationships among the areas
  – A list of categories/classes of the attributes
• Output
  – Grouped (segmented) areas, where each group contains areas with similar attribute values
• The Census website has plenty of examples
  – http://www.statistics.gov.uk/census2001/censusmaps/index.html

Similarity with image segmentation

• Spatial segmentation is also performed in image processing
  – Identify regions (areas) of an image that have similar colour (or other image attributes)
  – Many image segmentation techniques are available
    • E.g. the region-growing technique

[Figure: example image grid of pixel values 1 and 2 forming two regions]

Region Growing Technique

• There are many flavours of this technique
• One of them is described below:
  – Assign seed areas to each of the segments (classes of the attribute)
  – Add neighbouring areas to these segments if the incoming areas have similar attribute values
  – Repeat the above step until all the areas are allocated to one of the segments
• Functionality to compute spatial relations (neighbours) is assumed
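One flavour is sketched below under stated assumptions. It is close to seeded region growing: at every step, the unallocated neighbouring area whose value is closest to the mean of an adjacent segment is added to that segment. The neighbour lists are assumed to be precomputed, which matches the slide's assumption that neighbour relations are available. Names and data are hypothetical. Areas not reachable from any seed stay unallocated.

```python
def region_grow(values, neighbours, seeds):
    """Seeded region growing over areas.

    values:     area -> attribute value (e.g. population density)
    neighbours: area -> list of adjacent areas (assumed precomputed)
    seeds:      area -> segment label for the initial seed areas
    Returns area -> segment label for every area reachable from a seed.
    """
    label = dict(seeds)
    members = {}
    for area, seg in seeds.items():
        members.setdefault(seg, []).append(values[area])

    while True:
        best = None  # (difference, area, segment) of the most similar candidate
        for area, seg in label.items():
            mean = sum(members[seg]) / len(members[seg])
            for nb in neighbours[area]:
                if nb in label:
                    continue
                diff = abs(values[nb] - mean)
                if best is None or diff < best[0]:
                    best = (diff, nb, seg)
        if best is None:
            break  # every reachable area has been allocated
        _, nb, seg = best
        label[nb] = seg
        members[seg].append(values[nb])
    return label


# Six areas in a row, with seeds at both ends
values = {"a": 1.0, "b": 1.2, "c": 1.1, "d": 5.0, "e": 5.3, "f": 4.9}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
              "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}
segments = region_grow(values, neighbours, {"a": "low", "f": "high"})
print(segments)  # a, b, c end up 'low'; d, e, f end up 'high'
```

Always allocating the single most similar candidate, rather than using a fixed similarity threshold, guarantees that every connected area is eventually allocated, as the last step of the description requires.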

[Figure: areas labelled 1 or 2 according to the segment they were allocated to]

Summary

• Spatial data storage is available as extensions of RDBMS
• Visualization of spatial data is available in GIS
• Spatial data mining requires functionality to compute spatial relations
• OGC specifications provide the standards for all the above resources
• MySQL provides spatial data storage
  – But only partially provides the functionality for computing relations
• Several open-source systems provide all the above resources for spatial data
  – OpenJump, GeoTools
