26
Spatial Data Mining CSE 6331, Fall 1999 Ajay Gupta [email protected] m [email protected]

Spatial Data Mining CSE 6331, Fall 1999

  • Upload
    aldon

  • View
    38

  • Download
    0

Embed Size (px)

DESCRIPTION

Spatial Data Mining CSE 6331, Fall 1999. Ajay Gupta [email protected] [email protected]. What I shall be talking about…. Introduction (What, Why and Where) Orthodox Methods and some primitives Spatial Data Structures and Computations - PowerPoint PPT Presentation

Citation preview

Page 1: Spatial Data Mining CSE 6331, Fall 1999

Spatial Data MiningCSE 6331, Fall 1999

Ajay [email protected]

[email protected]

Page 2: Spatial Data Mining CSE 6331, Fall 1999

What I shall be talking about….• Introduction (What, Why and Where)• Orthodox Methods and some primitives• Spatial Data Structures and Computations• Data Organization and Spatial Data Mining Architecture• Methods for knowledge discovery

– Generalized-based Approach• Spatial-Data-Dominant Generalization• Non-Spatial-Data-Dominant Generalization

– Spatial Association Rules• Geominer : Tool for Spatial Data Mining• Future Work• Conclusions & References

Page 3: Spatial Data Mining CSE 6331, Fall 1999

Introduction (What, Why and Where Spatial Data Mining)

• Used for mining data in spatial databases and databases with huge amounts of data.

• Spatial data is defined as distance-related data in an object. – A spatial object contains both spatial and non-spatial data.– A spatial object is stored using spatial indexes e.g. R-Trees etc.

• A spatial database stores spatial objects and spatial relationships between these objects.

• Spatial Data Mining is used for knowledge discovery in areas e.g. Geographical Information Systems (GIS)

Page 4: Spatial Data Mining CSE 6331, Fall 1999

Orthodox Methods• Advantages

– Work great for numeric attributes.– Gives out realistic outputs.

• Disadvantages– These methods assume statistical independence among spatial data

objects. – Cannot model non-linear rules and cannot symbolic values very well.

• Two techniques borrowed from Machine Learning are widely used in this field:– Learning From Example– Specialization and Generalization

Page 5: Spatial Data Mining CSE 6331, Fall 1999

Some primitives• Rules: Following types of possible rules can be generated from the

database:– Characteristic Rules – Discriminant Rules – Association Rule

• Thematic Maps: Present spatial distribution of a single/multiple attributes. Used for discovering association rules. Typically, we represent non-spatial attributes in this map.

Page 6: Spatial Data Mining CSE 6331, Fall 1999

Spatial Data Structures, Computations

• Spatial Data Structures (Points, rectangles, circles etc.) • Indices for spatial data structures are made using

– Multi-dimensional trees– R-Trees/R*-Trees

• Spatial Computation Techniques– Spatial Joins: Quite expensive. Uses R*-Trees for efficiency.

First step finds all possible pairs of intersecting objects. Second step uses detailed methods to check for intersection.

– Map overlays are used in GIS.

Page 7: Spatial Data Mining CSE 6331, Fall 1999

Data Organization of a Spatial Database

• Spatial data (e.g. seismic data) consists of spatial and non-spatial objects/attributes. – Non-spatial attributes-- stored in relational databases.– Another attribute in the database points to the spatial description

of the object. – Spatial data is described using two properties -- geometric and

topological.• Knowledge discovery methods based on spatial data use both spatial

and non-spatial data.

Page 8: Spatial Data Mining CSE 6331, Fall 1999

Spatial Data Mining Architecture

DBInterface

FocussingComponent

PatternExtraction EvaluationFrom

DBMS

Controller

Knowledge Base

Discovery

From User

DomainKnowledge

Page 9: Spatial Data Mining CSE 6331, Fall 1999

Generalization Based Knowledge Discovery

Computers

Software Hardware

ApplicationSoftware

SystemSoftware

WordProcesors Browsers Compilers Linkers

MemoryModules CPUs

RAMs HardDisks

Intel-basedProcessors

Motorola-based

Processors

MS Office Notepad Netscape InternetExplorer 8086 8088

• Requires existence of background knowledge (concept hierarchies) for both spatial and non-spatial data.

• Concept hierarchies are typically given by domain experts.

Page 10: Spatial Data Mining CSE 6331, Fall 1999

Spatial Attribute Concept Hierarchy

USA

Texas Maryland

Dallas Tarrant

Irving Carrolton Arlington Euless

Montgomery Fredrick

NorthPotomac

Germantown Baltimore Spring

Wells

BeltlineRoad

WalnutHill

MarshLane

FrankfordDrive

Page 11: Spatial Data Mining CSE 6331, Fall 1999

Generalization Based Knowledge Discovery

• Attribute Oriented Induction– Generalize the specific values of the task-relevant attributes to higher

levels.– Remove attributes when:

• further generalization is not possible OR• the number of distinct values for an attribute at the higher levels

exceeds the generalization threshold for that attribute.– Merge identical tuples. Keep quantitative measure of tuples merged

to allow quantitative presentation of knowledge acquired.• Two main algorithms:

– Spatial-Data-Dominant Generalization– Non-Spatial-Dominant Generalization

Page 12: Spatial Data Mining CSE 6331, Fall 1999

Spatial Data-Dominant Generalization

• Collect all task-relevant data sets• Generalize the spatial data first by merging the spatial regions according to the

description in the concept hierarchy. Do not generalize beyond spatial generalization threshold limit (indicates max. number of spatial regions to create).

• Non-spatial data is analyzed for each spatial object using the attribute-oriented approach.• Asymptotic run time of this algorithm is O(N lg N) where N=# of spatial objects

Page 13: Spatial Data Mining CSE 6331, Fall 1999

Spatial-Data-Dominant Generalization

Page 14: Spatial Data Mining CSE 6331, Fall 1999

Non-Spatial-Data Dominant Generalization

• Collect task-relevant data• Perform attribute-oriented induction on non-spatial attributes.

– Generalization threshold indicates the stopping point for the generalization.

• Pointers to spatial objects are collected and put alongwith the generalized non-spatial data.

• Neighboring areas with same generalized value for the non-spatial attributes are merged together using “adjacent_to” spatial predicate.

• Complexity is O(N lg N) where N = # of spatial objects.

Page 15: Spatial Data Mining CSE 6331, Fall 1999

Non-Spatial-Data Dominant Generalization

Page 16: Spatial Data Mining CSE 6331, Fall 1999

What’s the problem with Generalized Data Mining

Techniques• Concept hierarchies can only be done by the domain expert.• Concept hierarchies may not be known in advance in unexplored

domains.• Interestingness and quality of the mined characteristic rules will

depend on the given concept hierarchy.

Page 17: Spatial Data Mining CSE 6331, Fall 1999

Spatial Association Rules• Why we

– All the methods discussed previously only mine characteristic rules. We sometimes need to find relationships (spatial and non-spatial) between attributes that belong to different classes.

• A spatial association rule is of the form X -->Y (c%), where X and Y are sets of spatial or nonspatial predicates and c% is the confidence of the rule.

• Examples--– is_a(X,City) and has(X, beach)-->close_to(X, sea) (Spatial RHS)– Is_a(X, student) --> goes_to(X, school) -- not a spatial association rule.

• Concepts of minimum support and confidence are used to generate rules. Different values at each level .

Page 18: Spatial Data Mining CSE 6331, Fall 1999

Topological Relations Hierarchy

g_close_to

not_disjoint close_to

INTERSECTS CONTAINS

adjacent_to intersects inside covers

EQUAL

containscovered_by

INSIDE

Page 19: Spatial Data Mining CSE 6331, Fall 1999

Spatial Association Rules• Use two-step spatial computation technique to reduce computations.• Computation starts at the high level of spatial predicates like g_close_to

(generalized close_to). • More detailed and finer, but more expensive, spatial computations are

applied at lower concept levels only to those patterns that are large at the level of the predicate g_close_to. – Filtration of large patterns saves a computations since much fewer spatial

association relationships left at the lower concept levels.• Still the user is required to provide concept hierarchies.

Page 20: Spatial Data Mining CSE 6331, Fall 1999

GeoMiner: A Spatial Data Mining Tool

• Can mine characteristic, comparison and association rules.• Support spatial data cube module construction, Spatial OLAP and

Spatial Data Mining modules.• Uses a spatial data mining language called GMQL, extension of

spatial SQL.• Functions on top of DBMiner. Directs all non-spatial data related

functions to DBMiner.

Page 21: Spatial Data Mining CSE 6331, Fall 1999

Geo-CharacterizerExample- Mine characteristic rulesMINE CHARACTERISTICS AS ``USA_states'' ANALYZE sum(pop) WITH RESPECT TO states_census.geo, statename, pop,

med_fam_income, with_bachelor_degp FROM states_census

Page 22: Spatial Data Mining CSE 6331, Fall 1999
Page 23: Spatial Data Mining CSE 6331, Fall 1999

Geo-AssociatorMINE SPATIAL ASSOCIATIONS AS "Golf Courses" WITH RESPECT TO Washington.geo, Washington_Golf_Courses.geo, CFCC FROM Washington_Golf_Courses, Washington WHERE CLOSE_TO(Washington.geo, Washington_Golf_Courses.geo, "3 km") AND Washington.CFCC "Golf Course" SET SUPPORT THRESHOLD 0.4 SET CONFIDENCE THRESHOLD 0.5

Page 24: Spatial Data Mining CSE 6331, Fall 1999

Future Work and Directions• Using multiple thematic maps.• Consider interleaving spatial and nonspatial generalizations to get

more efficient results.• Using parallel machines or distributed farms of workstations can

achieve good speed-up.• Extending Spatial Data Mining to use OO based approaches. • Using mature statistical methods may produce interesting new

techniques. • Use in Spatio-temporal databases to study data deviation and evolution

rules.

Page 25: Spatial Data Mining CSE 6331, Fall 1999

Conclusion• Spatial data mining is a promising field of research with wide

applications in GIS, medical imaging, robot motion planning, etc. • Although, the field is quite young, a number of algorithms and

techniques have been proposed to discover various kinds of knowledge from spatial data. I surveyed some of the existing methods for spatial data mining and mentioned their strengths and weaknesses. The variety of yet unexplored topics and problems makes knowledge discovery in spatial databases an attractive and challenging research field.

Page 26: Spatial Data Mining CSE 6331, Fall 1999

References• Wei Lu and Jiawei Han, `` Discovery of General Knowledge in Large

Spatial Databases'', Proc. of 1993 Far East Workshop on Geographic Information Systems (FEGIS'93), Singapore, June 1993, pp. 275-289

• J. Han, K. Koperski, and N. Stefanovic, `` GeoMiner: A System Prototype for Spatial Data Mining'', Proc. 1997 ACM-SIGMOD Int'l Conf. On Management of Data(SIGMOD'97), Tucson, Arizona, May 1997.

• K. Koperski and J. Han, `` Discovery of Spatial Association Rules in Geographic Information Databases'', Proc. 4th Int'l Symp. on Large Spatial Databases (SSD95), Maine, Aug. 1995, pp. 47-66.