Spatial Data Mining
hari agung
What is Spatial Data?
• Data related to objects that occupy space
– traffic, bird habitats, global climate, logistics, ...
• Used in: GIS (Geographic Information Systems), meteorology, astronomy, environmental studies, etc.
• Object types:
– Points, lines, polygons, etc.
Why do we need Data Mining?
• Large number of records (cases) (10^8 to 10^12 bytes)
– One thousand (10^3) bytes = 1 kilobyte (KB)
– One million (10^6) bytes = 1 megabyte (MB)
– One billion (10^9) bytes = 1 gigabyte (GB)
– One trillion (10^12) bytes = 1 terabyte (TB)
• High-dimensional data (variables)
– 10 to 10^4 attributes
• Only a small portion, typically 5% to 10%, of the collected data is ever analyzed
• We are drowning in data, but starving for knowledge!
Spatial Data Mining
• Spatial Patterns
– Spatial outliers
– Location prediction
– Associations, co-locations
– Hotspots, clustering, trends, …
• Primary Tasks
– Mining spatial association rules
– Spatial classification and prediction
– Spatial data clustering analysis
– Spatial outlier analysis
Spatial Classification
• Use spatial information at different (coarse/fine) levels (different indexing trees) for data focusing
• Determine relevant spatial or non-spatial features
• Apply standard supervised learning algorithms
– e.g., decision trees, …
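The steps above can be sketched in a few lines: derive a spatial feature from each object's neighbors, then hand it to an ordinary supervised learner. This is only an illustration under stated assumptions: the toy locations, the `income` attribute, the `highly_valued` label, and the helper names `neighbor_mean` and `fit_stump` are all hypothetical, and a one-level decision tree (a "stump") stands in for a full decision tree.

```python
from math import hypot

def neighbor_mean(points, values, i, radius):
    """Mean of `values` over the points within `radius` of point i (excluding i)."""
    xi, yi = points[i]
    near = [values[j] for j, (x, y) in enumerate(points)
            if j != i and hypot(x - xi, y - yi) <= radius]
    # Fall back to the point's own value when it has no neighbors.
    return sum(near) / len(near) if near else values[i]

def fit_stump(feature, labels):
    """One-level decision tree: choose the threshold with the fewest errors.

    The stump predicts True when feature > threshold.
    """
    best_threshold, best_errors = None, len(labels) + 1
    for t in sorted(set(feature)):
        errors = sum((f > t) != y for f, y in zip(feature, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

# Hypothetical toy data: two small neighborhoods of houses.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
income = [30, 35, 32, 80, 85, 90]            # non-spatial attribute
highly_valued = [False, False, False, True, True, True]

# Spatial feature: average income of the neighbors within distance 2.
spatial_feature = [neighbor_mean(points, income, i, radius=2.0)
                   for i in range(len(points))]

threshold = fit_stump(spatial_feature, highly_valued)
predictions = [f > threshold for f in spatial_feature]
```

The design point is that the learner itself is unchanged; the spatial knowledge enters only through the derived feature.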
Spatial Clustering
• Use tree structures to index spatial data
• DBSCAN: R-tree
• CLIQUE: Grid or Quad tree
• Clustering with spatial constraints (obstacles require adjusting the notion of distance)
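To show why a spatial index matters for density-based clustering, here is a minimal DBSCAN sketch. It swaps in a uniform grid for the R-tree mentioned above, purely to keep the example short: with a cell size equal to eps, every neighbor query only inspects the 3x3 block of cells around a point. The function names and the toy points are assumptions for illustration.

```python
from collections import defaultdict
from math import floor, hypot

def build_grid(points, cell):
    """Bucket point indices by grid cell (a simple stand-in for an R-tree)."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / cell), floor(y / cell))].append(i)
    return grid

def region_query(points, grid, eps, i):
    """Indices of all points within eps of point i (including i itself)."""
    x, y = points[i]
    cx, cy = floor(x / eps), floor(y / eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if hypot(points[j][0] - x, points[j][1] - y) <= eps:
                    found.append(j)
    return found

def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id 0, 1, ... or -1 for noise."""
    grid = build_grid(points, eps)
    labels = [None] * len(points)          # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, grid, eps, i)
        if len(neighbors) < min_pts:
            labels[i] = -1                 # noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, grid, eps, j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Handling obstacles would mean replacing the straight-line `hypot` distance with a path-aware distance, which this sketch does not attempt.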
Spatial Association Rules
• Spatial objects are of major interest, not transactions
• A ⇒ B
– A, B can be either spatial or non-spatial (3 combinations)
– What is the fourth combination?
• Association rules can be found w.r.t. the 3 types
Spatial Data Mining Results
• Understanding spatial data, discovering relationships between spatial and non-spatial data, construction of spatial knowledge bases, etc.
• In various forms
– The description of the general weather patterns in a set of geographic regions is a spatial characteristic rule.
– The comparison of two weather patterns in two geographic regions is a spatial discriminant rule.
– A rule like “most cities in Canada are close to the Canada-US border” is a spatial association rule.
• near(x, coast) ∧ southeast(x, USA) ⇒ hurricane(x) (70%)
– Others: spatial clusters, …
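The percentage attached to the hurricane rule is conventionally read as a confidence: among the objects that satisfy the left-hand side, the share that also satisfy the right-hand side. A minimal sketch of that computation follows, assuming the spatial predicates have already been evaluated to booleans for each region. The records and the field names `near_coast`, `southeast_usa`, and `hurricane` are hypothetical and are not data from the slides.

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule: all(antecedent) => consequent."""
    matches_a = [r for r in records if all(r[p] for p in antecedent)]
    matches_ab = [r for r in matches_a if r[consequent]]
    support = len(matches_ab) / len(records)
    confidence = len(matches_ab) / len(matches_a) if matches_a else 0.0
    return support, confidence

def region(near_coast, southeast_usa, hurricane):
    """Build one hypothetical region record."""
    return {"near_coast": near_coast,
            "southeast_usa": southeast_usa,
            "hurricane": hurricane}

# Hypothetical regions: predicates precomputed from the geometry.
regions = [
    region(True, True, True),
    region(True, True, True),
    region(True, True, True),
    region(True, True, False),
    region(True, False, False),
    region(False, True, False),
    region(False, False, False),
    region(False, False, True),
]

support, confidence = rule_stats(
    regions, antecedent=("near_coast", "southeast_usa"), consequent="hurricane")
```

With these toy records, four regions satisfy both antecedent predicates and three of those also have hurricanes, so the rule holds in three of the eight regions overall.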
Basic Concepts (1)
• Spatial data mining performs the same functions as general data mining, with the end objective of finding patterns in geography, meteorology, etc.
• The main difference: spatial autocorrelation
– The neighbors of a spatial object may have an influence on it and therefore have to be considered as well
• Spatial attributes
– Topological
• adjacency or inclusion information
– Geometric
• position (longitude/latitude), area, perimeter, boundary polygon
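Spatial autocorrelation can be measured. One common statistic, which the slides do not name, is Moran's I: it compares the product of deviations from the mean at neighboring locations with the overall variance. Values above zero suggest that neighbors resemble each other, values below zero that they differ. The sketch below assumes a binary adjacency matrix for four cells in a row; the function name and the data are illustrative only.

```python
def morans_i(values, weights):
    """Moran's I for `values` given a spatial weight matrix `weights`.

    weights[i][j] > 0 means locations i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * dev[i] * dev[j]
                    for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    return (n / w_total) * numerator / denominator

# Four cells in a row; each cell is adjacent to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 1, 5, 5], chain)      # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)    # neighbors always differ
```

Here `clustered` comes out positive and `alternating` negative, which is the behavior the autocorrelation bullet describes: ignoring neighbors would throw this signal away.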
Basic Concepts (2)
• Spatial neighborhood
– Topological relation
• “intersect”, “overlap”, “disjoint”, …
– Distance relation
• “close_to”, “far_away”, …
– Direction/orientation relation
• “left_of”, “west_of”, …
• A global model might be inconsistent with regional models
[Figure: Global Model vs. Local Model]
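The three kinds of neighborhood relation can be written as simple predicates. The sketch below is a simplification under stated assumptions: objects are reduced to points `(x, y)` for the distance and direction relations and to axis-aligned bounding boxes `(xmin, ymin, xmax, ymax)` for the topological ones, x grows eastward, and the threshold is chosen by the analyst. Real GIS geometries need a proper geometry library.

```python
from math import hypot

def close_to(a, b, threshold):
    """Distance relation: points a and b lie within `threshold` of each other."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= threshold

def far_away(a, b, threshold):
    """Distance relation: the complement of close_to for the same threshold."""
    return not close_to(a, b, threshold)

def west_of(a, b):
    """Direction relation: a lies west of b (smaller x, assuming x grows eastward)."""
    return a[0] < b[0]

def intersects(box_a, box_b):
    """Topological relation on boxes (xmin, ymin, xmax, ymax): they share a point."""
    return not (box_a[2] < box_b[0] or box_b[2] < box_a[0] or
                box_a[3] < box_b[1] or box_b[3] < box_a[1])

def disjoint(box_a, box_b):
    """Topological relation: the boxes share no point."""
    return not intersects(box_a, box_b)

# Hypothetical objects.
city = (2.0, 3.0)
border_post = (2.5, 3.5)
lake = (0.0, 0.0, 4.0, 4.0)
forest = (3.0, 3.0, 6.0, 6.0)
desert = (10.0, 10.0, 12.0, 12.0)

near_border = close_to(city, border_post, threshold=1.0)
lake_meets_forest = intersects(lake, forest)
lake_apart_from_desert = disjoint(lake, desert)
```

Predicates like these are what turn raw geometry into the boolean attributes that rules such as near(x, coast) are built from.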
Applications
• NASA Earth Observing System (EOS): Earth science data
• National Inst. of Justice: crime mapping
• Census Bureau, Dept. of Commerce: census data
• Dept. of Transportation (DOT): traffic data
• National Inst. of Health (NIH): cancer clusters
Example: What Kind of Houses Are Highly Valued?—Associative Classification
• Data
– ERA-15 using a T106L31 model (from 1978 to 1994) with 1.125° resolution
– Terabytes
– Comprises data from approx. 20 variables (such as temperature, humidity, pressure, etc.) at 30 pressure levels of a 360x360-node grid
SOM Application for Data Mining
Downscaling Weather Forecasts
Adaptive Competitive Learning
Sub-grid details escape from numerical models
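As a rough illustration of the adaptive competitive learning behind a self-organizing map (SOM), the sketch below trains a tiny one-dimensional chain of units. For each sample the best-matching unit wins and is pulled toward the sample, and its chain neighbors are pulled less strongly. Everything here is an assumption made for illustration: the two-unit map, the linearly decaying learning rate and radius, and the toy one-dimensional data. It is not the configuration used for the ERA-15 downscaling work.

```python
from math import exp

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to sample x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k]))

def train_som(data, weights, epochs, lr0=0.5, radius0=1.0):
    """Train a 1-D chain of units; returns a new list of weight vectors."""
    weights = [list(w) for w in weights]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs        # decays towards (but never reaches) 0
        lr, radius = lr0 * frac, radius0 * frac
        for x in data:
            bmu = best_matching_unit(weights, x)
            for k, w in enumerate(weights):
                # Gaussian neighborhood: the winner gets 1, others less.
                h = exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its best-matching unit."""
    total = 0.0
    for x in data:
        w = weights[best_matching_unit(weights, x)]
        total += sum((wi - xi) ** 2 for wi, xi in zip(w, x)) ** 0.5
    return total / len(data)

# Hypothetical 1-D samples forming two groups, and two initial units.
data = [(0.0,), (0.1,), (0.9,), (1.0,)]
initial = [(0.4,), (0.6,)]

trained = train_som(data, initial, epochs=20)
error_before = quantization_error(data, initial)
error_after = quantization_error(data, trained)
```

In a downscaling setting each sample would be a large-scale atmospheric state vector rather than a single number, and the trained units would serve as prototype weather patterns.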
Dept. of Applied Mathematics
Universidad de Cantabria
Santander, Spain
And now discussion