Lauren Bennett, Flora Vale, Alberto Nieto
Spatial Data Mining: Cluster Analysis and Space-Time Analysis
esriurl.com/spatialstats
Subjectivity of Maps
High Priority 911 Calls in Baltimore
The map as data
Where are the hot spots? Where is the variation greater?
The subjectivity of visual pattern analysis
Natural Breaks, Quantile
Minimizing the subjectivity
Turning the map into information
Inferential Statistics
Complete Spatial RANDOMNESS
Is there a PATTERN?
[Figure: grid of two-digit values illustrating the question "Is there a pattern?"]
z-scores and p-values
z-score               p-value   confidence level
< -2.58 or > +2.58    < 0.01    99%
< -1.96 or > +1.96    < 0.05    95%
< -1.65 or > +1.65    < 0.10    90%
-1.65 to +1.65        > 0.10    random (not statistically significant)
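The link between a z-score and its p-value can be sketched in a few lines. This is a minimal illustration, assuming a two-sided test against the standard normal distribution; the function names `two_sided_p_value` and `confidence_bin` are hypothetical and not part of any tool.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def confidence_bin(z):
    """Map a z-score to the confidence levels used on the slide (hypothetical helper)."""
    p = two_sided_p_value(z)
    if p < 0.01:
        return "99%"
    if p < 0.05:
        return "95%"
    if p < 0.10:
        return "90%"
    return "not significant"

for z in (-2.6, 2.0, 1.7, 0.5):
    print(z, round(two_sided_p_value(z), 4), confidence_bin(z))
```

The thresholds 1.65, 1.96 and 2.58 are the z-scores whose two-sided p-values are roughly 0.10, 0.05 and 0.01.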
“…everything is related to everything else, but near things are more related than distant things.”
Hot Spot Analysis
given a set of weighted features, identifies statistically significant hot spots and cold spots using the Getis-Ord Gi* statistic
Polygons
each feature has a value
is this (the feature's neighborhood) significantly different from this (the study area)?
if significantly higher… the feature is marked as a hot spot!
…how do we know if it’s SIGNIFICANTLY different??? MATH!
Getis-Ord Gi* Statistic
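The "math" behind the comparison can be sketched as follows. This is a minimal illustration of the commonly published form of the Gi* z-score, assuming binary weights from a fixed distance band (each feature counts as its own neighbor). The function name `gi_star` and the sample grid are hypothetical, and the sketch omits the multiple-testing corrections a production tool may apply.

```python
import math

def gi_star(values, coords, band):
    """Gi* z-score for every feature, using binary fixed-distance weights."""
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation; max() guards against tiny negative rounding error.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean ** 2))
    scores = []
    for xi, yi in coords:
        # w_ij = 1 when j is within the band of i (i itself included), else 0.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        # Local sum of neighborhood values minus what the study area would predict.
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores

# Hypothetical 5 x 5 grid with a block of high values in one corner.
grid = [(i, j) for i in range(5) for j in range(5)]
vals = [10.0 if i < 2 and j < 2 else 1.0 for i, j in grid]
scores = gi_star(vals, grid, band=1.0)
print(round(scores[0], 2), round(scores[-1], 2))
```

The corner inside the high-value block gets a large positive z-score, while the opposite corner stays well inside the "random" range.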
Points
“Where are the Hot Spots?” is not (necessarily) the same as “Where are the highest values?”
How are neighborhood sizes determined?
Fixed Distance Band
[Figure: Spatial Autocorrelation by Distance; x-axis: Distance (meters), 60 to 180; y-axis: z-score, 1 to 5]
Fixed Distance Band = 70.3 meters
[Figure: panels labeled "dispersed" and "clustered"]
Contiguity Edges (Rook’s Case)
Contiguity Edges/Corners (Queen’s Case)
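The difference between the two contiguity cases can be sketched on a regular grid. This is a simplified illustration, assuming square cells stand in for polygons; the function `grid_neighbors` is hypothetical.

```python
def grid_neighbors(row, col, n_rows, n_cols, case="rook"):
    """Neighbors of a grid cell under rook (edges) or queen (edges and corners) contiguity."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]           # shared edges
    if case == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # shared corners
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]

print(len(grid_neighbors(1, 1, 3, 3, "rook")))   # interior cell, edge neighbors only
print(len(grid_neighbors(1, 1, 3, 3, "queen")))  # interior cell, edges and corners
```

For an interior cell the rook case yields four neighbors and the queen case eight; cells on the boundary of the study area have fewer.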
K Nearest Neighbors
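With K nearest neighbors, every feature gets the same number of neighbors regardless of how far away they are. A minimal sketch, assuming planar coordinates and straight-line distance; `k_nearest_neighbors` is a hypothetical helper, and ties are broken by feature index.

```python
import math

def k_nearest_neighbors(coords, k):
    """For each point, return the indices of its k closest other points."""
    neighbors = []
    for i, (xi, yi) in enumerate(coords):
        distances = [(math.hypot(xi - xj, yi - yj), j)
                     for j, (xj, yj) in enumerate(coords) if j != i]
        distances.sort()  # nearest first; equal distances fall back to index order
        neighbors.append([j for _, j in distances[:k]])
    return neighbors

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(k_nearest_neighbors(points, k=1))
```

Unlike a fixed distance band, this guarantees that isolated features still have neighbors, at the cost of neighborhoods that vary in physical size.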
Network Spatial Weights
demo
Cluster and Outlier Analysis
given a set of weighted features, identifies statistically significant hot spots, cold spots, and spatial outliers using the Anselin Local Moran's I statistic
is the feature significantly different from all other features?
is the neighborhood significantly different from all other neighborhoods?
HH: feature is higher than other features, neighborhood is higher than other neighborhoods
HL: feature is higher than other features, neighborhood is lower than other neighborhoods
LH: feature is lower than other features, neighborhood is higher than other neighborhoods
LL: feature is lower than other features, neighborhood is lower than other neighborhoods
z-score: negative for outliers, positive for clusters
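The two questions above can be sketched in code. This is a simplified illustration of the Local Moran's I value and the HH/HL/LH/LL labels, assuming row-standardized weights from a precomputed neighbor list. The function `local_moran` and the sample data are hypothetical, and the significance test (typically done with permutations) is left out, so the labels here are not filtered for statistical significance.

```python
def local_moran(values, neighbors):
    """Return (local I, quadrant label) per feature; significance testing omitted."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    results = []
    for i, nbrs in enumerate(neighbors):
        # Variance of all other features around the global mean.
        s2 = sum(d * d for j, d in enumerate(dev) if j != i) / (n - 1)
        # Average deviation of the neighborhood (row-standardized weights).
        lag = sum(dev[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        local_i = dev[i] / s2 * lag if s2 > 0 else 0.0
        label = ("H" if dev[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, label))
    return results

# Hypothetical features along a line; each neighbors the features next to it.
vals = [10, 10, 10, 1, 10, 1, 1, 1]
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]
for value, (local_i, label) in zip(vals, local_moran(vals, nbrs)):
    print(value, round(local_i, 2), label)
```

A positive local I means the feature resembles its neighborhood (HH or LL), and a negative local I means it differs from it (HL or LH).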
demo
911 Calls
Space Time Pattern Mining
Defining neighbors in space and time
Quantifying Spatial Clusters
Hot Spot Analysis
Cluster and Outlier Analysis
What about time?
Where have things changed? Is it meaningful?
January 2015 911 Calls
January 2016 911 Calls
“…everything is related to everything else, but near and recent things are more related than distant things.”
Quantifying Spatiotemporal Clusters
The Space Time Pattern Mining Tools
Create Space Time Cube By Aggregating Points
aggregates a set of points into space-time bins, counting the number of points and optionally calculating statistics for attributes
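The aggregation step can be sketched as a simple counting exercise. This is an illustration only, assuming a square fishnet grid anchored at the origin and a fixed-length time step in days (the slides use monthly slices); the function `space_time_bin_counts` and the sample points are hypothetical.

```python
from collections import Counter
from datetime import date

def space_time_bin_counts(points, cell_size, start, step_days):
    """Count points per (column, row, time step) bin; points are (x, y, date) tuples."""
    counts = Counter()
    for x, y, day in points:
        col = int(x // cell_size)             # spatial bin along x
        row = int(y // cell_size)             # spatial bin along y
        t = (day - start).days // step_days   # time slice index
        counts[(col, row, t)] += 1
    return counts

# Hypothetical points in kilometers, aggregated into 10 km cells and 30-day steps.
pts = [(1.0, 2.0, date(2018, 1, 5)),
       (9.0, 9.5, date(2018, 1, 20)),
       (12.0, 3.0, date(2018, 2, 2))]
cube = space_time_bin_counts(pts, cell_size=10, start=date(2018, 1, 1), step_days=30)
print(cube[(0, 0, 0)], cube[(1, 0, 1)], cube[(5, 5, 0)])
```

A `Counter` returns zero for bins that received no points, which mirrors the idea of filling empty bins with zeros.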
[Figure: points aggregated into 10 km bins for each monthly time slice, January through July]
bin
location
time slice
counts
trend analysis
summary fields: sum, mean, median, min, max, standard deviation
filling empty bins: zeros
spatial neighbors
temporal neighbors
space time neighbors
time axis: past to present
First data point: March 22, 2018 (start)
Last data point: April 16, 2018 (end)
Timeline (past to present): 3/22 through 4/16

End Time
Time steps are counted back from the last data point:
ending on 4/16, back to but not including 4/12
ending on 4/12, back to but not including 4/8
ending on 4/8, back to but not including 4/4
ending on 4/4, back to but not including 3/31
ending on 3/31, back to but not including 3/27
ending on 3/27, back to but not including 3/23
ending on 3/23, back to but not including 3/19

Start Time
Time steps are counted forward from the first data point:
starting on 3/22, up to but not including 3/26
starting on 3/26, up to but not including 3/30
starting on 3/30, up to but not including 4/3
starting on 4/3, up to but not including 4/7
starting on 4/7, up to but not including 4/11
starting on 4/11, up to but not including 4/15
starting on 4/15, up to but not including 4/19
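The two alignments can be reproduced with a little date arithmetic. This is a minimal sketch, assuming the four-day time step implied by the intervals on the slides; the function names `end_time_steps` and `start_time_steps` are hypothetical.

```python
from datetime import date, timedelta

def end_time_steps(first, last, step):
    """Intervals (end - step, end], anchored on the last data point."""
    steps = []
    end = last
    while end >= first:
        steps.append((end - step, end))   # back to but not including end - step
        end -= step
    return steps[::-1]                    # oldest interval first

def start_time_steps(first, last, step):
    """Intervals [start, start + step), anchored on the first data point."""
    steps = []
    start = first
    while start <= last:
        steps.append((start, start + step))  # up to but not including start + step
        start += step
    return steps

first, last, step = date(2018, 3, 22), date(2018, 4, 16), timedelta(days=4)
for lo, hi in end_time_steps(first, last, step):
    print("ending on", hi, "back to but not including", lo)
for lo, hi in start_time_steps(first, last, step):
    print("starting on", lo, "up to but not including", hi)
```

With End Time alignment the oldest interval reaches back before the first data point, and with Start Time alignment the newest interval extends past the last data point, so one interval in each scheme is only partly covered by data.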
Temporal Bias
Aggregation Options
fishnet grid
hexagon grid
defined locations
Create Space Time Cube From Defined Locations
structures panel data or station data where geography does not change but attributes change over time into space-time bins
[Figure: station locations GSB001, GSB002, GSB003]
ID      DateTime             Reading
GSB001  09/14/2015 00:00:00  03.14
GSB001  09/14/2015 05:00:00  15.92
GSB001  09/14/2015 10:00:00  69.35
GSB002  09/14/2015 00:00:00  89.79
GSB002  09/14/2015 05:00:00  32.38
GSB002  09/14/2015 10:00:00  46.26
GSB003  09/14/2015 00:00:00  43.38
Data can be one set of features with a related table containing attributes over time
or repeating shapes contained within the same feature class
[Figure: the same shapes repeated at 09/14/2015 00:00:00, 05:00:00, and 10:00:00]
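Structuring a related table into location-by-time bins can be sketched as a pivot. This is an illustration using only the rows visible on the slide; the function `to_cube` is hypothetical, and readings that are not listed are stored as `None` rather than guessed.

```python
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S"
ROWS = [
    ("GSB001", "09/14/2015 00:00:00", 3.14),
    ("GSB001", "09/14/2015 05:00:00", 15.92),
    ("GSB001", "09/14/2015 10:00:00", 69.35),
    ("GSB002", "09/14/2015 00:00:00", 89.79),
    ("GSB002", "09/14/2015 05:00:00", 32.38),
    ("GSB002", "09/14/2015 10:00:00", 46.26),
    ("GSB003", "09/14/2015 00:00:00", 43.38),
]

def to_cube(rows):
    """Pivot (id, timestamp, reading) rows into {id: [reading per time step]}."""
    lookup = {(loc, datetime.strptime(ts, FMT)): value for loc, ts, value in rows}
    times = sorted({t for _, t in lookup})
    ids = sorted({loc for loc, _ in lookup})
    # Missing location/time combinations become None.
    return times, {loc: [lookup.get((loc, t)) for t in times] for loc in ids}

times, cube = to_cube(ROWS)
for loc, readings in cube.items():
    print(loc, readings)
```

The `None` entries mark the bins that a fill-missing-values step would need to handle before analysis.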
Fill Missing Values
demo
Emerging Hot Spot Analysis
identifies trends in the clustering of counts or summary fields in a space time cube
is this significantly different from this?
comparison options shown: Entire Cube, Neighborhood Time Step, Individual Time Step
New Hot Spot
Consecutive Hot Spot
Intensifying Hot Spot
Persistent Hot Spot
Diminishing Hot Spot
Sporadic Hot Spot
Oscillating Hot Spot
Historical Hot Spot
New Cold Spot
Consecutive Cold Spot
Intensifying Cold Spot
Persistent Cold Spot
Diminishing Cold Spot
Sporadic Cold Spot
Oscillating Cold Spot
Historical Cold Spot
No trend detected
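Categories such as "Intensifying" and "Diminishing" depend on whether a location's values trend up or down over time. The slides do not name the trend test; Esri's tool documentation describes a Mann-Kendall test, so the sketch below assumes that. The function `mann_kendall_z` is hypothetical, and it leaves out the correction for tied values.

```python
import math

def mann_kendall_z(series):
    """Mann-Kendall trend z-score for a time series (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # +1 for an increase, -1 for a decrease
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

print(round(mann_kendall_z([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), 2))  # steady increase
print(round(mann_kendall_z([5, 5, 5, 5, 5]), 2))                  # no trend
```

A large positive z-score suggests an upward trend, a large negative one a downward trend, and values near zero are consistent with "No trend detected".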
demo
Local Outlier Analysis
identifies statistically significant clusters and outliers in the context of both space and time
is the bin significantly different from all other bins?
is the neighborhood significantly different from all other neighborhoods?
HH: bin is higher than other bins, neighborhood is higher than other neighborhoods
HL: bin is higher than other bins, neighborhood is lower than other neighborhoods
LH: bin is lower than other bins, neighborhood is higher than other neighborhoods
LL: bin is lower than other bins, neighborhood is lower than other neighborhoods
Only High-High Cluster
Only High-Low Outlier
Only Low-High Outlier
Only Low-Low Cluster
Multiple Types
Never Significant
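The location-level categories can be read as a summary of each location's bin results across time. The sketch below follows that reading of the category names and is not the tool's documented rule set; the function `summarize_location` and its input encoding are hypothetical.

```python
LABELS = {
    "HH": "Only High-High Cluster",
    "HL": "Only High-Low Outlier",
    "LH": "Only Low-High Outlier",
    "LL": "Only Low-Low Cluster",
}

def summarize_location(bin_types):
    """Summarize per-time-step results ('HH', 'HL', 'LH', 'LL', or None if not significant)."""
    significant = {t for t in bin_types if t is not None}
    if not significant:
        return "Never Significant"
    if len(significant) > 1:
        return "Multiple Types"
    return LABELS[significant.pop()]

print(summarize_location([None, "HH", "HH", None]))
print(summarize_location(["HH", None, "LH"]))
print(summarize_location([None, None, None]))
```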
demo
Want to learn more???
esriurl.com/spatialstats
TUESDAY
1:45p Data Visualization for Spatial Analysis 146C
3:00p Machine Learning in ArcGIS 146C
4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
WEDNESDAY
8:30a Machine Learning in ArcGIS 146C
11:00a Data Visualization for Spatial Analysis 146C
1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C
2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C
4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C
5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C
Please fill out a course survey!!!