Scoring unusual events in space and time has been an active and important field of research for decades: How do we (i) distinguish normal fluctuations in a stochastic count process from real additive events, (ii) identify spatiotemporal clusters where the event is most strongly pronounced, and (iii) efficiently graph these clusters in a map overlay? Supervised learning algorithms are proposed as an alternative to the computationally expensive scan statistic. The task can be reduced to detecting over-densities in space relative to a background density. We frame the relative density estimation as a binary classification problem. In light of recent advances in embedding map tiles in statistical software via the library RgoogleMaps, we have developed an integrated hotspot visualizer. We can now efficiently identify and visualize spatiotemporal clusters in one environment.
Identifying and Visualizing Spatiotemporal Clusters on Map Tiles
Markus Locher
July 2012
Motivation: Spatial Plots in R, Spatiotemporal Clusters
RgoogleMaps: Static Google Maps, Integration of maps into R
Hot Spot Detection: SatScan, SpatialEpi, Unsupervised as Supervised Learning
Trees, PRIM
Outlook: OpenStreetMap, Offline use and enlarged maps
Motivation I
Motivation II a
The meuse data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m x 15 m.
Motivation II b
Examples: pennLC incidence
County-level (n = 67) population/case data for lung cancer in Pennsylvania in 2002, stratified on race (white vs non-white), gender and age (Under 40, 40-59, 60-69 and 70+). The data also contain county-specific smoking rates.
Map legend (incidence bins): 0.0263 − 0.0364, 0.0364 − 0.0504, 0.0504 − 0.0698, 0.0698 − 0.0967, 0.0967 − 0.134
Back to 1854
Spatiotemporal Clusters
Scoring unusual events in space and time has been an active and important field of research for decades: How do we
(i) distinguish normal fluctuations in a stochastic count process from real additive events?
(ii) identify spatiotemporal clusters where the event is most strongly pronounced?
(iii) efficiently graph these clusters in a map overlay?
Supervised learning algorithms are proposed as an alternative to the computationally expensive scan statistic.
The task can be reduced to detecting over-densities in space relative to a background density.
Motivation III
Geovisual analytics of spatial scan statistics
Combining geovisual analytics with spatial scan statistics
Spatial scan statistics methods are widely used in identifying and verifying spatial clusters (e.g., crime or disease clusters). However, two issues make using the methods and interpreting the results non-trivial: (1) the methods lack cartographic support for understanding the clusters in geographic context and (2) results from the methods are sensitive to parameter choices related to cluster scaling. Most spatial scan statistics provide no direct support for making these choices. In order to address these issues, we have developed a novel geovisual analytics approach that coordinates with Kulldorff’s spatial scan statistic as an example. This approach, which we have implemented in Visual Inquiry Toolkit, mitigates the sensitivity problem and enhances the interpretation of SaTScan results.
In the example below, U.S. vehicle theft data in 2000 are scanned with scales of 4%, 6%, 8%, 10%, 20%, 30%, 40%, 50% of total population at risk. The map matrix shows instability of clusters reported at smaller scales. The reliability map displays reliable, high risk clusters of U.S. vehicle theft in 2000.
Benefit: Geovisual analytics of spatial scan statistics supports spatial cluster analysis at multiple scales, mitigating scale sensitivity of spatial scan statistics and enhancing the interpretation of the results.
Early Development Lab Prototype Commercial Product
Jan 2009
Funded by: The National Science Foundation and The Department of Homeland Security.
For more information, contact: Alan MacEachren, (814) 865-7491, [email protected]
www.geovista.psu.edu/NEVAC
Multi-scale analysis of spatial clusters with Visual Inquiry Toolkit: Map matrix (left) displays high risk clusters (as highlighted in orange). Some clusters are heterogeneous – containing normal or low risk locations (in white or blue). The reliability map below displays reliable, high risk clusters (in dark green).
North-East Visualization and Analytics Center
Static Google Maps API
zoom level
Quadtree
map type
customized maps
RgoogleMaps
Shortcomings of API:
- URL Size Restriction: Static Map URLs are restricted to 2048 characters in size. In practice, you will probably not have need for URLs longer than this, unless you produce complicated maps with a high number of markers and paths. Note, however, that certain characters may be URL-encoded by browsers and/or services before sending them off to the Static Map service, resulting in increased character usage.
- Marker and line styles limited
- Switching environments slows down development
The RgoogleMaps package serves two purposes:
- Provide a comfortable R interface to query the Google server for static maps
- Use the map as a background image to overlay plots within R. This requires proper coordinate scaling.
Note: For dynamic map overlays there are other great packages such as plotGoogleMaps and googleVis.
3 Essential Commands
library(RgoogleMaps)
GetMap
mapSD = GetMap(center=c(32.7073, -117.162), zoom=10, destfile='SDconv.png')
PlotOnStaticMap
PlotOnStaticMap(mapSD, lat=myTrails$lat, lon=myTrails$lon, col=myTrails$col, cex=myTrails$cex)
PlotPolysOnStaticMap
# import the shapefile first (importShapefile comes from the PBSmapping package), then draw its polygons
shp = importShapefile(shpFile, projection='LL')
PlotPolysOnStaticMap(map, shp, lwd=.5, col=shp[,'col'])
Scan Statistic
In this spatial surveillance setting, each day we have data collected for a set of discrete spatial locations s_i. For each location s_i, we have a count c_i (e.g. number of disease cases), and an underlying baseline b_i. The baseline may correspond to the underlying population at risk, or may be an estimate of the expected value of the count (e.g. derived from the time series of previous count data).
Our goal, then, is to find if there is any spatial region S (set of locations s_i) for which the counts are significantly higher than expected, given the baselines. For simplicity, we assume that the locations s_i are aggregated to a uniform, two-dimensional, N × N grid G, and we search over the set of all rectangular regions.
CitySense
This type of spatial surveillance is computationally expensive: O(R · N^4)
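To make the cost of the exhaustive search concrete, here is a minimal base-R sketch that scores every axis-aligned rectangle on an N × N grid. The function names (`prefix_sums`, `rect_sum`, `scan_rectangles`) are hypothetical, and the score used, an expectation-based Poisson log-likelihood ratio, is an assumption chosen for illustration; the slides do not specify the scoring function. Significance testing by repeated randomization is omitted.

```r
# Brute-force rectangle scan on an N x N grid (illustrative sketch only).

# Padded 2-D prefix sums so that any rectangle sum costs O(1).
prefix_sums <- function(m) {
  p <- matrix(0, nrow(m) + 1, ncol(m) + 1)
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      p[i + 1, j + 1] <- m[i, j] + p[i, j + 1] + p[i + 1, j] - p[i, j]
    }
  }
  p
}

# Sum of the original matrix over rows r1..r2 and columns c1..c2.
rect_sum <- function(p, r1, r2, c1, c2) {
  p[r2 + 1, c2 + 1] - p[r1, c2 + 1] - p[r2 + 1, c1] + p[r1, c1]
}

# counts, baseline: N x N matrices of counts c_i and baselines b_i.
scan_rectangles <- function(counts, baseline) {
  N <- nrow(counts)
  pc <- prefix_sums(counts)
  pb <- prefix_sums(baseline)
  best <- list(score = 0, rows = NULL, cols = NULL)
  # Four nested loops over rectangle corners: O(N^4) rectangles.
  for (r1 in 1:N) for (r2 in r1:N) for (c1 in 1:N) for (c2 in c1:N) {
    C <- rect_sum(pc, r1, r2, c1, c2)
    B <- rect_sum(pb, r1, r2, c1, c2)
    # Assumed score: Poisson log-likelihood ratio, zero unless C exceeds B.
    s <- if (B > 0 && C > B) C * log(C / B) + B - C else 0
    if (s > best$score) best <- list(score = s, rows = c(r1, r2), cols = c(c1, c2))
  }
  best
}

# Toy data: flat baseline with one elevated 2 x 2 block.
counts <- matrix(1, 6, 6)
counts[2:3, 4:5] <- 5
baseline <- matrix(1, 6, 6)
hit <- scan_rectangles(counts, baseline)
```

On this toy grid `hit$rows` and `hit$cols` give the row and column range of the highest-scoring rectangle. Repeating the whole search on each randomized replicate multiplies the cost accordingly.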
Examples: pennLC clusters
Figure: map spanning longitude 80°W to 75°W and latitude 39.5°N to 42.5°N, with the highlighted region labeled "Most Likely Cluster".
Examples: NYleukemia
Census tract level (n=277) leukemia data for the 8 counties in upstate New York from 1978−1982, paired with population data from the 1980 census.
Examples: NYleukemia
Figure: map spanning longitude 77.5°W to 74.5°W and latitude 42°N to 43.4°N, with the highlighted region labeled "Most Likely Cluster".
Unsupervised as Supervised Learning
Introduced in Hastie et al. for density estimation or association rule generalizations. The problem must be enlarged with a simulated data set generated by Monte Carlo techniques.
FIGURE 14.3. Density estimation via classification. (Left panel:) Training set of 200 data points. (Right panel:) Training set plus 200 reference data points, generated uniformly over the rectangle containing the training data. The training sample was labeled as class 1, and the reference sample class 0, and a semiparametric logistic regression model was fit to the data. Some contours for g(x) are shown. (Axes: X1, X2.)
In epidemiology, cases and population naturally provide two classes; for anomaly detection, the background population is taken to be some sort of average.
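The classification view of density estimation can be sketched in a few lines of base R. Everything below is illustrative: the simulated "observed" sample, the variable names, and the quadratic logistic model are assumptions standing in for real event locations and for the semiparametric model mentioned in the figure caption. With equal class sizes, the fitted odds p/(1 − p) estimate the ratio of the data density to the uniform reference density.

```r
set.seed(1)
n <- 200

# Hypothetical observed sample: a Gaussian blob (stand-in for real event locations).
obs <- data.frame(x1 = rnorm(n, 0.5, 0.3), x2 = rnorm(n, 2, 0.8))

# Reference sample drawn uniformly over the bounding rectangle of the data.
ref <- data.frame(x1 = runif(n, min(obs$x1), max(obs$x1)),
                  x2 = runif(n, min(obs$x2), max(obs$x2)))

# Label observed points as class 1 and reference points as class 0.
dat <- rbind(cbind(obs, y = 1), cbind(ref, y = 0))

# Logistic regression with quadratic terms (an assumed, deliberately simple model).
fit <- glm(y ~ poly(x1, 2) + poly(x2, 2), family = binomial, data = dat)

# Fitted odds estimate g(x) / g0(x) when both classes have the same size.
density_ratio <- function(newdata) {
  p <- predict(fit, newdata = newdata, type = "response")
  unname(p / (1 - p))
}

centre <- density_ratio(data.frame(x1 = 0.5, x2 = 2))
corner <- density_ratio(data.frame(x1 = min(obs$x1), x2 = min(obs$x2)))
```

The estimated ratio at `centre` should be well above the one at `corner`, reflecting the over-density around the middle of the blob. Any classifier that returns class probabilities could replace `glm` here.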
Detecting simple clusters of overdensity
CART
patient rule induction method
Introduced in Hastie et al: Bump Hunting
Algorithm 9.3 Patient Rule Induction Method.
1. Start with all of the training data, and a maximal box containing all of the data.
2. Consider shrinking the box by compressing one face, so as to peel off the proportion α of observations having either the highest values of a predictor Xj, or the lowest. Choose the peeling that produces the highest response mean in the remaining box. (Typically α = 0.05 or 0.10.)
3. Repeat step 2 until some minimal number of observations (say 10) remain in the box.
4. Expand the box along any face, as long as the resulting box mean increases.
5. Steps 1–4 give a sequence of boxes, with different numbers of observations in each box. Use cross-validation to choose a member of the sequence. Call the box B1.
6. Remove the data in box B1 from the dataset and repeat steps 2–5 to obtain a second box, and continue to get as many boxes as desired.
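The peeling phase (steps 1 to 3) can be sketched in base R as follows. This is a simplified illustration, not a full PRIM implementation: the function name `prim_peel` is hypothetical, and the pasting step (4), cross-validation (5) and covering (6) are left out. The toy data, with a binary response that equals 1 inside a small square, are also an assumption.

```r
# Top-down peeling: repeatedly trim the alpha fraction of points at one face
# of the box, choosing the face that leaves the highest response mean.
prim_peel <- function(X, y, alpha = 0.05, min_obs = 10) {
  X <- as.matrix(X)
  box <- apply(X, 2, range)          # 2 x p matrix: row 1 lower, row 2 upper bounds
  inside <- rep(TRUE, nrow(X))
  boxes <- list(list(box = box, mean = mean(y), n = length(y)))
  repeat {
    best <- NULL
    for (j in seq_len(ncol(X))) {
      xj <- X[inside, j]
      # Candidate peels along X_j: drop the lowest or the highest alpha fraction.
      cand <- list(c(quantile(xj, alpha), box[2, j]),
                   c(box[1, j], quantile(xj, 1 - alpha)))
      for (lim in cand) {
        keep <- inside & X[, j] >= lim[1] & X[, j] <= lim[2]
        # Skip peels that leave too few points or remove nothing.
        if (sum(keep) < min_obs || sum(keep) == sum(inside)) next
        m <- mean(y[keep])
        if (is.null(best) || m > best$mean) {
          best <- list(j = j, lim = lim, keep = keep, mean = m)
        }
      }
    }
    if (is.null(best)) break         # no admissible peel left
    box[, best$j] <- best$lim
    inside <- best$keep
    boxes[[length(boxes) + 1]] <- list(box = box, mean = best$mean, n = sum(inside))
  }
  boxes
}

# Toy data: response is 1 inside a small square, 0 elsewhere.
set.seed(42)
X <- cbind(x1 = runif(500), x2 = runif(500))
y <- as.numeric(X[, 1] > 0.6 & X[, 1] < 0.9 & X[, 2] > 0.2 & X[, 2] < 0.5)
boxes <- prim_peel(X, y)
sizes <- sapply(boxes, function(b) b$n)
```

Each element of `boxes` records the box limits, the response mean and the number of points after one more peel, which is the sequence from which step 5 would select a member by cross-validation.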
PRIM example
Figure: sequence of scatterplot panels labeled 1, 2, 3, 4, 5, 6, 7, 8, 12, 17, 22 and 27.
PRIM boxes found
Panels: Uniform Background; Clustered Background
Outlook, Issues
- OpenStreetMap (OSM)
- Offline use and enlarged maps
  - Local database of map tiles
  - Keep size reasonable by limiting number of zoom levels and spatial extent.
  - Construction of static map from local tiles
Google Map Math
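With $\widetilde{\mathrm{lat}} = \pi \cdot \mathrm{lat}/180$ and $\tilde{X} = \mathrm{lon}/180$, the transformation from lat/lon to pixels is given by:

$$\tilde{Y} = \frac{1}{2\pi} \log\left(\frac{1 + \sin(\widetilde{\mathrm{lat}})}{1 - \sin(\widetilde{\mathrm{lat}})}\right) \qquad (1)$$

$$Y = 2^{\mathrm{zoom}-1} \, (1 - \tilde{Y}), \quad X = 2^{\mathrm{zoom}-1} \, (\tilde{X} + 1) \qquad (2)$$

The integer part of X, Y specifies the tile, whereas the fractional part times 256 is the pixel coordinate within the tile itself:

$$x = 256 \, (X - \lfloor X \rfloor), \quad y = 256 \, (Y - \lfloor Y \rfloor)$$

Inverting these relationships is rather straightforward. Eq. (1) leads to

$$\widetilde{\mathrm{lat}} = \sin^{-1}\left(\frac{\exp(2\pi\tilde{Y}) - 1}{\exp(2\pi\tilde{Y}) + 1}\right) \qquad (3)$$

on the principal branch $|\widetilde{\mathrm{lat}}| < \pi/2$, the only one that corresponds to a valid latitude, whereas inverting Eqs. (2) gives

$$\tilde{Y} = 1 - Y/2^{\mathrm{zoom}-1}, \quad \tilde{X} = X/2^{\mathrm{zoom}-1} - 1$$

For longitude, the inverse mapping is much simpler: Since $\tilde{X} = \mathrm{lon}/180$, we get

$$\mathrm{lon} = 180 \cdot \left(X/2^{\mathrm{zoom}-1} - 1\right).$$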
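The forward and inverse mappings can be checked numerically with a short base-R sketch. The function names `latlon_to_tile` and `tile_to_latlon` are hypothetical and written only for this illustration; they follow the formulas of this slide, using the principal branch of the inverse sine for the latitude.

```r
# Forward mapping: lat/lon (degrees) to tile index and pixel offset at a zoom level.
latlon_to_tile <- function(lat, lon, zoom) {
  lat_rad <- pi * lat / 180
  Y_t <- log((1 + sin(lat_rad)) / (1 - sin(lat_rad))) / (2 * pi)  # Eq. (1)
  X_t <- lon / 180
  Y <- 2^(zoom - 1) * (1 - Y_t)                                   # Eq. (2)
  X <- 2^(zoom - 1) * (X_t + 1)
  list(X = X, Y = Y,
       tileX = floor(X), tileY = floor(Y),          # integer part: tile
       pixX = 256 * (X - floor(X)),                 # fractional part: pixel
       pixY = 256 * (Y - floor(Y)))
}

# Inverse mapping: fractional tile coordinates back to lat/lon (degrees).
tile_to_latlon <- function(X, Y, zoom) {
  Y_t <- 1 - Y / 2^(zoom - 1)
  e <- exp(2 * pi * Y_t)
  list(lat = 180 / pi * asin((e - 1) / (e + 1)),    # principal branch
       lon = 180 * (X / 2^(zoom - 1) - 1))
}

# Round trip for the map center used earlier in the slides.
fwd <- latlon_to_tile(32.7073, -117.162, zoom = 10)
back <- tile_to_latlon(fwd$X, fwd$Y, zoom = 10)
origin <- latlon_to_tile(0, 0, zoom = 1)
```

The round trip should reproduce the input coordinates up to floating-point error, and the point (0, 0) at zoom 1 falls on the corner shared by the four tiles.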
Offline use and enlarged maps
- Local database of map tiles
- Keep size reasonable by limiting number of zoom levels and spatial extent.
- Construction of static map from local tiles