Identifying and Visualizing Spatiotemporal Clusters on Map Tiles

Identifying and Visualizing SpatiotemporalClusters on Map Tiles

Markus Locher

July 2012

MotivationSpatial Plots in RSpatiotemporal Clusters

RgoogleMapsStatic Google MapsIntegration of maps into R

Hot Spot DetectionSatScanSpatialEpiUnsupervised as Supervised Learning

TreesPRIM

OutlookOpenstreetMapsOffline use and enlarged maps

Motivation I

Motivation II a

The meuse data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variablesat the

observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from

composite samples of an area of approximately 15 m x 15 m.

Motivation II b

Examples: pennLC incidence

County-level (n = 67) population/case data for lung cancer in Pennsylvania in2002, stratified on race (white vs non-white), gender and age (Under 40, 40-59,60-69 and 70+). The data also contain county-specific smoking rates.

0.0263 − 0.03640.0364 − 0.05040.0504 − 0.06980.0698 − 0.09670.0967 − 0.134

Back to 1854

Spatiotemporal Clusters

Scoring unusual events in space and time has been an active andimportant field of research for decades: How do we

I distinguish normal fluctuations in a stochastic count process fromreal additive events ?

I identify spatiotemporal clusters where the event is most stronglypronounced ?

I efficiently graph these clusters in a map overlay ?

Supervised learning algorithms are proposed as an alternative to thecomputationally expensive scan statistic.The task can be reduced to detecting over-densities in space relative to abackground density.

Motivation III

Geovisual analytics of spatial scan statisticsCombining geovisual analytics with spatial scan statisticSpatial scan statistics method s are widely used in identifying and verifying spatial clusters (e.g., crime or disease clusters). However, two issues make using the methods and interpreting the results non-trivial: (1) the methods lack cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling parameters. Most spatial scan statistics provide no direct support for making these choices. In order to address these issues, we have developed a novel geovisual analytics approach that coordinates with Kulldorff’s spatial scan statistic as an example. This approach, which we have implemented in Visual Inquiry Toolkit, mitigates the sensitivity problem and enhances the interpretation of SaTScan results.

In the example below, U.S. vehicle theft data in 2000 are scanned with scales of 4%, 6%, 8%, 10%, 20%, 30%, 40%, 50% of total population at risk. The map matrix shows instability of clusters reported at smaller scales. The reliability map displays reliable, high risk clusters of , U.S. vehicle theft in 2000.

Benefit: Geovisualanalytics of spatial scan statistics support s spatial cluster analysis at multiple scales, mitigating scale sensitivity of spatial scan statistics and enhancing the interpretation of the results.

Early Development Lab Prototype Commercial Product

Jan 2009

Funded by: The National Science Foundation and The Department of Homeland Security.

For more information, contact:Alan MacEachren, (814) 865-7491, [email protected]

www.geovista.psu.edu/NEVAC

Multi-scale analysis of spatial clusters with Visual Inquiry Toolkit : Map matrix (left) displays high risk clusters (as highlighted in orange). Some clusters are heterogeneous – containing normal or low risk locations ( in white or blue ). The reliability map below displays reliable, high risk clusters (in dark green)

North-East Visualization and Analytics Center

Static Google Maps API

zoom level

Quadtree

map type

customized maps

RgoogleMaps

Shortcomings of API:

I URL Size Restriction: Static Map URLs are restricted to 2048characters in size. In practice, you will probably not have need for URLslonger than this, unless you produce complicated maps with a highnumber of markers and paths. Note, however, that certain characters maybe URL-encoded by browsers and/or services before sending them off tothe Static Map service, resulting in increased character usage

I Marker and line styles limited

I Switching Environments slows down development

The RgoogleMaps package serves two purposes:

I Provide a comfortable R interface to query the Google server for staticmaps

I Use the map as a background image to overlay plots within R. Thisrequires proper coordinate scaling.

Note: For dynamic map overlays there are other great packages such as

plotGoogleMaps and googleVis.

3 Essential Commands

mapSD = GetMap(center=c(32.7073, -117.162), zoom=10,destfile=’SDconv.png’)

PlotOnStaticMap

PlotOnStaticMap(mapSD, lat=myTrails$lon, lon=myTrails$lon,col=myTrails$col, cex = myTrails$cex)

PlotPolysOnStaticMap

PlotPolysOnStaticMap(map, shp, lwd=.5, col = shp[,’col’]);shp=importShapefile(shpFile,projection=’LL’);

Scan Statistic

In this spatial surveillance setting, each day we have data collected for a set ofdiscrete spatial locations si. For each locationsi , we have a count ci (e.g.number of disease cases), and an underlying baseline bi . The baseline maycorrespond to the underlying population at risk, or may be an estimate of theexpected value of the count (e.g. derived from the time series of previous countdata).

Our goal, then, is to find if there is any spatial region S (set of locations si ) for

which the counts are significantly higher than expected, given the baselines. For

simplicity, we assume that the locations si are aggregated to a uniform,

two-dimensional, N × N grid G , and we search over the set of all rectangular

regions

CitySense

This type of spatial surveillance is computationally expensive: O(R · N4)

Examples: pennLC clusters

80°W 79°W 78°W 77°W 76°W 75°W

39.5

°N40

°N40

.5°N

41°N

41.5

°N42

°N42

.5°N

Most Likely Cluster

Examples: NYleukemia

Census tract level (n=277) leukemia data for the 8 counties in upstate NewYork from 1978− 1982, paired with population data from the 1980 census.

Examples: NYleukemia

77.5°W 77°W 76.5°W 76°W 75.5°W 75°W 74.5°W

42°N

42.2

°N42

.4°N

42.6

°N42

.8°N

43°N

43.2

°N43

.4°N

Most Likely Cluster

Unsupervised as Supervised Learning

Introduced in Hastie et al for density estimation or association rulegeneralizations. Problem must be enlarged with a simulated data set generatedby Monte Carlo techniques

-1 0 1 2

-20

24

6

•

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

•

•

•

•

••

••

•

••

•

•

•

••

•

•

•

••

•

•

•

•

•

•

•

•

•

•

•

•

•

••

••

•

•

•

•

••

••

•

•

•

••

•

•

••

•

•

•

•

•

•• ••

•

•

• ••

•

•

•

•

••

•

•

•

•

••

•

•

•

•

••

••

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

••

•

••

•

•

••

•

•

•

••

•

• ••

•

•

• •

•

•••• •••

•

•

•

•

••

•

•

•

• •

•

•

•

•••

••

•

•

••

•

•

• •

•

•

•

••

•

•

•

•

•

••

••

•

-1 0 1 2

-20

24

6

•

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

•

•

•

•

••

••

•

••

•

•

•

••

•

•

•

••

•

•

•

•

•

•

•

•

•

•

•

•

•

••

••

•

•

•

•

••

••

•

•

•

••

•

•

••

•

•

•

•

•

•• ••

•

•

• ••

•

•

•

•

••

•

•

•

•

••

•

•

•

•

••

••

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

••

•

••

•

•

••

•

•

•

••

•

• ••

•

•

• •

•

•••• •••

•

•

•

•

••

•

•

•

• •

•

•

•

•••

••

•

•

••

•

•

• •

•

•

•

••

•

•

•

•

•

••

••

•

•

••

•

•

•

•

•

•

•

•

•

•

•

•

•

•

••

••

•

•

•

•

•

••

•

•

•

•

•

•

•

•

•

••

•

•

•

• •

•

••

•

•

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

•

••

•

•

•

•

•

•

•

•

•••

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

••

•

•

•

• •

•

•

••

•

•

•

•

••

•

•

•••

•

•

••

••

•

•

•

•

•

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

•

•

•

•

••

•

•

•

•

•

•

•

•

•

••

••

•

••

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

•

••

•

• •

•

•

•

•

•

•

•

•

• •

X1X1

X2

X2

FIGURE 14.3. Density estimation via classification. (Left panel:) Training setof 200 data points. (Right panel:) Training set plus 200 reference data points,generated uniformly over the rectangle containing the training data. The trainingsample was labeled as class 1, and the reference sample class 0, and a semipara-metric logistic regression model was fit to the data. Some contours for g(x) areshown.

In epidemiology cases and population naturally provide two classes, for anomaly

detection the background populationıs taken to be some sort of average.

Detecting simple clusters of overdensity

CART

patient rule induction method

Introduced in Hastie et al: Bump Hunting

Algorithm 9.3 Patient Rule Induction Method.

1. Start with all of the training data, and a maximal box containing allof the data.

2. Consider shrinking the box by compressing one face, so as to peel offthe proportion α of observations having either the highest values ofa predictor Xj , or the lowest. Choose the peeling that produces thehighest response mean in the remaining box. (Typically α = 0.05 or0.10.)

3. Repeat step 2 until some minimal number of observations (say 10)remain in the box.

4. Expand the box along any face, as long as the resulting box meanincreases.

5. Steps 1–4 give a sequence of boxes, with different numbers of obser-vations in each box. Use cross-validation to choose a member of thesequence. Call the box B1.

6. Remove the data in box B1 from the dataset and repeat steps 2–5 toobtain a second box, and continue to get as many boxes as desired.

PRIM example

1

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

2

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

3

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

4

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

5

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

6

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

7

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

8

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

12

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

17

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

22

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

27

oo

o

o

o

oo

o

o

o

oo

o

o

o

o

o

o

oo

o

o

oo

o

o

o

o

oo

o

oo

o

o

o

o

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

oo o

o

o

oo

oo

o

o

oo

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

oo

o

ooo

o

o

o

o

oo

o

o

o

o

o oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

o

oo

oo

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

oo

o

o

o

o

o

o

oo

o

o

o

o

o

o

o

o

o

oo

o

o

o

o

PRIM boxes found

Uniform Background Clustered Background

Outlook, IssuesI Openstreet Maps (OSM)

I Offline use and enlarged mapsI Local Database of map tilesI Keep size reasonable by limiting number of zoom levels and spatial

extent.I Construction of static map from local tiles

Google Map Math

With ˜lat = π · lat/180, the transformation from lat/lon to pixels is given by:

Y =1

2πlog

(1 + sin ( ˜lat)

1− sin ( ˜lat)

)(1)

Y = 2zoom−1 ∗ (1− Y ),X = 2zoom−1 ∗ (X + 1) (2)

The integer part of X ,Y specifies the tile, whereas the fractional part times256 is the pixel coordinate within the Tile itself:

x = 256 ∗ (X − bX ), y = 256 ∗ (Y − bY )

Inverting these relationships is rather straightforward. Eq. (1) leads to

˜lat = 2πn ± sin−1

(exp 2πY − 1

exp 2πY + 1

)+ π, n ∈ Z (3)

whereas inverting Eqs. (2) gives

Y = 1− Y /2zoom−1, X = X/2zoom−1 − 1

For longitude, the inverse mapping is much simpler: Since X = lon/180, we get

lon = 180 ·(

X/2zoom−1 − 1).

Offline use and enlarged mapsI Local Database of map tiles

I Keep size reasonable by limiting number of zoom levels and spatialextent.

I Construction of static map from local tiles

Technology

Identifying and Visualizing Spatiotemporal Clusters on Map Tiles