Identifying and Visualizing Spatiotemporal Clusters on Map Tiles, Markus Löcher, July 2012


Scoring unusual events in space and time has been an active and important field of research for decades. How do we (i) distinguish normal fluctuations in a stochastic count process from real additive events, (ii) identify spatiotemporal clusters where the event is most strongly pronounced, and (iii) efficiently graph these clusters in a map overlay? Supervised learning algorithms are proposed as an alternative to the computationally expensive scan statistic. The task can be reduced to detecting over-densities in space relative to a background density. We frame the relative density estimation as a binary classification problem. In light of recent advances in embedding map tiles in statistical software via the library RgoogleMaps, we have developed an integrated hotspot visualizer. We can now efficiently identify and visualize spatiotemporal clusters in one environment.


Page 1: Identifying and Visualizing Spatiotemporal Clusters on Map Tiles

Identifying and Visualizing Spatiotemporal Clusters on Map Tiles

Markus Löcher

July 2012


Motivation
- Spatial Plots in R
- Spatiotemporal Clusters

RgoogleMaps
- Static Google Maps
- Integration of maps into R

Hot Spot Detection
- SaTScan
- SpatialEpi
- Unsupervised as Supervised Learning

Trees
- PRIM

Outlook
- OpenStreetMap
- Offline use and enlarged maps


Motivation I


Motivation II a

The meuse data set gives locations and topsoil heavy metal concentrations, along with a number of soil and landscape variables at the observation locations, collected in a flood plain of the river Meuse, near the village of Stein (NL). Heavy metal concentrations are from composite samples of an area of approximately 15 m x 15 m.


Motivation II b


Examples: pennLC incidence

County-level (n = 67) population/case data for lung cancer in Pennsylvania in 2002, stratified on race (white vs non-white), gender and age (Under 40, 40-59, 60-69 and 70+). The data also contain county-specific smoking rates.

Map legend (incidence classes): 0.0263 − 0.0364, 0.0364 − 0.0504, 0.0504 − 0.0698, 0.0698 − 0.0967, 0.0967 − 0.134


Back to 1854


Spatiotemporal Clusters

Scoring unusual events in space and time has been an active and important field of research for decades: How do we

- distinguish normal fluctuations in a stochastic count process from real additive events?

- identify spatiotemporal clusters where the event is most strongly pronounced?

- efficiently graph these clusters in a map overlay?

Supervised learning algorithms are proposed as an alternative to the computationally expensive scan statistic. The task can be reduced to detecting over-densities in space relative to a background density.
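As a minimal illustration of the first question, the sketch below asks how surprising an observed count is under a Poisson baseline. The baseline of 12 and the two observed counts are made-up numbers, and the Poisson model itself is an assumption chosen for illustration rather than something the slides prescribe.

```r
# Assumed expected count for one location and time window (illustrative value).
baseline <- 12

# Two hypothetical observed counts: a mild excess and a strong excess.
observed <- c(14, 25)

# Upper-tail probability P(X >= observed) under Poisson(baseline).
p_values <- ppois(observed - 1, lambda = baseline, lower.tail = FALSE)
print(signif(p_values, 3))
```

Under this assumed baseline, a count of 14 is compatible with ordinary fluctuation, whereas 25 is hard to attribute to chance. Testing many locations and time windows at once would additionally call for a correction for multiple comparisons.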


Motivation III

Geovisual analytics of spatial scan statistics
Combining geovisual analytics with spatial scan statistic

Spatial scan statistics methods are widely used in identifying and verifying spatial clusters (e.g., crime or disease clusters). However, two issues make using the methods and interpreting the results non-trivial: (1) the methods lack cartographic support for understanding the clusters in geographic context and (2) results from the methods are sensitive to parameter choices related to cluster scaling parameters. Most spatial scan statistics provide no direct support for making these choices. In order to address these issues, we have developed a novel geovisual analytics approach that coordinates with Kulldorff’s spatial scan statistic as an example. This approach, which we have implemented in Visual Inquiry Toolkit, mitigates the sensitivity problem and enhances the interpretation of SaTScan results.

In the example below, U.S. vehicle theft data in 2000 are scanned with scales of 4%, 6%, 8%, 10%, 20%, 30%, 40%, 50% of total population at risk. The map matrix shows instability of clusters reported at smaller scales. The reliability map displays reliable, high risk clusters of U.S. vehicle theft in 2000.

Benefit: Geovisual analytics of spatial scan statistics supports spatial cluster analysis at multiple scales, mitigating scale sensitivity of spatial scan statistics and enhancing the interpretation of the results.

Early Development Lab Prototype Commercial Product

Jan 2009

Funded by: The National Science Foundation and The Department of Homeland Security.

For more information, contact: Alan MacEachren, (814) 865-7491, [email protected]

www.geovista.psu.edu/NEVAC

Multi-scale analysis of spatial clusters with Visual Inquiry Toolkit: Map matrix (left) displays high risk clusters (as highlighted in orange). Some clusters are heterogeneous – containing normal or low risk locations (in white or blue). The reliability map below displays reliable, high risk clusters (in dark green).

North-East Visualization and Analytics Center


Static Google Maps API


zoom level


Quadtree


map type


customized maps


RgoogleMaps

Shortcomings of API:

- URL Size Restriction: Static Map URLs are restricted to 2048 characters in size. In practice, you will probably not have need for URLs longer than this, unless you produce complicated maps with a high number of markers and paths. Note, however, that certain characters may be URL-encoded by browsers and/or services before sending them off to the Static Map service, resulting in increased character usage.

- Marker and line styles limited

- Switching environments slows down development
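The URL size restriction can be made concrete with a small sketch that assembles a request URL and checks its length. The helper `static_map_url` is hypothetical; the endpoint and the `center`, `zoom`, `size`, `maptype` and `markers` parameters follow the Static Maps API as commonly documented, while API keys and other required parameters are left out. The marker coordinates are random illustrative values, and 2048 is the limit quoted on the slide.

```r
# Hypothetical helper: assemble a Static Maps request URL from its parameters.
static_map_url <- function(center, zoom, markers = NULL,
                           size = c(640, 640), maptype = "roadmap") {
  url <- paste0("https://maps.googleapis.com/maps/api/staticmap",
                "?center=", paste(center, collapse = ","),
                "&zoom=", zoom,
                "&size=", paste(size, collapse = "x"),
                "&maptype=", maptype)
  if (!is.null(markers)) {
    # One "lat,lon" pair per marker, joined by "|" (URL-encoded as %7C).
    pts <- paste(markers[, 1], markers[, 2], sep = ",", collapse = "%7C")
    url <- paste0(url, "&markers=", pts)
  }
  url
}

# Illustrative marker locations scattered around the map center.
set.seed(3)
pts <- cbind(lat = round(32.7 + runif(150, -0.1, 0.1), 4),
             lon = round(-117.16 + runif(150, -0.1, 0.1), 4))

short_url <- static_map_url(c(32.7073, -117.162), 10, pts[1:10, ])
long_url  <- static_map_url(c(32.7073, -117.162), 10, pts)

url_limit <- 2048  # limit quoted on the slide
print(c(short = nchar(short_url), long = nchar(long_url)))
print(c(short_ok = nchar(short_url) <= url_limit,
        long_ok  = nchar(long_url)  <= url_limit))
```

With ten markers the URL stays far below the limit, while 150 markers exceed it, which is one reason to draw the overlay locally on top of the downloaded map instead.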

The RgoogleMaps package serves two purposes:

- Provide a comfortable R interface to query the Google server for static maps

- Use the map as a background image to overlay plots within R. This requires proper coordinate scaling.

Note: For dynamic map overlays there are other great packages such as plotGoogleMaps and googleVis.


3 Essential Commands

library(RgoogleMaps)
mapSD <- GetMap(center = c(32.7073, -117.162), zoom = 10, destfile = 'SDconv.png')


PlotOnStaticMap

PlotOnStaticMap(mapSD, lat = myTrails$lat, lon = myTrails$lon, col = myTrails$col, cex = myTrails$cex)


PlotPolysOnStaticMap

library(PBSmapping)  # importShapefile comes from PBSmapping
shp <- importShapefile(shpFile, projection = 'LL')
PlotPolysOnStaticMap(map, shp, lwd = 0.5, col = shp[, 'col'])


Scan Statistic

In this spatial surveillance setting, each day we have data collected for a set of discrete spatial locations s_i. For each location s_i, we have a count c_i (e.g. number of disease cases), and an underlying baseline b_i. The baseline may correspond to the underlying population at risk, or may be an estimate of the expected value of the count (e.g. derived from the time series of previous count data).

Our goal, then, is to find if there is any spatial region S (set of locations s_i) for which the counts are significantly higher than expected, given the baselines. For simplicity, we assume that the locations s_i are aggregated to a uniform, two-dimensional, N × N grid G, and we search over the set of all rectangular regions.


CitySense

This type of spatial surveillance is computationally expensive: O(R · N^4)
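To see where the N^4 factor comes from, the following sketch scores every axis-aligned rectangle of a small grid exhaustively. The score is a Kulldorff-style Poisson log-likelihood ratio, chosen here as an assumption since the slides do not state the score function. The function names `llr` and `scan_rectangles` and the toy data are hypothetical. The factor R presumably counts the Monte Carlo replications needed for significance testing, which are not shown.

```r
# Log-likelihood ratio for a region with count c_in and baseline b_in, given
# grid totals c_tot and b_tot. Returns 0 unless the inside rate is elevated.
llr <- function(c_in, b_in, c_tot, b_tot) {
  c_out <- c_tot - c_in
  b_out <- b_tot - b_in
  if (b_in <= 0 || b_out <= 0 || c_in / b_in <= c_out / b_out) return(0)
  xlogy <- function(x, y) if (x > 0) x * log(y) else 0
  xlogy(c_in, c_in / b_in) + xlogy(c_out, c_out / b_out) -
    xlogy(c_tot, c_tot / b_tot)
}

# Exhaustive search over all rectangles (r1:r2, k1:k2) of an N x N grid.
scan_rectangles <- function(counts, baselines) {
  N <- nrow(counts)
  c_tot <- sum(counts)
  b_tot <- sum(baselines)
  best <- list(score = 0, rows = NULL, cols = NULL)
  for (r1 in 1:N) for (r2 in r1:N) for (k1 in 1:N) for (k2 in k1:N) {
    s <- llr(sum(counts[r1:r2, k1:k2]), sum(baselines[r1:r2, k1:k2]),
             c_tot, b_tot)
    if (s > best$score) best <- list(score = s, rows = c(r1, r2), cols = c(k1, k2))
  }
  best
}

# Toy data: flat baseline of 10 per cell with an injected 2 x 2 excess.
N <- 8
baselines <- matrix(10, N, N)
counts <- matrix(10, N, N)
counts[3:4, 5:6] <- 40
best <- scan_rectangles(counts, baselines)
print(best)
```

On this noise-free toy grid the search recovers the injected block (rows 3 to 4, columns 5 to 6). The four nested loops are what make the approach expensive on realistic grids.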


Examples: pennLC clusters

[Map: most likely cluster among Pennsylvania counties; axes span 80°W to 75°W and 39.5°N to 42.5°N. Legend: Most Likely Cluster]


Examples: NYleukemia

Census tract level (n = 277) leukemia data for the 8 counties in upstate New York from 1978 to 1982, paired with population data from the 1980 census.


Examples: NYleukemia

[Map: most likely cluster among upstate New York census tracts; axes span 77.5°W to 74.5°W and 42°N to 43.4°N. Legend: Most Likely Cluster]


Unsupervised as Supervised Learning

Introduced in Hastie et al. for density estimation or association rule generalizations. The problem must be enlarged with a simulated data set generated by Monte Carlo techniques.

[Figure: two scatter-plot panels with axes X1 and X2.]

FIGURE 14.3. Density estimation via classification. (Left panel:) Training set of 200 data points. (Right panel:) Training set plus 200 reference data points, generated uniformly over the rectangle containing the training data. The training sample was labeled as class 1, and the reference sample class 0, and a semiparametric logistic regression model was fit to the data. Some contours for g(x) are shown.

In epidemiology, cases and population naturally provide the two classes; for anomaly detection, the background population is taken to be some sort of average.
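The idea of turning density estimation into classification can be sketched in a few lines of base R. Everything concrete here is an assumption for illustration: the observed points are simulated from a Gaussian bump, and a logistic regression with quadratic terms stands in for the semiparametric model mentioned in the figure caption. With equally sized classes, the fitted odds p/(1 - p) estimate the ratio of the data density to the uniform reference density.

```r
set.seed(42)
n <- 200

# Class 1: observed points (simulated here; real data would be event locations).
obs <- data.frame(x1 = rnorm(n, 0.5, 0.4), x2 = rnorm(n, 2, 1), y = 1)

# Class 0: reference sample, uniform over the bounding rectangle of the data.
ref <- data.frame(x1 = runif(n, min(obs$x1), max(obs$x1)),
                  x2 = runif(n, min(obs$x2), max(obs$x2)),
                  y = 0)
dat <- rbind(obs, ref)

# Logistic regression with quadratic terms as a simple stand-in classifier.
fit <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2), family = binomial, data = dat)

# Compare the estimated density ratio at the bump center and at a corner.
query <- data.frame(x1 = c(0.5, min(obs$x1)), x2 = c(2, min(obs$x2)))
p <- predict(fit, newdata = query, type = "response")
ratio <- p / (1 - p)
print(ratio)
```

Locations where the ratio is well above 1 are candidate over-densities relative to the reference sample. Replacing the uniform reference with population or historical counts gives the relative-density view used in the talk.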


Detecting simple clusters of overdensity


CART


Patient rule induction method

Introduced in Hastie et al.: Bump Hunting

Algorithm 9.3 Patient Rule Induction Method.

1. Start with all of the training data, and a maximal box containing all of the data.

2. Consider shrinking the box by compressing one face, so as to peel off the proportion α of observations having either the highest values of a predictor X_j, or the lowest. Choose the peeling that produces the highest response mean in the remaining box. (Typically α = 0.05 or 0.10.)

3. Repeat step 2 until some minimal number of observations (say 10) remain in the box.

4. Expand the box along any face, as long as the resulting box mean increases.

5. Steps 1–4 give a sequence of boxes, with different numbers of observations in each box. Use cross-validation to choose a member of the sequence. Call the box B1.

6. Remove the data in box B1 from the dataset and repeat steps 2–5 to obtain a second box, and continue to get as many boxes as desired.
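The peeling phase (steps 1 to 3) can be sketched in base R as follows. This is a simplified illustration only: the function name `prim_peel` and the simulated data are hypothetical, and the pasting step, the cross-validation and the search for further boxes (steps 4 to 6) are omitted.

```r
# Top-down peeling: returns the sequence of boxes with size and response mean.
prim_peel <- function(X, y, alpha = 0.1, min_obs = 10) {
  box <- apply(X, 2, range)   # 2 x p matrix: row 1 lower, row 2 upper bounds
  inside <- rep(TRUE, nrow(X))
  boxes <- list(list(box = box, n = sum(inside), mean = mean(y)))
  repeat {
    best <- NULL
    for (j in seq_len(ncol(X))) {
      xj <- X[inside, j]
      lo <- quantile(xj, alpha, names = FALSE)
      hi <- quantile(xj, 1 - alpha, names = FALSE)
      # Candidate peels: drop the lowest or the highest alpha fraction of X_j.
      cands <- list(list(side = 1, cut = lo, keep = inside & X[, j] > lo),
                    list(side = 2, cut = hi, keep = inside & X[, j] < hi))
      for (cand in cands) {
        n_keep <- sum(cand$keep)
        if (n_keep < min_obs || n_keep == sum(inside)) next
        m <- mean(y[cand$keep])
        if (is.null(best) || m > best$mean)
          best <- list(j = j, side = cand$side, cut = cand$cut,
                       keep = cand$keep, mean = m)
      }
    }
    if (is.null(best)) break   # no admissible peel left
    inside <- best$keep
    box[best$side, best$j] <- best$cut
    boxes[[length(boxes) + 1]] <- list(box = box, n = sum(inside), mean = best$mean)
  }
  boxes
}

# Simulated example: the response is 1 inside a small square, 0 elsewhere.
set.seed(7)
X <- cbind(x1 = runif(500), x2 = runif(500))
y <- as.numeric(X[, "x1"] > 0.6 & X[, "x1"] < 0.8 &
                X[, "x2"] > 0.2 & X[, "x2"] < 0.4)
boxes <- prim_peel(X, y)
final <- boxes[[length(boxes)]]
print(final)
```

Each element of `boxes` corresponds to one member of the box sequence from step 5. In a hot spot setting, `X` would hold the coordinates (and possibly time) and `y` the class label separating cases from background.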


PRIM example

[Figure: sequence of scatter-plot panels, presumably showing the PRIM box after successive peeling steps; panels are labeled 1, 2, 3, 4, 5, 6, 7, 8, 12, 17, 22, and 27.]


PRIM boxes found

[Figure panels: Uniform Background; Clustered Background]


Outlook, Issues

- OpenStreetMap (OSM)
- Offline use and enlarged maps
  - Local database of map tiles
  - Keep size reasonable by limiting number of zoom levels and spatial extent.
  - Construction of static map from local tiles


Google Map Math

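With $\widetilde{\mathrm{lat}} = \pi \cdot \mathrm{lat}/180$ and $\tilde{X} = \mathrm{lon}/180$, the transformation from lat/lon to pixels is given by:

$$\tilde{Y} = \frac{1}{2\pi} \log\left(\frac{1 + \sin(\widetilde{\mathrm{lat}})}{1 - \sin(\widetilde{\mathrm{lat}})}\right) \tag{1}$$

$$Y = 2^{\mathrm{zoom}-1} \cdot (1 - \tilde{Y}), \qquad X = 2^{\mathrm{zoom}-1} \cdot (\tilde{X} + 1) \tag{2}$$

The integer part of $X, Y$ specifies the tile, whereas the fractional part times 256 is the pixel coordinate within the tile itself:

$$x = 256 \cdot (X - \lfloor X \rfloor), \qquad y = 256 \cdot (Y - \lfloor Y \rfloor)$$

Inverting these relationships is rather straightforward. Eq. (1) leads to

$$\widetilde{\mathrm{lat}} = n\pi + (-1)^n \sin^{-1}\left(\frac{\exp(2\pi\tilde{Y}) - 1}{\exp(2\pi\tilde{Y}) + 1}\right), \quad n \in \mathbb{Z} \tag{3}$$

of which only $n = 0$ lies in the valid latitude range, whereas inverting Eqs. (2) gives

$$\tilde{Y} = 1 - Y/2^{\mathrm{zoom}-1}, \qquad \tilde{X} = X/2^{\mathrm{zoom}-1} - 1$$

For longitude, the inverse mapping is much simpler: since $\tilde{X} = \mathrm{lon}/180$, we get

$$\mathrm{lon} = 180 \cdot \left(X/2^{\mathrm{zoom}-1} - 1\right).$$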
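The forward and inverse mappings can be checked numerically with a short round trip. The function names `latlon_to_tile` and `tile_to_latlon` are hypothetical helpers written for this illustration (they are not the RgoogleMaps functions), and the inverse uses the principal branch of the arcsine, which is the one that yields valid latitudes.

```r
# Forward mapping: lat/lon in degrees to tile index and pixel offset, Eqs. (1)-(2).
latlon_to_tile <- function(lat, lon, zoom) {
  lat_rad <- pi * lat / 180
  y_tilde <- log((1 + sin(lat_rad)) / (1 - sin(lat_rad))) / (2 * pi)
  x_tilde <- lon / 180
  X <- 2^(zoom - 1) * (x_tilde + 1)
  Y <- 2^(zoom - 1) * (1 - y_tilde)
  list(tile_x = floor(X), tile_y = floor(Y),
       pix_x = 256 * (X - floor(X)), pix_y = 256 * (Y - floor(Y)))
}

# Inverse mapping: tile index and pixel offset back to lat/lon in degrees.
tile_to_latlon <- function(tile_x, tile_y, pix_x, pix_y, zoom) {
  X <- tile_x + pix_x / 256
  Y <- tile_y + pix_y / 256
  x_tilde <- X / 2^(zoom - 1) - 1
  y_tilde <- 1 - Y / 2^(zoom - 1)
  e <- exp(2 * pi * y_tilde)
  list(lat = asin((e - 1) / (e + 1)) * 180 / pi,  # principal branch of Eq. (3)
       lon = 180 * x_tilde)
}

# Round trip for the map center used earlier in the slides.
p <- latlon_to_tile(32.7073, -117.162, zoom = 10)
q <- tile_to_latlon(p$tile_x, p$tile_y, p$pix_x, p$pix_y, zoom = 10)
print(p)
print(q)
```

The round trip reproduces the input coordinates up to floating-point error. This scaling is what an overlay needs in order to place points correctly on a downloaded map image.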


Offline use and enlarged maps

- Local database of map tiles

- Keep size reasonable by limiting number of zoom levels and spatial extent.

- Construction of static map from local tiles
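How quickly a local tile database grows with zoom level and spatial extent can be estimated with a rough count. The helper `tiles_in_bbox` is hypothetical and reuses the tile formulas from the "Google Map Math" slide; the bounding box around San Diego and the zoom range are illustrative assumptions.

```r
# Number of 256 x 256 tiles covering a lat/lon bounding box at one zoom level.
tiles_in_bbox <- function(lat_min, lat_max, lon_min, lon_max, zoom) {
  tile_index <- function(lat, lon) {
    lat_rad <- pi * lat / 180
    y_tilde <- log((1 + sin(lat_rad)) / (1 - sin(lat_rad))) / (2 * pi)
    c(x = floor(2^(zoom - 1) * (lon / 180 + 1)),
      y = floor(2^(zoom - 1) * (1 - y_tilde)))
  }
  nw <- tile_index(lat_max, lon_min)  # north-west corner: smallest x and y
  se <- tile_index(lat_min, lon_max)  # south-east corner: largest x and y
  unname((se["x"] - nw["x"] + 1) * (se["y"] - nw["y"] + 1))
}

# Illustrative bounding box and zoom range.
zooms <- 8:14
counts <- sapply(zooms, function(z) tiles_in_bbox(32.5, 33.0, -117.5, -116.9, z))
print(data.frame(zoom = zooms, tiles = counts))
print(sum(counts))  # total tiles to store for this extent and zoom range
```

Since each additional zoom level roughly quadruples the number of tiles for a fixed extent, capping the deepest zoom level is the most effective way to keep the local database small.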