Trending Places on OpenStreetMap State of the Map, Brussels, 24.9.2016 Stefan Keller Geometa Lab at HSR University of Applied Sciences Rapperswil



Trending Places on OpenStreetMap

• A big data project with a Twitter bot

• @trending_places (and GitHub)

• Goal: Find significant viewing activity worldwide on the main web map (“slippy map”) of OpenStreetMap (OSM)

• This activity may be indicative of popular news or events in that region

Log data

• A web map consists of map tiles at different zoom levels

• The views of these tiles are logged daily and published in an anonymized form with a delay of 2 days

• http://planet.openstreetmap.org/tile_logs/

Log count

for each line in all the logs
{
    z, x, y = extract coordinate from line
    ip = extract source IP address from line
    counter[z, x, y] += 1
    source_addresses[z, x, y].append(ip)
}

for each (z, x, y) key in counter
{
    if counter[z, x, y] >= 10 {
        if count_unique(source_addresses[z, x, y]) >= 3 {
            print z, x, y, counter[z, x, y]
        }
    }
}

File format (as TSV): date, z, x, y where z = zoom and x/y = TMS index
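The counting rule above can be sketched in Python. This only illustrates the pseudocode and is not the code that produces the published logs; the function name `count_tiles` and the `(z, x, y, ip)` input tuples are assumptions made for the example.

```python
from collections import defaultdict

MIN_VIEWS = 10       # view threshold from the pseudocode above
MIN_UNIQUE_IPS = 3   # unique-address threshold from the pseudocode above


def count_tiles(requests):
    """Count views per tile and keep only tiles that pass both thresholds.

    `requests` is an iterable of (z, x, y, ip) tuples (hypothetical input
    shape; real log lines would have to be parsed into this form first).
    """
    counter = defaultdict(int)
    source_addresses = defaultdict(set)  # a set keeps only unique IPs
    for z, x, y, ip in requests:
        counter[z, x, y] += 1
        source_addresses[z, x, y].add(ip)

    return {
        tile: views
        for tile, views in counter.items()
        if views >= MIN_VIEWS and len(source_addresses[tile]) >= MIN_UNIQUE_IPS
    }
```

A tile with many views from a single address is dropped, which is what keeps the published counts anonymized.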

How?

• For the previous 7 days, the tile view logs are aggregated up to zoom level 14

• A T-score is calculated to standardize the data

• Values above a certain threshold are selected as spikes

• These spikes are ranked relative to the mean increase in views overall (compensates for the growth of OSM)

• Clustering eliminates locations that are near one another

• Tile coordinates are reverse geocoded using Nominatim in order to get geographic names

• A Twitter bot @trending_places announces the top 10 each day after 10 a.m. (or an error, if one occurred)
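The first three steps can be sketched as follows. Two assumptions are made here that the slides do not confirm: "aggregated up to zoom level 14" is read as folding deeper tiles into their zoom-14 ancestor, and the T-score is taken as (today − mean) / sample standard deviation of the preceding days. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

TARGET_ZOOM = 14


def to_zoom14(z, x, y):
    """Return the zoom-14 ancestor (x, y) of a tile, or None if z < 14."""
    if z < TARGET_ZOOM:
        return None
    shift = z - TARGET_ZOOM  # each zoom level halves the tile index
    return (x >> shift, y >> shift)


def aggregate(day_rows):
    """Sum the view counts of (z, x, y, count) rows per zoom-14 tile."""
    totals = defaultdict(int)
    for z, x, y, count in day_rows:
        tile = to_zoom14(z, x, y)
        if tile is not None:
            totals[tile] += count
    return dict(totals)


def spike_score(history, today):
    """Standardize today's count against the preceding days' counts.

    Returns None when the history is perfectly flat, because the score
    is undefined for a standard deviation of zero.
    """
    sd = stdev(history)
    if sd == 0:
        return None
    return (today - mean(history)) / sd
```

A tile would then count as a spike when `spike_score(...)` exceeds a chosen threshold; the threshold value itself is not given in the slides.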

Challenges: Crawling activity on OSM

Challenges: Clustering around a trending place
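One simple way to eliminate neighbouring spikes is greedy suppression: walk through the tiles from the highest score down and keep a tile only if no already-kept tile lies within a few tiles of it. This is a stand-in technique for illustration, not necessarily the clustering the project uses; the function name and the `radius` parameter are hypothetical.

```python
def suppress_neighbours(scored_tiles, radius=2):
    """Keep only the strongest tile of each neighbourhood.

    `scored_tiles` maps zoom-14 (x, y) tiles to a score. `radius` is a
    hypothetical neighbourhood size, measured in tiles along each axis.
    Returns a list of ((x, y), score) pairs, strongest first.
    """
    kept = []
    by_score = sorted(scored_tiles.items(), key=lambda item: item[1], reverse=True)
    for (x, y), score in by_score:
        near_kept = any(
            abs(x - kx) <= radius and abs(y - ky) <= radius
            for (kx, ky), _ in kept
        )
        if not near_kept:
            kept.append(((x, y), score))
    return kept
```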

Challenges: Ranking trending places

Normalization
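Ranking relative to the overall mean increase can be sketched like this. The exact normalization is not spelled out in the slides, so the ratio used here (a tile's increase divided by the mean increase over all tiles) is an assumption, as are the function and parameter names.

```python
from statistics import mean


def rank_spikes(baseline, today, top=10):
    """Rank tiles by view increase relative to the overall mean increase.

    `baseline` and `today` map tiles to view counts. Tiles without a
    positive baseline are skipped to avoid division by zero, which is a
    simplification of this sketch.
    """
    increase = {
        tile: today[tile] / base
        for tile, base in baseline.items()
        if base > 0 and tile in today
    }
    if not increase:
        return []
    overall = mean(increase.values())  # general growth in views
    ranked = sorted(
        ((tile, ratio / overall) for tile, ratio in increase.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top]
```

Dividing by the overall mean means that a day on which all of OSM gets more traffic does not push every tile up the ranking.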

Challenges: Reverse Geocoding

• Given a coordinate (from the tile boundary)

• Return the most relevant geographic name inside / nearby

• Using geographic place names => Nominatim

• (no POIs yet)
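Turning a tile index into a coordinate for Nominatim can be sketched as below. The sketch assumes the slippy-map XYZ numbering (y counted from the north), and it only builds the request URL for the public Nominatim `reverse` endpoint without sending it. The function names and the default `zoom=10` (roughly city level) are choices made for the example.

```python
import math
from urllib.parse import urlencode


def tile_to_lonlat(z, x, y):
    """Return (lon, lat) in degrees of the north-west corner of tile z/x/y."""
    n = 2 ** z
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lon, lat


def tile_centre(z, x, y):
    """Return (lon, lat) of the tile centre by using fractional indices."""
    return tile_to_lonlat(z, x + 0.5, y + 0.5)


def nominatim_reverse_url(lon, lat, zoom=10):
    """Build a reverse-geocoding URL for the public Nominatim API."""
    query = urlencode({
        "format": "jsonv2",
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
        "zoom": zoom,  # level of detail of the returned place
    })
    return "https://nominatim.openstreetmap.org/reverse?" + query
```

For example, `nominatim_reverse_url(*tile_centre(14, x, y))` gives the URL to query for a zoom-14 tile.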

Ex. of strong correlation: Fort McMurray (CA)

1-3 May 2016: Wildfire across approximately 5,900 square km (about 1/6 of Belgium or 2x Luxembourg), destroying ~2,400 homes

Ex. of strong correlation #2: Flüelen (CH)

1 June 2016: Switzerland celebrated the opening of the world's longest railway tunnel (the “Gotthard Base Tunnel”) through the Alps…

Ex. of strong correlation #3: San Severino Marche (IT)

24 August 2016: An earthquake of 6.2 on the moment magnitude scale hit Central Italy. Its epicentre was southeast of Perugia and north of L'Aquila, in an area near the borders of the Umbria, Lazio, Abruzzo and Marche regions. As of 16 September 2016, 297 people had been killed.

More statistics…

• Processing time: 5h (using SQLite / Python)

• Reporting period: 2016-04-11 - 2016-09-18

• No. reports: 125 (out of 160 days)

• Top 10 countries overall: RU 293, US 131, DE 70, UA 67, FR 46, PL 44, NO 43, ES 35, RO 33, GB 31

• Top 10 place names overall: Saratovsky District (RU) 16, 57.04.53.26 (RU) 13, Stara Emetivka (UA) 13, Tatarstan (RU) 13, Jambyl Province (KZ) 11, Johor Bahru (MY) 11, Odessa (UA) 11, Shimen (TW) 11, N.N. 11, Black Point (US) 10

Open questions

• Why so many Russian places (and places from post-Soviet states)?

• Influence of crawling?

• Bias of places with spikes after zero activity vs. crowded places?

• Other biases?

• Better than T-score? E.g. with a Poisson distribution (multivariate ARIMA?)
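One possible reading of the Poisson idea: treat the mean daily views of the preceding days as the rate λ and ask how likely a count at least as high as today's would be. This sketch only illustrates that reading and is not something the project is known to implement; `poisson_tail` is a hypothetical helper.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0 and integer k >= 0.

    Terms are computed in log space so that large rates do not underflow.
    """
    total = 0.0
    i = k
    while True:
        # log P(X = i) = i*log(lam) - lam - log(i!)
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        # Past the mode the terms shrink; stop once they no longer matter.
        if i > lam and term <= total * 1e-17:
            break
        i += 1
    return min(total, 1.0)
```

A small tail probability for today's count would mark the tile as a spike. Unlike the T-score, this does not need a standard deviation, which may help with tiles that have almost no prior activity.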

Final open questions…

• Do you know…

– Sea Cliff (US),

– Sitionuevo (CO),

– Athens (GR),

– Sacele (RO), or

– Pretoria (ZA)…?

• Wonder why!

https://youtu.be/olmL1fUnQAQ

Thanks

Also to Bhavya Chandra (main author, NTU Singapore), Matt Amos, Lukas Martinelli (@lukasmartinelli), Pavel Tyslacki (@tbicr), Joost Schouppe (joostjakob)

Stefan Keller Geometa Lab at HSR University of Applied Sciences Rapperswil (Switzerland) www.hsr.ch/geometalab @sfkeller