Trending Places
on OpenStreetMap
State of the Map, Brussels, 24.9.2016
Stefan Keller, Geometa Lab at HSR University of Applied Sciences Rapperswil
Trending Places on OpenStreetMap
• A big data project with a Twitter bot
• @trending_places (and github)
• Goal: Find significant viewing activity
worldwide on the main web map (“slippy
map”) of OpenStreetMap (OSM)
• This activity may be indicative of popular
news or events in that region
Log data
• A web map consists of map tiles at
different zoom levels
• The views of these tiles are logged daily
and published in an anonymized form with
a delay of 2 days
• http://planet.openstreetmap.org/tile_logs/
Log count
for each line in all the logs
{
    z, x, y = extract coordinate from line
    ip = extract source IP address from line
    counter[z, x, y] += 1
    source_addresses[z, x, y].append(ip)
}
for each (z, x, y) key in counter
{
    if counter[z, x, y] >= 10 {
        if count_unique(source_addresses[z, x, y]) >= 3 {
            print z, x, y, counter[z, x, y]
        }
    }
}
File format (as TSV): date, z, x, y where z = zoom, x/y = TMS index
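The counting rule above can be sketched in Python. This is only an illustration: the input is assumed to be an iterable of already-parsed `(z, x, y, ip)` tuples, since the raw server log format is not shown here, and the names `count_tile_views`, `MIN_VIEWS` and `MIN_UNIQUE_IPS` are hypothetical.

```python
from collections import defaultdict

# Thresholds taken from the pseudocode above.
MIN_VIEWS = 10
MIN_UNIQUE_IPS = 3


def count_tile_views(requests):
    """Count views per tile and keep only tiles that pass both thresholds.

    requests: iterable of (z, x, y, ip) tuples (assumed, pre-parsed input).
    Returns a dict {(z, x, y): views}.
    """
    counter = defaultdict(int)
    source_addresses = defaultdict(set)  # a set keeps unique IPs only
    for z, x, y, ip in requests:
        counter[(z, x, y)] += 1
        source_addresses[(z, x, y)].add(ip)
    return {
        tile: views
        for tile, views in counter.items()
        if views >= MIN_VIEWS and len(source_addresses[tile]) >= MIN_UNIQUE_IPS
    }
```

Using a set per tile instead of a list avoids a separate unique-count pass; a tile viewed often by a single address is dropped, which is what keeps the published logs anonymized.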
How?
• For the previous 7 days, the tile view logs are aggregated up to zoom level 14
• A T-score is calculated to standardize the data
• Values above a certain threshold are selected to catch spikes
• These spikes are ranked relative to the mean increase in views overall (compensates growth of OSM)
• Clustering eliminates locations that are near one another
• Tile coordinates are reverse geocoded using Nominatim in order to get geographic names
• A Twitter bot @trending_places announces the top 10 each day after 10 a.m. (or an error if something goes wrong)
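The first three steps above can be sketched as follows. Several things here are assumptions rather than the project's actual code: "aggregated up to zoom level 14" is read as rolling deeper tiles up into their zoom-14 ancestor, the T-score is taken as (today's views minus the mean of the previous days) divided by the sample standard deviation, and the threshold of 3.0 as well as all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

TARGET_ZOOM = 14  # from the list above


def parent_tile(z, x, y, target_zoom=TARGET_ZOOM):
    """Ancestor of tile (z, x, y) at target_zoom, or None if z is shallower."""
    if z < target_zoom:
        return None
    shift = z - target_zoom  # each zoom level halves x and y
    return (target_zoom, x >> shift, y >> shift)


def aggregate(day_counts):
    """Sum one day's views {(z, x, y): views} per zoom-14 tile."""
    totals = defaultdict(int)
    for (z, x, y), views in day_counts.items():
        tile = parent_tile(z, x, y)
        if tile is not None:
            totals[tile] += views
    return dict(totals)


def t_score(history, today):
    """Standardized deviation of today's views from the previous days.

    Returns None when the history has no variance (score undefined).
    """
    spread = stdev(history)
    if spread == 0:
        return None
    return (today - mean(history)) / spread


def find_spikes(history_by_tile, today_by_tile, threshold=3.0):
    """Return [(tile, score)] above the (assumed) threshold, highest first."""
    spikes = []
    for tile, today in today_by_tile.items():
        history = history_by_tile.get(tile)
        if not history or len(history) < 2:
            continue  # not enough data to estimate a spread
        score = t_score(history, today)
        if score is not None and score > threshold:
            spikes.append((tile, score))
    return sorted(spikes, key=lambda item: item[1], reverse=True)
```

Tiles with a constant or missing history are skipped in this sketch, which sidesteps rather than answers the question of how to treat spikes after zero activity.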
Visualizing OSM.org's Map Views
• Lukas Martinelli (@lukmartinelli), May 2015 http://lukasmartinelli.ch/python/2015/05/24/parsing-and-visualizing-osm-access-logs.html
• Martin Raifer (@tyr_asd), September 2016 http://www.openstreetmap.org/user/tyr_asd/diary/39434
Challenges: Reverse Geocoding
• Given coordinates (from the tile boundary)
• Return the most relevant geographic name inside or nearby
• Using geographic place names => Nominatim
• (no POIs yet)
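A minimal sketch of the lookup, under stated assumptions: tile indices follow the slippy-map (XYZ) numbering with y counted from the north, the tile centre is used as the query point (the project may use a boundary corner instead), and the request goes to the public Nominatim `/reverse` endpoint with its `zoom` parameter set to a hypothetical default of 10. The helper names are illustrative only.

```python
import math
from urllib.parse import urlencode


def tile_center(z, x, y):
    """Centre of slippy-map tile (z, x, y) as (lat, lon) in degrees."""
    n = 2 ** z  # number of tiles per axis at this zoom level
    lon = (x + 0.5) / n * 360.0 - 180.0
    # Inverse Web Mercator projection for the latitude.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 0.5) / n))))
    return lat, lon


def nominatim_reverse_url(lat, lon, zoom=10):
    """Build a Nominatim reverse-geocoding URL (no request is sent here)."""
    params = {
        "format": "jsonv2",
        "lat": f"{lat:.6f}",
        "lon": f"{lon:.6f}",
        "zoom": zoom,  # level of detail of the returned place (assumed value)
    }
    return "https://nominatim.openstreetmap.org/reverse?" + urlencode(params)
```

The sketch only builds the URL; an actual client would also need to respect the Nominatim usage policy (identifying user agent, low request rate).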
Ex. of strong correlation:
Fort McMurray (CA)
1-3 May 2016: Wildfire across approximately 5,900 square km
(about 1/6 of Belgium, or 2x Luxembourg), destroying ~2,400 homes
Ex. of strong correlation #2:
Flüelen (CH)
1 June 2016: Switzerland celebrated the world's longest railway
tunnel (“The Gotthard Base Tunnel”) through the Alps…
Ex. of strong correlation #3:
San Severino Marche (IT)
24 August 2016: Earthquake of 6.2 on the moment magnitude scale hit
Central Italy. Its epicentre was southeast of Perugia and north of L'Aquila,
in an area near the borders of the Umbria, Lazio, Abruzzo and Marche
regions. As of 16 September 2016, 297 people had been killed.
More statistics…
• Processing time: 5h (using SQLite / Python)
• Reporting period: 2016-04-11 - 2016-09-18
• No. reports: 125 (out of 160 days)
• Top 10 countries overall: RU 293, US 131, DE 70, UA 67, FR 46, PL 44, NO 43, ES 35, RO 33, GB 31
• Top 10 place names overall: Saratovsky District (RU) 16, 57.04.53.26 (RU) 13, Stara Emetivka (UA) 13, Tatarstan (RU) 13, Jambyl Province (KZ) 11, Johor Bahru (MY) 11, Odessa (UA) 11, Shimen (TW) 11, N.N. 11, Black Point (US) 10
Open questions
• Why so many Russian places (and places
from post-Soviet states)?
• Influence of crawling?
• Bias of places with spikes after zero
activity vs. crowded places?
• Other bias?
• Better than T-Score? E.g. w/ Poisson
Distribution (multivariate ARIMA?)
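One possible reading of the Poisson idea, as a sketch only and not something the project is stated to use: treat a tile's daily views as Poisson-distributed with a rate estimated from the previous days, and ask how unlikely today's count is. The function name `poisson_tail` and the choice of the upper-tail probability as the score are assumptions.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with each term computed in log space."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = 0.0
    i = max(k, 0)
    while True:
        # log of the Poisson pmf avoids overflow in lam**i and i!
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        # Past the mode the terms shrink; stop once they no longer matter.
        if i > lam and term < 1e-17 * max(total, 1e-300):
            break
        i += 1
    return min(total, 1.0)
```

With a mean of 10 views per day, `poisson_tail(40, 10.0)` is tiny, so 40 views would count as a spike, while `poisson_tail(12, 10.0)` is not small at all. Unlike the T-score, this needs no variance estimate, though real view counts may be more dispersed than a Poisson model allows.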
Final open questions…
• Do you know…
– Sea Cliff (US),
– Sitionuevo (CO),
– Athens (GR),
– Sacele (RO), or
– Pretoria (ZA) ………?
• Wonder why!