Fusing OpenStreetMap with Wikipedia Ulmon GmbH 08/05/2014 Linuxwochen Wien

DESCRIPTION

Ulmon's recipe for a travel guide is to fuse multiple open data sources that you might otherwise consult individually when planning a vacation, and to present them as one coherent package. We try to fuse this data so that the resulting whole is more valuable than the sum of its parts. Our main sources of map data and of knowledge about places are OpenStreetMap and Wikipedia, respectively. This talk covers the challenges of connecting the two and our strategies for coping with them.

Page 1: Fusing openstreetmap with wikipedia

Fusing OpenStreetMap with Wikipedia
Ulmon GmbH

08/05/2014 Linuxwochen Wien

Page 2: Fusing openstreetmap with wikipedia

Hello from

Page 3: Fusing openstreetmap with wikipedia

Ulmon’s recipe for a travel guide
Fuse sources of data to create a whole more valuable than its parts

Page 4: Fusing openstreetmap with wikipedia

Wikipedia and OSM in CityMaps2Go

Page 5: Fusing openstreetmap with wikipedia

What about unmatchable WIKI?

Page 6: Fusing openstreetmap with wikipedia

Wikipedia tag in OpenStreetMap

http://taginfo.openstreetmap.org
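To make the tag concrete: OpenStreetMap objects usually reference an article either as `wikipedia=<lang>:<title>` or with the language in the key, as in `wikipedia:<lang>=<title>`. The following is a minimal sketch of reading both forms. The function name `parse_wikipedia_tag` and its handling of malformed values are assumptions for illustration, not Ulmon's code.

```python
from typing import Optional, Tuple


def parse_wikipedia_tag(key: str, value: str) -> Optional[Tuple[str, str]]:
    """Return (language, article title) for an OSM wikipedia tag, or None.

    Hypothetical helper: values that are full URLs or that lack a language
    prefix are rejected rather than guessed at.
    """
    value = value.strip()
    if not value or "://" in value:
        return None
    # Language in the key, e.g. wikipedia:de=Stephansdom
    if key.startswith("wikipedia:"):
        lang = key.split(":", 1)[1]
        return (lang.lower(), value.replace("_", " ")) if lang else None
    if key != "wikipedia":
        return None
    # Language in the value, e.g. wikipedia=de:Stephansdom
    lang, sep, title = value.partition(":")
    title = title.strip()
    if not sep or not title or not lang.replace("-", "").isalpha():
        return None
    return lang.lower(), title.replace("_", " ")


print(parse_wikipedia_tag("wikipedia", "de:Stephansdom"))
print(parse_wikipedia_tag("wikipedia:en", "St._Stephen's_Cathedral,_Vienna"))
```

Returning `None` for ambiguous values keeps uncertain links out of the training and matching data instead of silently assigning them a language.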

Page 7: Fusing openstreetmap with wikipedia

Wikipedia tag statistics

Tag name        Number of values
wikipedia       339,148
wikipedia:ru    30,457
wikipedia:en    16,432
wikipedia:de    13,923
wikipedia:es    4,706
Total           404,666

Total Wikipedia entries with location: 1,621,704 in 15 languages (798,965 English)
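The 404,666 figure on this slide equals the sum of the five listed tag counts. A short check of that arithmetic follows, together with two ratios derived from the slide's numbers. Comparing tag counts with located articles is only a rough indication, since several OSM objects may point at the same article and the tag counts are not restricted to the 15 languages.

```python
# Figures copied from the slide; variable names are illustrative.
tag_counts = {
    "wikipedia": 339_148,
    "wikipedia:ru": 30_457,
    "wikipedia:en": 16_432,
    "wikipedia:de": 13_923,
    "wikipedia:es": 4_706,
}
located_entries = 1_621_704   # Wikipedia entries with a location, 15 languages
english_entries = 798_965

total_tagged = sum(tag_counts.values())
english_share = english_entries / located_entries
tagged_ratio = total_tagged / located_entries  # rough indication only

print(total_tagged)  # 404666
print(f"English share of located entries: {english_share:.1%}")
print(f"Tag values per located entry:     {tagged_ratio:.1%}")
```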

Page 8: Fusing openstreetmap with wikipedia

The Confusion of Tongues

Page 9: Fusing openstreetmap with wikipedia

Multiple OSM candidates for one Wiki

Page 10: Fusing openstreetmap with wikipedia

Multiple fitting Wiki entries

Page 11: Fusing openstreetmap with wikipedia

Wiki articles with no OSM object

Page 12: Fusing openstreetmap with wikipedia

What data to include?

… for an offline guide

178MB!

Page 13: Fusing openstreetmap with wikipedia

Ulmon’s matching algorithm

…
Stephansdom
Ströck
Stephansplatz
Stephansplatz (U3 station)
Stock-im-Eisen-Platz
Café Weinwurm
DO&CO am Stephansplatz
Haas-Haus
Aida
…

Distance: 0.9
Name: 1.0
Type: 0.0
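The slide shows three scores per candidate (distance, name, type) but not how the distance score is derived. The sketch below assumes one plausible mapping: great-circle distance via the haversine formula, scaled linearly to [0, 1]. The cut-off `MAX_DISTANCE_M`, the class `CandidateScores`, and the linear falloff are all assumptions, not values from the talk.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0
MAX_DISTANCE_M = 500.0  # hypothetical cut-off; beyond it the score is 0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def distance_score(meters: float) -> float:
    """Assumed linear falloff: 1.0 at 0 m, 0.0 at MAX_DISTANCE_M and beyond."""
    return max(0.0, 1.0 - meters / MAX_DISTANCE_M)


@dataclass
class CandidateScores:
    """One feature vector per (Wikipedia article, OSM candidate) pair."""
    distance: float
    name: float
    type: float


# The score triple shown on the slide, expressed as a feature vector.
example = CandidateScores(distance=0.9, name=1.0, type=0.0)
print(example)
```

Under these assumptions a candidate 50 m from the article's coordinates would receive a distance score of 0.9.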

Page 14: Fusing openstreetmap with wikipedia

Comparing Names

• Edit distance (Levenshtein distance)
• Soundex
• Dice coefficient
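The three measures listed above can be sketched in a few lines each. These are textbook versions, not Ulmon's implementation: the Dice coefficient is computed over character bigram sets, Soundex follows the classic American rules and simply drops non-ASCII letters such as "ö", and the helper `levenshtein_similarity` (normalising the edit distance to [0, 1]) is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for disjoint ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over character bigram sets."""
    x = {a.lower()[i:i + 2] for i in range(len(a) - 1)}
    y = {b.lower()[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


_CODES = {ch: digit
          for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                                 ("4", "l"), ("5", "mn"), ("6", "r"))
          for ch in letters}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


print(levenshtein("kitten", "sitting"))    # 3
print(dice("night", "nacht"))              # 0.25
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Edit distance and the Dice coefficient react to spelling differences, while Soundex groups names that sound alike in English, so the three can disagree on the same pair of names.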

Page 15: Fusing openstreetmap with wikipedia

Type score

• Compare OSM tags with DBpedia types
  – Manual rules
  – Word similarity
  – Future: Synonymic analysis based on WordNet
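As an illustration of the "manual rules" approach, the sketch below looks up each OSM tag in a hand-written table of compatible DBpedia ontology classes and returns 1.0 on the first hit. The rule table `MANUAL_RULES`, the function `type_score`, and the binary 0/1 scoring are assumptions; the rules actually used are not given in the talk.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical rule table: OSM (key, value) -> DBpedia ontology class names
# that would count as a compatible type. Illustrative entries only.
MANUAL_RULES: Dict[Tuple[str, str], Set[str]] = {
    ("amenity", "place_of_worship"): {"ReligiousBuilding", "Church", "Mosque"},
    ("railway", "station"): {"Station", "RailwayStation"},
    ("tourism", "museum"): {"Museum"},
    ("amenity", "restaurant"): {"Restaurant"},
}


def type_score(osm_tags: Dict[str, str], dbpedia_types: Iterable[str]) -> float:
    """Return 1.0 if any OSM tag is compatible with any DBpedia type, else 0.0."""
    types = set(dbpedia_types)
    for tag in osm_tags.items():
        if MANUAL_RULES.get(tag, set()) & types:
            return 1.0
    return 0.0


cathedral_tags = {"amenity": "place_of_worship", "name": "Stephansdom"}
bakery_tags = {"shop": "bakery", "name": "Ströck"}
article_types = ["Church", "Building"]

print(type_score(cathedral_tags, article_types))  # 1.0
print(type_score(bakery_tags, article_types))     # 0.0
```

A word-similarity variant could replace the exact set lookup with a string comparison between tag values and type names, which would yield graded scores instead of 0 or 1.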

Page 16: Fusing openstreetmap with wikipedia

Decision tree

• Generated using the J48 algorithm of the Weka toolkit

• How to get learning data?
  – Manual creation
  – Parsing wikipedia tags from OSM
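One way to picture the second source of learning data: an OSM candidate whose `wikipedia` tag already names the article is treated as a positive example, and its score triple is written to an ARFF file, the text format Weka reads. This is a sketch under assumptions; the attribute names, the `match`/`no_match` labels, and the helpers `is_tagged_match` and `to_arff` are illustrative and not taken from the talk.

```python
from typing import Dict, Iterable, List, Tuple

# (distance score, name score, type score, is_match)
Row = Tuple[float, float, float, bool]


def is_tagged_match(osm_tags: Dict[str, str], lang: str, title: str) -> bool:
    """Assumed labelling rule: the object's wikipedia tag names this article."""
    return (osm_tags.get("wikipedia") == f"{lang}:{title}"
            or osm_tags.get(f"wikipedia:{lang}") == title)


def to_arff(rows: Iterable[Row], relation: str = "osm_wiki_matches") -> str:
    """Serialise labelled score triples as an ARFF document for Weka."""
    lines: List[str] = [
        f"@relation {relation}",
        "",
        "@attribute distance numeric",
        "@attribute name numeric",
        "@attribute type numeric",
        "@attribute class {match,no_match}",
        "",
        "@data",
    ]
    for distance, name, type_, is_match in rows:
        label = "match" if is_match else "no_match"
        lines.append(f"{distance:.3f},{name:.3f},{type_:.3f},{label}")
    return "\n".join(lines) + "\n"


rows = [
    (0.9, 1.0, 0.0, True),   # e.g. a candidate tagged with the article
    (0.2, 0.1, 0.0, False),  # e.g. a nearby object without such a tag
]
print(to_arff(rows))
```

The resulting file could then be loaded into Weka and used to train a J48 tree on the three numeric attributes.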

Page 17: Fusing openstreetmap with wikipedia

Ulmon’s matching performance

• Current
  – Total wiki entries: 810K (674K English)
  – Matched entries: 429K

• Future
  – Total wiki entries: 1.6M
  – Matched entries (extrapolation): 850K
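The extrapolated figure is consistent with applying the current match rate to the larger corpus. The arithmetic below assumes that the match rate stays constant as the number of entries grows, which the slide implies but does not state.

```python
# Rounded figures from the slide; variable names are illustrative.
current_total = 810_000
current_matched = 429_000
future_total = 1_600_000

match_rate = current_matched / current_total
projected_matched = match_rate * future_total

print(f"Current match rate: {match_rate:.1%}")           # 53.0%
print(f"Projected matches:  {projected_matched:,.0f}")   # about 847 thousand
```

Rounded to two significant figures, this gives the 850K stated on the slide.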

Page 18: Fusing openstreetmap with wikipedia

Multiple OSM candidates for one Wiki

Page 19: Fusing openstreetmap with wikipedia

Multiple fitting Wiki entries

Page 20: Fusing openstreetmap with wikipedia

Open questions

• Reduce false positives
  – Current: 10%, desired: < 3%

• Get more matches!
• Reduce the amount of data

Page 21: Fusing openstreetmap with wikipedia

Thank you for your attention!
Come visit us at www.ulmon.com
