Pascal Kelm
Communication Systems Group
Technische Universität Berlin
Thursday, 24 January 2013
www.nue.tu-berlin.de
Kelm: “Where in the World?: The State of Automatic Geotagging of Video”
Overview
Motivation – Where in the world is it?
Example
http://www.flickr.com/photos/zebandrews/7414117752/in/pool-18038320@N00/
Fact: only 3% of the content on online sharing platforms is available with geographic coordinates (latitude, longitude)
State of the Art
Textual information
Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, …
Gazetteers: e.g. geonames.org
Textual similarity: finding the similarity to a group of toponyms
Visual information
Low-level features: propagate the location by finding a visually similar image
- Features: texture, color, shape, …
Local features: interesting points on the object can be extracted to provide a "feature description" of the object
- Features: SIFT, SURF, etc.
How would you estimate the location of an unknown content?
• [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA workshop 2012]
• [Pascal Kelm et al.: “Georeferencing in Social Networks”, in Social Media Retrieval, Springer, 2012]
Relevant Research 1
2008: James Hays, Alexei A. Efros. IM2GPS: estimating geographic information from a single image. Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR, “Where am I?”)
Purely data-driven scene matching approach (over 6 million GPS-tagged images, 5 low-level descriptors)
Visual ambiguity
Low precision, high computational cost (a cluster of 400 processors for 3 days)
Relevant Research 2
2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: Placing Flickr Photos on a Map. In: 32nd International ACM SIGIR
Language model built from textual annotations (ranking)
Geographical / textual ambiguity
High precision
High computational cost
Images with the “palma” tag falsely mapped near Palma de Mallorca, Spain
Research Question
What are the limits of an automatic algorithm?
Which feature (text, video) performs best?
Can fusion eliminate geographical ambiguity?
Do I need a CPU cluster to estimate the location?
Does low computing performance mean low precision?
Is it possible for a human to estimate the location of a video using textual, visual and audio information?
Placing Task
Organizers:
Pascal Kelm, TU Berlin
Adam Rae, Yahoo! Research
The task requires participants to assign geographical coordinates to each provided test video. Participants can make use of metadata and audio and visual features as well as external resources.
[Adam Rae, Pascal Kelm: “Working Notes for the Placing Task at MediaEval 2012”, Working Notes Proceedings (ISSN 1613-0073) of the MediaEval 2012]
Image Distribution
Flickr database:
3.6 million training images
10,000 training videos
5,091 test videos
Descriptors:
1. Color and Edge Directivity Descriptor
2. Gabor
3. Fuzzy Color and Texture Histogram
4. Color Histogram
5. Scalable Color
6. Auto Color Correlogram
7. Tamura
8. Edge Histogram
9. Color Layout
Metadata:
All information about uploader + video
Overview Framework
National borders extracted from the metadata
Textual and visual features are used in a hierarchical framework to predict the most likely location
[Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation”, Proceedings of the 20th ACM SIGSPATIAL 2012]
Collaborative Systems: Example
這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…
Geographical Ambiguity
這是我上次去巴黎。在那裡,我得到了我的城堡在迪斯尼樂園看。…
Which language is it?
Chinese
This was my last trip to Paris. I visited the castle in Disneyland…
Which words give us information? Tags?
Trip, Paris, Castle, Disneyland
Which of these nouns carry geographical information?
Paris, Disneyland
Geographical Ambiguity
Paris: France, Canada, Puerto Rico, …
Disneyland: China, USA, France, …
\[ c_{\mathrm{detected}} = \arg\max \left( \sum_{j=0}^{N-1} R(c_0)_j, \; \ldots, \; \sum_{j=0}^{N-1} R(c_m)_j \right) \]
R(c_i) = rank sum
c_i = countries
N = number of toponyms
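The country selection by rank sum can be illustrated with a minimal Python sketch. It assumes a hypothetical gazetteer excerpt containing only the candidate countries listed above, and it assumes the simplest possible score: each toponym contributes 1 to every country in which a place of that name exists. The actual scoring used in the cited paper may differ.

```python
from collections import defaultdict

# Hypothetical gazetteer excerpt: toponym -> countries with a place of that name.
# Only the candidates named on the slide are included.
GAZETTEER = {
    "Paris": ["France", "Canada", "Puerto Rico"],
    "Disneyland": ["China", "USA", "France"],
}


def detect_country(toponyms, gazetteer=GAZETTEER):
    """Return the country with the highest score summed over all toponyms."""
    rank_sum = defaultdict(int)
    for toponym in toponyms:
        for country in gazetteer.get(toponym, []):
            # Assumed score: 1 if the country is a candidate for this toponym.
            rank_sum[country] += 1
    if not rank_sum:
        return None  # no toponym was found in the gazetteer
    return max(rank_sum, key=rank_sum.get)


print(detect_country(["Paris", "Disneyland"]))  # France
```

France is the only country supported by both toponyms, so it wins even though each toponym on its own is ambiguous.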
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “A Hierarchical, Multi-modal Approach for Placing Videos on the Map
using Millions of Flickr Photographs” ACM Multimedia 2011]
Example
http://www.flickr.com/photos/62285085@N00/3484324495
Textual Region Model
Segmenting the world map into regions according to the
meridians and parallels
Stemming: reducing inflected words to their root form
TextBounds Crossing, Florida, USA
Tags before and after the Porter Stemmer:
Original tag        Stemmed tag
Bream Vortex        Bream Vortex
Swimming            Swim
Ocean               Ocean
Beach               Beach
Springs Vortex      Springs Vortex
Scuba Diving        Scuba Dive
Scuba Underwater    Scuba Underwat
…                   …
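Segmenting the world map along meridians and parallels amounts to mapping every coordinate to a cell of a regular latitude/longitude grid. The following is a minimal sketch of that mapping; the function name `grid_cell` and the cell size parameter `cell_deg` are assumptions for illustration, not names from the talk.

```python
import math


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to the (row, col) index of a regular lat/lon grid.

    Rows count northwards from the South Pole, columns eastwards from
    the 180th meridian. cell_deg is the edge length of a cell in degrees.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    rows = math.ceil(180.0 / cell_deg)
    cols = math.ceil(360.0 / cell_deg)
    # Clamp so that lat = 90 and lon = 180 fall into the last cell.
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = min(int((lon + 180.0) // cell_deg), cols - 1)
    return row, col


print(grid_cell(52.52, 13.405))        # 1-degree cell containing Berlin
print(grid_cell(52.52, 13.405, 10.0))  # coarser 10-degree cell
```

All training items whose coordinates fall into the same cell would then contribute their stemmed tags to that region.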
Textual Region Model
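Term-location distribution:
\[ P(t \mid l) = \frac{N_{t,l} + 1}{\sum_{t' \in V} \left( N_{t',l} + 1 \right)} \]
Term frequency-inverse document frequency:
\[ \mathrm{tfidf}_{t,l} = N_{t,l} \cdot \log \frac{N}{n_t} \]
\[ P(l \mid d) = \max_{l} \sum_{i=0}^{N} \log P(t_i \mid l) \]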
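A smoothed term-location distribution combined with a maximum over summed log-probabilities is essentially a naive Bayes classifier over regions. The sketch below shows one way to write it in Python, assuming add-one smoothing over the vocabulary and ignoring tags that never occur in training. The region names and tags in the example are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(tagged_items):
    """tagged_items: iterable of (region, [stemmed tags]).

    Returns per-region tag counts N_{t,l} and the vocabulary V.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for region, tags in tagged_items:
        counts[region].update(tags)
        vocab.update(tags)
    return counts, vocab


def log_p(term, region, counts, vocab):
    """Add-one smoothed log P(t | l)."""
    denom = sum(counts[region].values()) + len(vocab)
    return math.log((counts[region][term] + 1) / denom)


def most_likely_region(tags, counts, vocab):
    """Region maximising the sum of log P(t_i | l) over the known tags."""
    known = [t for t in tags if t in vocab]  # unseen tags carry no evidence
    return max(counts, key=lambda l: sum(log_p(t, l, counts, vocab) for t in known))


# Hypothetical training data: (region, stemmed tags).
training = [
    ("florida", ["scuba", "dive", "spring", "beach"]),
    ("florida", ["beach", "ocean", "swim"]),
    ("paris", ["eiffel", "tower", "seine"]),
]
counts, vocab = train(training)
print(most_likely_region(["scuba", "beach", "holiday"], counts, vocab))  # florida
```

Working in log space avoids numerical underflow when a video carries many tags.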
Textual Region Model
Bernoulli model:
\[ P(t \mid c) = \frac{N_{t,c} + 1}{\sum_{t' \in V} \left( N_{t',c} + 1 \right)} \]
t = tag
c = class / region
Example tags: Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat, …
Visual Region Model
Returns the visually most similar areas, which are
represented by a mean feature vector of all training images
and videos of the respective area
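Representing each area by the mean feature vector of its training items and returning the most similar areas can be sketched as follows. The Euclidean distance is an assumed similarity measure, and the two-dimensional feature vectors and region names are toy values chosen for illustration; real descriptors are much longer.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def build_region_model(training):
    """training: dict region -> list of feature vectors.

    Returns dict region -> mean feature vector; empty regions are skipped.
    """
    return {region: mean_vector(vecs) for region, vecs in training.items() if vecs}


def rank_regions(query, model):
    """Regions sorted from visually most to least similar to the query."""
    return sorted(model, key=lambda region: math.dist(query, model[region]))


# Toy data: two regions with two 2-dimensional descriptors each.
model = build_region_model({
    "beach": [[0.9, 0.1], [0.7, 0.3]],
    "city": [[0.2, 0.8], [0.0, 1.0]],
})
print(rank_regions([0.75, 0.25], model))  # ['beach', 'city']
```

Comparing a query against one mean vector per region keeps the cost proportional to the number of regions rather than the number of training images.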
What is meant by Spatial Segmentation?
The world map is iteratively divided into segments of different sizes
Each segment is treated as a class in our probabilistic model
• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora “How Spatial Segmentation improves the Multimodal Geo-Tagging”
Working Notes Proceedings of the MediaEval 2012]
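One way to obtain segments of different sizes is a quadtree-style subdivision: a cell is split into four while it contains more than a given number of training items, so dense areas end up with small cells. This splitting rule is an assumption made for illustration; the slides do not specify how the segment sizes are chosen. The parameters `max_items` and `max_depth` are hypothetical.

```python
def segment(points, bounds=(-90.0, 90.0, -180.0, 180.0), max_items=2, max_depth=8):
    """Recursively split a cell into four while it holds too many points.

    points: list of (lat, lon); bounds: (south, north, west, east).
    Returns the non-empty leaf cells as a list of (bounds, points_in_cell).
    """
    if len(points) <= max_items or max_depth == 0:
        return [(bounds, points)]
    south, north, west, east = bounds
    mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
    # Group the points by quadrant; empty quadrants are never created.
    quadrants = {}
    for lat, lon in points:
        key = (lat >= mid_lat, lon >= mid_lon)
        quadrants.setdefault(key, []).append((lat, lon))
    leaves = []
    for (upper, right), members in quadrants.items():
        child = (
            mid_lat if upper else south,
            north if upper else mid_lat,
            mid_lon if right else west,
            east if right else mid_lon,
        )
        leaves.extend(segment(members, child, max_items, max_depth - 1))
    return leaves


# Berlin, Paris, London, New York, Sydney
cities = [(52.5, 13.4), (48.9, 2.3), (51.5, -0.1), (40.7, -74.0), (-33.9, 151.2)]
for cell, members in segment(cities):
    print(cell, members)
```

Each returned leaf cell would then serve as one class of the probabilistic model.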
Fusion: Example
Confidence scores of the visual approach (right), restricted to the most likely spatial segment determined by the textual approach (left)
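The restriction of visual confidence scores to the textually most likely segment can be sketched in a few lines. The function name `fuse`, the dictionary layout and the scores in the example are assumptions for illustration, not the interface of the presented system.

```python
def fuse(textual_scores, visual_scores, parent_of):
    """Pick the best textual segment, then the best visual cell inside it.

    textual_scores: dict coarse segment -> confidence of the textual model
    visual_scores:  dict fine cell -> confidence of the visual model
    parent_of:      dict fine cell -> coarse segment that contains it
    """
    best_segment = max(textual_scores, key=textual_scores.get)
    inside = {
        cell: score
        for cell, score in visual_scores.items()
        if parent_of[cell] == best_segment
    }
    if not inside:
        return best_segment, None  # no visual candidate inside the segment
    return best_segment, max(inside, key=inside.get)


# Hypothetical scores: the visual model alone would prefer "orlando".
textual = {"europe": 0.7, "north_america": 0.3}
visual = {"paris": 0.4, "berlin": 0.2, "orlando": 0.6}
parent = {"paris": "europe", "berlin": "europe", "orlando": "north_america"}
print(fuse(textual, visual, parent))  # ('europe', 'paris')
```

In the example the textual evidence rules out the visually strongest but geographically implausible candidate, which is the ambiguity the fusion is meant to remove.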
Results
[UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres. A visual approach for video geocoding using bag-of-scenes. ICMR '12
[QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo. Geo-tagging online videos using semantic expansion and visual analysis.
Conclusion
Hierarchical approach for the automatic estimation of geo-tags on social media websites
Detailed analysis of textual and visual features at different spatial granularities (national border detection)
Fusion of textual and visual methods is important to eliminate geographical ambiguities
The hierarchy reduces the computing time in the subsequent classification step
Half of the test set is correctly located within a radius of 10 km
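A result such as "located within a radius of 10 km" can be checked by computing the great-circle distance between each predicted and true coordinate. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the function names and the example coordinates are assumptions for illustration, and the official evaluation may use a different distance computation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within(predictions, truths, radius_km=10.0):
    """Fraction of predictions lying within radius_km of the true location."""
    hits = sum(haversine_km(*p, *t) <= radius_km for p, t in zip(predictions, truths))
    return hits / len(truths)


# Hypothetical predictions and ground truth as (lat, lon) pairs.
predicted = [(52.52, 13.405), (0.0, 0.0)]
actual = [(52.53, 13.41), (10.0, 10.0)]
print(share_within(predicted, actual))  # 0.5
```

Evaluating the same predictions at several radii gives an accuracy curve over spatial granularity.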
Web demonstrator
http://geotagging.de.im
Geo-Location Human Baseline Project
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location”, in ACM Workshop on Crowdsourcing for Multimedia held in conjunction with ACM Multimedia 2012]
• [Gottlieb, Choi, Kelm, Friedland, Sikora: “On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation”, in Multimedia Communications Technical Committee, IEEE Communications Society, Vol. 8, No. 1, January 2013]
http://geotagging.de.im/game.php
Object Detection
Frame 35
Frame 370
Augmented Object Detection
OpenCV for Android
FAST
ORB
BRISK
SURF
CPU: 192 ms
GPU: 87 ms
Android: 9990 ms
Geo-referenced database
Business card
Object Detection
Depth Map
Matching Map
Graph-based Object Detection
Matching
DFG Proposal
Housebreaking
Cyber-Stalking
Cyber-Stealing
Cyber-Mobbing
DFG Proposal: Geo-Privacy
Question
Thanks for your attention.
Dipl.- Ing. Pascal Kelm
Communication Systems Group
Technische Universität Berlin
Sekr. EN1, Einsteinufer 17
10587 Berlin, Germany
E-mail: [email protected]
Phone: (+49) 30 / 314 28504
DFG: Geo-Tagging
Spatial Segmentation
Twitter-based Placing Sub-Task (New York)
Spatial Segmentation
Extracted geographical items
00001: hawaii, kauai, usa
Textual Features + Naive Bayes
Visual Features
What will you do if you do not have any textual information?
Fusion
[Figure: fusion diagram. Labels: Pic1, Pic2, Pic3; Geographical Boundaries Extraction; Textual Region Model (Region 1 … Region N); Visual Region Model (Region 1 … Region N); candidate regions (Region 3 to Region 6); Ranking]