Kelm Overview 2013

Pascal Kelm

[email protected]

Communication Systems Group

Technische Universität Berlin

Thursday, 24 January 2013

www.nue.tu-berlin.de

Kelm: “Where in the World?: The State of Automatic Geotagging of Video”

Overview


Motivation: Where in the world is it?


Example

http://www.flickr.com/photos/zebandrews/7414117752/in/pool-18038320@N00/

Fact: only 3% of the content on online sharing platforms is available with geographic coordinates (latitude, longitude).










State of the Art

Textual information

Tags: Paris, France, twilight, grand blue, Europe, Hasselblad, film, …

- Gazetteers: e.g. geonames.org

- Textual similarity: finding the similarity to a group of toponyms

Visual information

- Low-level features: propagate the location by finding a visually similar image (features: texture, color, shape, …)

- Local features: interest points on the object are extracted to provide a "feature description" of the object (features: SIFT, SURF, etc.)

How would you estimate the location of unknown content?

• [Pascal Kelm: “Where in the World?: The State of Automatic Geotagging of Video”, invited lecture, DGA workshop 2012]

• [Pascal Kelm et al.: “Georeferencing in Social Networks“ in Social Media Retrieval, Springer, 2012]
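The gazetteer and textual-similarity approaches listed above can be illustrated with a fuzzy tag-to-toponym lookup. This is a minimal sketch, assuming a tiny hypothetical in-memory gazetteer (`GAZETTEER`, approximate coordinates) in place of a real service such as geonames.org, and using Python's `difflib` string similarity as a stand-in for whatever similarity measure the cited work uses.

```python
import difflib

# Hypothetical toy gazetteer: toponym -> approximate (latitude, longitude).
# A real system would query a gazetteer service such as geonames.org instead.
GAZETTEER = {
    "paris": (48.8566, 2.3522),
    "france": (46.2276, 2.2137),
    "europe": (54.5260, 15.2551),
    "berlin": (52.5200, 13.4050),
}


def match_toponyms(tags, gazetteer=GAZETTEER, cutoff=0.8):
    """Map each tag to the most similar toponym, if one is similar enough."""
    matches = {}
    for tag in tags:
        # get_close_matches scores candidates with SequenceMatcher.ratio()
        # and discards those below the cutoff; n=1 keeps only the best one.
        best = difflib.get_close_matches(tag.lower(), gazetteer.keys(), n=1, cutoff=cutoff)
        if best:
            matches[tag] = (best[0], gazetteer[best[0]])
    return matches


tags = ["Paris", "France", "twilight", "grand blue", "Europe", "Hasselblad", "film"]
print(match_toponyms(tags))
```

Non-geographic tags such as "twilight" or "Hasselblad" are dropped, while a misspelled tag such as "Pariss" still resolves to "paris" because of the similarity threshold.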


Relevant Research 1

2008: James Hays, Alexei A. Efros. IM2GPS: estimating geographic information from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ("Where am I?")

Purely data-driven scene matching approach (over 6 million GPS-tagged images, 5 low-level descriptors)

Visual ambiguity

Low precision, high computational cost (cluster of 400 processors, 3 days)
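The data-driven idea of propagating the GPS coordinates of the visually most similar geotagged images can be sketched as a k-nearest-neighbour search. This is a toy illustration only: the three-dimensional feature vectors, the Euclidean distance and the 1-NN estimate are assumptions for brevity, not the descriptors or the estimator of the original paper. Scanning millions of such vectors per query is what makes the approach expensive.

```python
import math

# Hypothetical toy database of geotagged images: (feature vector, (lat, lon)).
DATABASE = [
    ([0.90, 0.10, 0.00], (48.8584, 2.2945)),     # Paris (Eiffel Tower)
    ([0.80, 0.20, 0.10], (48.8606, 2.3376)),     # Paris (Louvre)
    ([0.10, 0.90, 0.30], (40.6892, -74.0445)),   # New York (Statue of Liberty)
    ([0.00, 0.20, 0.90], (-33.8568, 151.2153)),  # Sydney (Opera House)
]


def knn_locations(query, database, k=2):
    """Return the coordinates of the k images whose features are closest to query."""
    # Euclidean distance in feature space stands in for scene similarity.
    ranked = sorted(database, key=lambda item: math.dist(query, item[0]))
    return [coords for _, coords in ranked[:k]]


neighbours = knn_locations([0.88, 0.12, 0.02], DATABASE, k=2)
estimate = neighbours[0]  # 1-NN: adopt the location of the single closest image
print(neighbours, estimate)
```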



Relevant Research 2

2009: Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol: Placing

Flickr Photos on a Map. In: 32nd International ACM SIGIR

Language model over textual annotations (ranking)

Geographical / textual ambiguity

High precision

High computational cost

Images with the "palma" tag are falsely mapped near Palma de Mallorca, Spain


Research Question

What are the limits of an automatic algorithm?

Which feature (text, video) performs best?

Can fusion eliminate geographical ambiguity?

Do I need a CPU cluster to estimate the location?

Does low computational cost mean low precision?

Is it possible for a human to estimate the location of a video using textual, visual and audio information?


Placing Task

Organizers:

Pascal Kelm, TU Berlin

Adam Rae, Yahoo! Research

The task requires participants to assign geographical coordinates to each provided test video. Participants can make use of metadata, audio and visual features, and external resources.

[Adam Rae, Pascal Kelm: “Working Notes for the Placing Task at MediaEval 2012”, Working Notes Proceedings of MediaEval 2012 (ISSN 1613-0073)]


Image Distribution

Flickr database:

3.6 million training images

10,000 training videos

5,091 test videos

Descriptors:

1. Color and Edge Directivity Descriptor
2. Gabor
3. Fuzzy Color and Texture Histogram
4. Color Histogram
5. Scalable Color
6. Auto Color Correlogram
7. Tamura
8. Edge Histogram
9. Color Layout

Metadata:

All information about the uploader and the video


Overview Framework

National borders are extracted from the metadata.

Textual and visual features are used in a hierarchical framework to predict the most likely location.

[Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “Multimodal Geo-tagging in Social Media Websites using Hierarchical Spatial Segmentation”, Proceedings of the 20th ACM SIGSPATIAL 2012]


Collaborative Systems: Example

這是我上次去巴黎。在那裡，我得到了我的城堡在迪斯尼樂園看。


Geographical Ambiguity

這是我上次去巴黎。在那裡，我得到了我的城堡在迪斯尼樂園看。…

Which language is it?

Chinese

This was my last trip to Paris. I visited the castle in Disneyland…

Which words give us information? Tags?

Trip, Paris, Castle, Disneyland

Which of these nouns carry geographical information?

Paris, Disneyland


Geographical Ambiguity

Paris: France, Canada, Puerto Rico, …

Disneyland: China, USA, France, …

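Detected country:

$$c_{\text{detected}} = \arg\max_{j} R(c_j), \qquad R(c_j) = \sum_{m=1}^{N} R_m(c_j)$$

with $c_j$: candidate countries; $R(c_j)$: rank sum of country $c_j$; $R_m(c_j)$: rank score of $c_j$ for the $m$-th toponym; $N$: number of toponyms.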

• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “A Hierarchical, Multi-modal Approach for Placing Videos on the Map using Millions of Flickr Photographs”, ACM Multimedia 2011]
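The rank-sum rule for national border detection can be sketched as follows. The candidate lists mirror the Paris/Disneyland example on the slide, but their order is illustrative, and the scoring (a country at 0-based position p in a list of length L receives L − p points) is an assumption: the slide does not say how ranks are turned into scores.

```python
from collections import defaultdict

# Hypothetical candidate lists: toponym -> countries ordered by gazetteer
# relevance (most likely first). The order is illustrative only.
CANDIDATES = {
    "paris": ["France", "Canada", "Puerto Rico"],
    "disneyland": ["China", "USA", "France"],
}


def detect_country(toponyms, candidates=CANDIDATES):
    """Sum per-toponym rank scores for each country and return the argmax.

    Assumed scoring: in a list of length L, the country at 0-based
    position p receives L - p points, so higher-ranked countries count more.
    """
    rank_sum = defaultdict(int)
    for toponym in toponyms:
        countries = candidates.get(toponym.lower(), [])
        for position, country in enumerate(countries):
            rank_sum[country] += len(countries) - position
    if not rank_sum:
        return None, {}
    best = max(rank_sum, key=rank_sum.get)
    return best, dict(rank_sum)


country, scores = detect_country(["Paris", "Disneyland"])
print(country, scores)
```

Neither toponym alone is decisive ("Disneyland" on its own would favour China), but France collects points from both lists and wins with a rank sum of 4.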




Example

http://www.flickr.com/photos/62285085@N00/3484324495








Textual Region Model

Segmenting the world map into regions according to meridians and parallels

Stemming: reducing inflected words to their root form

Tags: Bream Vortex, Swimming, Ocean, Beach, Springs Vortex, Scuba Diving, Scuba Underwater, …

TextBounds Crossing, Florida, USA

After the Porter Stemmer: Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat, …
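Segmenting the map along meridians and parallels amounts to mapping each coordinate to a grid cell. A minimal sketch, assuming equal-angle cells of `cell_deg` degrees; `cell_deg` is a hypothetical parameter, since the slides do not give the cell sizes used.

```python
import math


def grid_cell(lat, lon, cell_deg=1.0):
    """Return the (row, col) of the grid cell that contains a coordinate.

    Rows are counted from the South Pole and columns from the antimeridian,
    so every (lat, lon) pair maps to exactly one cell_deg x cell_deg cell.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate out of range")
    rows = math.ceil(180.0 / cell_deg)
    cols = math.ceil(360.0 / cell_deg)
    # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = min(int((lon + 180.0) // cell_deg), cols - 1)
    return row, col


print(grid_cell(52.5200, 13.4050))        # Berlin, 1-degree cells: (142, 193)
print(grid_cell(52.5200, 13.4050, 10.0))  # Berlin, 10-degree cells: (14, 19)
```

Each cell id then serves as a region label that the textual model can collect (stemmed) tags for.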



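Term-location distribution:

$$P(t \mid l) = \frac{N_{t,l} + 1}{\sum_{t' \in V} \left( N_{t',l} + 1 \right)}$$

where $N_{t,l}$ is the count of term $t$ in location $l$ and $V$ is the vocabulary.

Term frequency-inverse document frequency:

$$\mathrm{tfidf}_{t,l} = n_{t,l} \cdot \log \frac{N}{N_t}$$

Most likely location of a document $d$ with terms $t_i$:

$$\hat{l}(d) = \arg\max_{l} \sum_{i} \log P(t_i \mid l)$$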
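The tf-idf weighting above can be sketched per location. This assumes that $n_{t,l}$ is the raw count of term $t$ in location $l$, that $N$ is the number of locations, and that $N_t$ is the number of locations whose documents contain $t$; the slide does not define these symbols, so other normalisations are equally plausible. The toy data is hypothetical.

```python
import math
from collections import Counter


def tfidf(location_terms):
    """Compute tf-idf weights per location and term.

    location_terms maps a location id to the (stemmed) terms of all training
    documents in that location. Assumed: n_{t,l} = raw count of t in l,
    N = number of locations, N_t = number of locations containing t.
    """
    n_locations = len(location_terms)
    counts = {loc: Counter(terms) for loc, terms in location_terms.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each location counts once per term
    return {
        loc: {t: n * math.log(n_locations / doc_freq[t]) for t, n in c.items()}
        for loc, c in counts.items()
    }


weights = tfidf({
    "florida": ["scuba", "dive", "beach", "ocean", "scuba"],
    "paris": ["eiffel", "tower", "night", "beach"],
})
print(weights["florida"]["scuba"], weights["florida"]["beach"])
```

A term that occurs in every location ("beach" here) gets weight 0, so it contributes nothing to discriminating between places.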


Textual Region Model

Bernoulli model (t: tag, c: class/region):

Bream Vortex, Swim, Ocean, Beach, Springs Vortex, Scuba Dive, Scuba Underwat, …

$$P(t \mid c) = \frac{N_{t,c} + 1}{\sum_{t' \in V} \left( N_{t',c} + 1 \right)}$$
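Region classification with the smoothed estimate P(t|c) from this slide can be sketched as follows: estimate P(t|c) from the tags of each region, then pick the region that maximises the summed log-probabilities of the video's tags. Region priors are omitted and unseen tags are skipped, both assumptions. A textbook Bernoulli model would instead count documents containing t and also score absent tags; the sketch follows the count-based estimate on the slide. Region names and tags are hypothetical.

```python
import math
from collections import Counter


def train(region_tags):
    """Estimate P(t|c) = (N_{t,c} + 1) / sum_{t' in V} (N_{t',c} + 1)."""
    vocab = {t for tags in region_tags.values() for t in tags}
    model = {}
    for region, tags in region_tags.items():
        counts = Counter(tags)
        denom = sum(counts[t] + 1 for t in vocab)
        model[region] = {t: (counts[t] + 1) / denom for t in vocab}
    return model


def classify(model, tags):
    """Return the region maximising sum_i log P(t_i|c); unseen tags are skipped."""
    def score(region):
        probs = model[region]
        return sum(math.log(probs[t]) for t in tags if t in probs)
    return max(model, key=score)


model = train({
    "florida": ["swim", "ocean", "beach", "scuba", "dive", "vortex"],
    "alps": ["ski", "snow", "mountain", "hike"],
})
print(classify(model, ["scuba", "dive", "beach"]))  # florida
```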


Visual Region Model

Returns the visually most similar areas, each represented by the mean feature vector of all training images and videos of that area.
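The visual region model can be sketched as a nearest-centroid ranking: average the feature vectors per area, then sort the areas by their distance to the query. The two-dimensional vectors, the region names and the Euclidean distance are assumptions for illustration; the actual system uses the descriptors listed earlier.

```python
import math


def region_centroids(samples):
    """Average the feature vectors of all training items of each region."""
    sums, counts = {}, {}
    for region, vec in samples:
        acc = sums.setdefault(region, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[region] = counts.get(region, 0) + 1
    return {r: [s / counts[r] for s in acc] for r, acc in sums.items()}


def rank_regions(centroids, query):
    """Return region ids ordered from most to least visually similar."""
    return sorted(centroids, key=lambda r: math.dist(query, centroids[r]))


centroids = region_centroids([
    ("coast", [0.2, 0.8]), ("coast", [0.4, 0.6]),
    ("city", [0.9, 0.1]), ("city", [0.7, 0.3]),
])
print(rank_regions(centroids, [0.25, 0.75]))  # ['coast', 'city']
```

Comparing against one mean vector per area instead of every training image keeps the number of distance computations equal to the number of areas.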


What is meant by Spatial Segmentation?

The world map is iteratively divided into segments of different sizes.

Each segment is treated as a class in our probabilistic model.

• [Pascal Kelm, Sebastian Schmiedeke, Thomas Sikora: “How Spatial Segmentation improves the Multimodal Geo-Tagging”, Working Notes Proceedings of MediaEval 2012]
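Iterative division into segments of different sizes can be pictured as a coarse-to-fine hierarchy of nested cells. A minimal sketch, assuming a 45-degree top level and a halving of the cell size at every level (a quadtree over latitude and longitude); both parameters are hypothetical, and the cited work may divide the map differently.

```python
def hierarchical_cells(lat, lon, levels=3, start_deg=45.0):
    """Return the (level, row, col) cell containing (lat, lon), coarse to fine.

    Assumption: each level halves the cell size, so every cell splits into
    2 x 2 children and the cells of consecutive levels are nested.
    """
    path = []
    size = start_deg
    for level in range(levels):
        rows, cols = round(180.0 / size), round(360.0 / size)
        # Clamp so that lat = 90 or lon = 180 fall into the last row/column.
        row = min(int((lat + 90.0) // size), rows - 1)
        col = min(int((lon + 180.0) // size), cols - 1)
        path.append((level, row, col))
        size /= 2.0
    return path


print(hierarchical_cells(52.5200, 13.4050))  # Berlin: [(0, 3, 4), (1, 6, 8), (2, 12, 17)]
```

Because each child index divided by two gives its parent, a classifier can first choose a coarse segment and then only compare the finer segments inside it.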


Fusion: Example

The confidence scores of the visual approach (right) are restricted to the most likely spatial segment determined by the textual approach (left).
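The fusion step described here can be sketched as a hard restriction: take the segment with the highest textual confidence, then keep only the visual candidates that lie inside it. All identifiers and scores are hypothetical, and the fallback when no visual candidate falls inside the segment is an assumption.

```python
def fuse(text_scores, visual_scores, segment_of):
    """Restrict the visual ranking to the most likely textual segment.

    text_scores: segment id -> textual confidence
    visual_scores: candidate location id -> visual confidence
    segment_of: candidate location id -> id of the segment it lies in
    """
    best_segment = max(text_scores, key=text_scores.get)
    inside = {loc: s for loc, s in visual_scores.items()
              if segment_of[loc] == best_segment}
    if not inside:  # no visual candidate in that segment: keep only the segment
        return best_segment, None
    return best_segment, max(inside, key=inside.get)


text_scores = {"france": 0.7, "usa": 0.2, "china": 0.1}
visual_scores = {"anaheim": 0.9, "marne-la-vallee": 0.6, "hong-kong": 0.5}
segment_of = {"anaheim": "usa", "marne-la-vallee": "france", "hong-kong": "china"}
print(fuse(text_scores, visual_scores, segment_of))  # ('france', 'marne-la-vallee')
```

Visually, the Disneyland castle in Anaheim scores highest, but the textual evidence rules out the USA segment, which is how fusion removes the geographical ambiguity.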


Results

[UNICAMP] O. A. B. Penatti, L. T. Li, J. Almeida, R. da S. Torres: “A visual approach for video geocoding using bag-of-scenes”, ICMR '12

[QMUL] X. Sevillano, T. Piatrik, K. Chandramouli, Q. Zhang, E. Izquierdo: “Geo-tagging online videos using semantic expansion and visual analysis”


Conclusion

- Hierarchical approach for the automatic estimation of geo-tags on social media websites

- Detailed analysis of textual and visual features using different spatial granularities (national border detection)

- Fusion of textual and visual methods is important to eliminate geographical ambiguities

- Reduces the computing time in the subsequent classification step

- Half of the test set is correctly located within a radius of 10 km
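The "within 10 km" result is a fraction of test items whose estimated position lies within a given great-circle distance of the ground truth. A minimal sketch using the haversine formula with a spherical Earth of mean radius 6371 km (an approximation); the two test coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def fraction_within(predictions, ground_truth, radius_km=10.0):
    """Share of predictions that lie within radius_km of their ground truth."""
    hits = sum(haversine_km(p, g) <= radius_km
               for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)


truth = [(52.5200, 13.4050), (48.8566, 2.3522)]  # Berlin, Paris
preds = [(52.5163, 13.3777), (45.7640, 4.8357)]  # about 2 km off; Lyon instead of Paris
print(fraction_within(preds, truth))  # 0.5
```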


Web Demonstrator

http://geotagging.de.im


Geo-Location Human Baseline Project



• [Gottlieb, Choi, Kelm, Friedland, Sikora: “Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geo-Location”, ACM Workshop on Crowdsourcing for Multimedia, held in conjunction with ACM Multimedia 2012]

• [Gottlieb, Choi, Kelm, Friedland, Sikora: “On Pushing the Limits of Mechanical Turk: Qualifying the Crowd for Video Geolocation”, Multimedia Communications Technical Committee, IEEE Communications Society, Vol. 8, No. 1, January 2013]

http://geotagging.de.im/game.php


Object Detection

(Figure: frames 35 and 370)


Augmented Object Detection

OpenCV for Android

Feature detectors/descriptors: FAST, ORB, BRISK, SURF

Runtime: CPU 192 ms, GPU 87 ms, Android 9990 ms

Geo-referenced database

Business card


Object Detection

(Figure: depth map, matching map)


Graph-based Object Detection

Matching



DFG Proposal

Housebreaking

Cyber-Stalking

Cyber-Stealing

Cyber-Mobbing


DFG Proposal: Geo-Privacy


Questions

Thanks for your attention.

Dipl.-Ing. Pascal Kelm

Communication Systems Group

Technische Universität Berlin

Sekr. EN1, Einsteinufer 17

10587 Berlin, Germany

E-mail: [email protected]

Phone: (+49) 30 / 314 28504





DFG: Geo-Tagging


Spatial Segmentation


Twitter-based Placing Sub-Task (New York)


Spatial Segmentation


Extracted geographic items

00001: hawaii, kauai, usa


Textual Features + Naive Bayes


Visual Features

What do you do if there is no textual information?


Fusion

(Figure: pictures Pic1, Pic2 and Pic3 assigned to regions 1 to N)


Visual Region Model

(Figure: geographical boundaries extraction, regions 3 to 6, ranking of documents)
