MediaEval 2016 - RECOD at Placing Task

RECOD @ Placing Task of MediaEval 2016

A Ranking Fusion Approach for Geographic-Location Prediction of

Multimedia Objects

Javier A. V. Munoz1, Lin Tzy Li1, Ícaro C. Dourado1, Keiller Nogueira5, Samuel G. Fadel1, Otávio A. B. Penatti1,2, Jurandy Almeida1,3, Luís A. M. Pereira1, Rodrigo T. Calumby1,4,

Jefersson A. dos Santos5, and Ricardo da S. Torres1

1RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), Campinas – SP2Advanced Technologies, SAMSUNG Research Institute Brazil, Campinas – SP3Inst. of Science and Technology, Federal University of São Paulo (UNIFESP), S. J. dos Campos – SP4Dept. of Exact Sciences, University of Feira de Santana (UEFS), Feira de Santana – BA5Dept. of Computer Science, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte - MG

Brazil

GP-Agg

General Pipeline

Feature Extractor 1 Ranked List 1

Ranking

Feature Extractor 2 Ranked List 2

Feature Extractor m Ranked List m

RLDA

Rank Fusion

ID1 LAT1 LONG1ID2 LAT2 LONG2… … ...IDX LATX LONGX

Final Ranked List

Ranked list 1~m

ID1 LAT1 LONG1ID2 LAT2 LONG2… … ...IDX LATX LONGX

Top X items' lat/long defined as points of a OPF cluster

v: (lat1,long1)

edge (v, u): connect k-nn d(v, u)

d(v, u): geo-distance between v and u

k=3

u: (lat2,long2)

RLDA: Ranked List Density Analysis Based on OPF (Optimum-Path Forest): find the maximum point in a probability density function

GP-Agg Framework

Training Phase

● RLDA is included in set of rank aggregation functions● RLDA is the unique function that uses geographical location in the combination

GP-Agg Framework

Testing Phase

GP-Agg parameters

Parameter Value

Number of generations

20

Genetics operators

Reproduction, Mutation, Crossover

Fitness functions

FFP1, WAS

Rank Agg. methods

CombMAX, CombMIN, CombSUM, CombMED, CombANZ, CombMNZ, RLSim, BordaCount, RRF, MRA, RLDA

GP individual example:

Fitness Functions

FFP1=∑i=1

n

ak r ( l i)×k i×ln−1(i+k 2)

● is the relevance assigned to an element● An element in a ranked list is considered relevant if it is in a range of 1Km● k

1, k

2 are scaling factors, k

1 = 6 and k

2 = 1.2

r ( l i)∈{0,1}

; score (i)=1−log (1+d (i))log(1+Rmax)

WAS=∑i=1

n

score (i)

n

● Rmax

: the maximum distance between any two points on the Earth surface● d(i): geographic distance between the predicted location and the ground truth

Features

Descriptors

Textual metadata

BM25*, TF-IDF*, IBS*, LMD*

Photos edgehistogram (EHD), scalable-color (SCD), GIST, cedd, col, jhist, tamura, BIC, GoogleNet

Videos HMP

* All from Lucene package (http://lucene.apache.org/core/)

Submissions configuration

Type Photos Videos

Run 1 Textual GP-Agg(all desc.) GP-Agg(all desc.)

Run 2 Non-textual GP-Agg(all desc.) RLDA(HMP)

Run 3 All GP-Agg(all desc.) GP-Agg(all desc.)

Run 4 Free RLDA(IBS, BM25, LMD, TF-IDF) LMD

Submissions configuration

Type Photos Videos

Run 1 Textual GP-Agg(all desc.) GP-Agg(all desc.)

Run 2 Non-textual GP-Agg(all desc.) RLDA(HMP)

Run 3 All GP-Agg(all desc.) GP-Agg(all desc.)

Run 4 Free RLDA(IBS, BM25, LMD, TF-IDF) LMD

Results on Photos (%)

Run 1

Run 2

Run 3

Run 4

0 10 20 30 40 50 60 70

0.59

0.09

0.56

0.56

6.07

0.87

5.97

5.94

21.06

2.36

20.83

20.73

38

4.47

37.72

37.47

59.69

21.46

59.89

59.28

10m

100m

1km

10km

100km

1000km

Results on Videos (%)

Run 1

Run 2

Run 3

Run 4

0 10 20 30 40 50 60

0.45

0

0.51

0.37

5.74

0.03

5.82

4.03

18.69

0.15

18.46

13.51

41.56

2.46

41.2

33.02

54.51

13.54

54.77

47.67

10m

100m

1km

10km

100km

1000km

Results (error)

Photos (km) Videos (km)

Avg Median Avg Median

Run 1 2872 254.67 3204.80 566.96

Run 2 5664.14 5821.07 6085.23 6085.63

Run 3 2797.58 261.66 3123.38 571.24

Run 4 2919.30 279.37 3739.79 1236.51

Conclusions

● GP-Agg automatically discover semi-optimal combinations of ranked lists and aggregation functions

● Including RLDA into GP-Agg improved results over best configuration submitted in past year

● Use of CNN-based features for photos improved significantly results over past year in non-textual submission

Future Work

● Use more CNN-based features for visual content in photos and videos– PlaNet¹-inspired fine tuning

¹Tobias Weyand, Ilya Kostrikov, James Philbin, PlaNet - Photo Geolocation with Convolutional Neural Networks, ECCV 2016.

Acknowledgments

● FAPESP● CNPq ● CAPES● Samsung● MediaEval 2016