Upload
multimediaeval
View
124
Download
2
Embed Size (px)
Citation preview
RECOD @ Placing Task of MediaEval 2016
A Ranking Fusion Approach for Geographic-Location Prediction of
Multimedia Objects
Javier A. V. Munoz1, Lin Tzy Li1, Ícaro C. Dourado1, Keiller Nogueira5, Samuel G. Fadel1, Otávio A. B. Penatti1,2, Jurandy Almeida1,3, Luís A. M. Pereira1, Rodrigo T. Calumby1,4,
Jefersson A. dos Santos5, and Ricardo da S. Torres1
1RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), Campinas – SP2Advanced Technologies, SAMSUNG Research Institute Brazil, Campinas – SP3Inst. of Science and Technology, Federal University of São Paulo (UNIFESP), S. J. dos Campos – SP4Dept. of Exact Sciences, University of Feira de Santana (UEFS), Feira de Santana – BA5Dept. of Computer Science, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte - MG
Brazil
GP-Agg
General Pipeline
Feature Extractor 1 Ranked List 1
Ranking
Feature Extractor 2 Ranked List 2
Feature Extractor m Ranked List m
RLDA
Rank Fusion
ID1 LAT1 LONG1ID2 LAT2 LONG2… … ...IDX LATX LONGX
Final Ranked List
Ranked list 1~m
ID1 LAT1 LONG1ID2 LAT2 LONG2… … ...IDX LATX LONGX
Top X items' lat/long defined as points of a OPF cluster
v: (lat1,long1)
edge (v, u): connect k-nn d(v, u)
d(v, u): geo-distance between v and u
k=3
u: (lat2,long2)
RLDA: Ranked List Density Analysis Based on OPF (Optimum-Path Forest): find the maximum point in a probability density function
GP-Agg Framework
Training Phase
● RLDA is included in set of rank aggregation functions● RLDA is the unique function that uses geographical location in the combination
GP-Agg Framework
Testing Phase
GP-Agg parameters
Parameter Value
Number of generations
20
Genetics operators
Reproduction, Mutation, Crossover
Fitness functions
FFP1, WAS
Rank Agg. methods
CombMAX, CombMIN, CombSUM, CombMED, CombANZ, CombMNZ, RLSim, BordaCount, RRF, MRA, RLDA
GP individual example:
Fitness Functions
FFP1=∑i=1
n
ak r ( l i)×k i×ln−1(i+k 2)
● is the relevance assigned to an element● An element in a ranked list is considered relevant if it is in a range of 1Km● k
1, k
2 are scaling factors, k
1 = 6 and k
2 = 1.2
r ( l i)∈{0,1}
; score (i)=1−log (1+d (i))log(1+Rmax)
WAS=∑i=1
n
score (i)
n
● Rmax
: the maximum distance between any two points on the Earth surface● d(i): geographic distance between the predicted location and the ground truth
Features
Descriptors
Textual metadata
BM25*, TF-IDF*, IBS*, LMD*
Photos edgehistogram (EHD), scalable-color (SCD), GIST, cedd, col, jhist, tamura, BIC, GoogleNet
Videos HMP
* All from Lucene package (http://lucene.apache.org/core/)
Submissions configuration
Type Photos Videos
Run 1 Textual GP-Agg(all desc.) GP-Agg(all desc.)
Run 2 Non-textual GP-Agg(all desc.) RLDA(HMP)
Run 3 All GP-Agg(all desc.) GP-Agg(all desc.)
Run 4 Free RLDA(IBS, BM25, LMD, TF-IDF) LMD
Submissions configuration
Type Photos Videos
Run 1 Textual GP-Agg(all desc.) GP-Agg(all desc.)
Run 2 Non-textual GP-Agg(all desc.) RLDA(HMP)
Run 3 All GP-Agg(all desc.) GP-Agg(all desc.)
Run 4 Free RLDA(IBS, BM25, LMD, TF-IDF) LMD
Results on Photos (%)
Run 1
Run 2
Run 3
Run 4
0 10 20 30 40 50 60 70
0.59
0.09
0.56
0.56
6.07
0.87
5.97
5.94
21.06
2.36
20.83
20.73
38
4.47
37.72
37.47
59.69
21.46
59.89
59.28
10m
100m
1km
10km
100km
1000km
Results on Videos (%)
Run 1
Run 2
Run 3
Run 4
0 10 20 30 40 50 60
0.45
0
0.51
0.37
5.74
0.03
5.82
4.03
18.69
0.15
18.46
13.51
41.56
2.46
41.2
33.02
54.51
13.54
54.77
47.67
10m
100m
1km
10km
100km
1000km
Results (error)
Photos (km) Videos (km)
Avg Median Avg Median
Run 1 2872 254.67 3204.80 566.96
Run 2 5664.14 5821.07 6085.23 6085.63
Run 3 2797.58 261.66 3123.38 571.24
Run 4 2919.30 279.37 3739.79 1236.51
Conclusions
● GP-Agg automatically discover semi-optimal combinations of ranked lists and aggregation functions
● Including RLDA into GP-Agg improved results over best configuration submitted in past year
● Use of CNN-based features for photos improved significantly results over past year in non-textual submission
Future Work
● Use more CNN-based features for visual content in photos and videos– PlaNet¹-inspired fine tuning
¹Tobias Weyand, Ilya Kostrikov, James Philbin, PlaNet - Photo Geolocation with Convolutional Neural Networks, ECCV 2016.
Acknowledgments
● FAPESP● CNPq ● CAPES● Samsung● MediaEval 2016