
CERTH @ MediaEval 2014

Social Event Detection Task

Marina Riga, Georgios Petkos, Symeon Papadopoulos

Emmanouil Schinas, Yiannis Kompatsiaris

Information Technologies Institute / CERTH


1st subtask: Approach (1/2)

• A same event model predicts whether a pair of images belongs to the same cluster (based on a set of per-modality similarities)

• Organize images in a graph according to the predictions of the same event model

• Select candidate neighbours using appropriate indices for scalability

• Cluster the graph using a community detection algorithm (SCAN) to obtain the clusters
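The steps above can be sketched as follows. This is a toy illustration, not the actual system: the same-event model is stubbed with a tag-overlap similarity, and plain connected components stand in for SCAN; all names are hypothetical.

```python
from itertools import combinations

def similarity(a, b):
    # Toy similarity: Jaccard overlap of tag sets (the real model combines
    # user, textual, temporal, location and visual similarities).
    tags_a, tags_b = set(a["tags"]), set(b["tags"])
    return len(tags_a & tags_b) / max(len(tags_a | tags_b), 1)

def same_event(a, b, threshold=0.5):
    # Stand-in for the learned same-event model: a single similarity
    # score compared against a decision threshold.
    return similarity(a, b) > threshold

def cluster(images, threshold=0.5):
    # Build a graph with an edge for every positively predicted pair,
    # then cluster it; connected components stand in for SCAN here.
    adj = {i: set() for i in range(len(images))}
    for i, j in combinations(range(len(images)), 2):
        if same_event(images[i], images[j], threshold):
            adj[i].add(j)
            adj[j].add(i)
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(adj[node] - seen)
        clusters.append(sorted(comp))
    return clusters

imgs = [{"tags": ["concert", "rock"]},
        {"tags": ["concert", "rock", "stage"]},
        {"tags": ["football", "match"]}]
print(cluster(imgs))  # → [[0, 1], [2]]
```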


1st subtask: Approach (2/2)

• Key observation about the errors of the same event model: false positive predictions are much more harmful for the task than false negatives

• Tune the same event model (by adapting the threshold) so that we obtain a higher true negatives rate at the cost of a somewhat lower true positives rate


1st subtask: Experimental details

• Features used as input to the same event model:
  o User
  o Textual
  o Temporal
  o Location
  o Visual (SURF+VLAD, Overfeat)

• For each of them we create appropriate indices and use them for candidate neighbour selection

• For the same event model we found that if we changed the classification threshold from 0.5 to 0.995, the true negatives rate was almost perfect (0.9999) and the true positives rate was around 0.95.
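The effect of adapting the threshold can be illustrated on synthetic scores (these numbers are made up, not the actual model's outputs): raising the decision threshold makes "same event" predictions stricter, so the true-negative rate rises, i.e. fewer cluster-corrupting false positives, at the cost of some true positives.

```python
def rates(scores, labels, threshold):
    # scores: predicted probability that a pair is "same event";
    # labels: 1 for same-event pairs, 0 for different-event pairs.
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, tn / neg

# Synthetic example: some different-event pairs score moderately high.
scores = [0.99, 0.97, 0.6, 0.999, 0.7, 0.3, 0.1]
labels = [1,    1,    0,   1,     0,   0,   0]

print(rates(scores, labels, 0.5))    # default threshold: some false positives slip through
print(rates(scores, labels, 0.995))  # stricter threshold: true-negative rate reaches 1.0
```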


1st subtask: Runs and results

Run  Visual features  Candidate neighbour selection         Model threshold  F1      NMI
1    No               Blocking (500)                        Regular          0.4514  0.7594
2    Yes              Blocking (500)                        Regular          0.4515  0.7592
3    Yes              Blocking (500)                        Adapted          0.8312  0.9627
4    Yes              Blocking (500) + 2-step neighborhood  Adapted          0.9133  0.9808
5    Yes              No blocking                           Adapted          0.9161  0.9818


2nd subtask: Approach

• Fetch images from Flickr using keywords that appear in the queries. Also fetch a large set of general images that are not related to any specific query

• Use them to build language models

• Perform classification of events using the language models

• We perform the classification of an event i (or image) in two alternative ways:
  1. p_specific(i) / p_general(i) > θ
  2. p_specific,general(i) / p_general(i) > θ

• For location criteria, we use the explicit geolocation information if it exists. Otherwise, we utilize a grid-based approach that builds a distinct language model for each cell and picks the most likely one [1]

• Eventually, we use the classified events to retrieve those that match the queries’ criteria
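The first classification rule, p_specific(i) / p_general(i) > θ, can be sketched with unigram language models and add-one smoothing; the corpora and θ below are invented for illustration, and the real system builds its models from the fetched Flickr images.

```python
import math
from collections import Counter

def unigram_lm(docs):
    # Build a unigram language model from a list of documents and return
    # a function computing the add-one smoothed log-probability of a text.
    counts = Counter(w for doc in docs for w in doc.split())
    total = sum(counts.values())
    vocab = len(counts)
    def log_prob(text):
        return sum(math.log((counts[w] + 1) / (total + vocab + 1))
                   for w in text.split())
    return log_prob

# Hypothetical query-specific and general corpora.
specific = unigram_lm(["rock concert stage crowd", "concert band live music"])
general = unigram_lm(["holiday beach sun", "city street cars", "food dinner"])

def matches_query(caption, theta=1.0):
    # Ratio test in log space: log p_specific - log p_general > log θ
    return specific(caption) - general(caption) > math.log(theta)

print(matches_query("live concert music"))  # True
print(matches_query("beach sun"))           # False
```

The second rule only swaps the numerator for a model built on both the specific and the general corpus; the comparison itself is identical.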

[1] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task.


2nd subtask: Runs and results

Run  Threshold           Classification model  F1      Precision  Recall
1    1                   1 (per cluster)       0.3431  0.3458     0.6101
2    1                   2 (per cluster)       0.2723  0.2669     0.7505
3    Set using dev. set  1 (per cluster)       0.4043  0.4120     0.5556
4    Set using dev. set  2 (per cluster)       0.4604  0.7080     0.3915
5    Set using dev. set  1 (per item)          0.2806  0.3569     0.3798

Considering only queries that involve location criteria, we achieved an F1 of 0.6331.


Thank you!

Questions?

Comments?

Suggestions?

Acknowledgements: