Workshop on Social Events in Web Multimedia, ICMR 2014
Social Event Detection at MediaEval:
a three-year retrospect of tasks and results
Georgios Petkos, Symeon Papadopoulos, Vasileios Mezaris, Raphael Troncy, Philipp Cimiano, Timo Reuter, Yiannis Kompatsiaris
ICMR 2014, SEWM Workshop Vasileios Mezaris
Overview
• The problem of social event detection.
• The social event detection task.
• Evolution of the task and datasets.
• Overview of approaches pursued by participants and results.
• Outlook.
Social events?
Attended by people and represented by multimedia content shared online
• personal: wedding / birthday / drinks
• entertainment: concert / play / sports
• news: demonstration / riot / speech
[Figure: two photographs labeled "Pope Benedict" and "Pope Francis", annotated with 2007: iPhone release, 2008: Android release, 2010: iPad release. Source: http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/]
Social event detection
Social event detection involves the discovery and retrieval of social events in collections of multimedia.
[Diagram: COLLECTION → SOCIAL EVENT DETECTION → EVENT SET (E1, E2, ..., EN)]
The social event detection task
• As part of the well-known MediaEval benchmarking activity, a task on social event detection has been running for 3 years (2011-2013).
• Interest in the task has grown significantly:
– 2011: 7 participants
– 2012: 5 participants
– 2013: 11 participants
SED 2011: The task
• Two challenges were defined in 2011.
• In both, participants were provided with a set of images collected from Flickr and were asked to surface events of a particular type at particular locations:
1. Soccer matches in Barcelona and Rome.
2. Concerts in Paradiso and Parc Del Forum.
• Two differences between the two challenges:
1. In the first, both a topical and a location criterion are defined, whereas in the second only a location criterion is defined.
2. The specificity of the location of interest is different.
SED 2011: The dataset, ground truth and evaluation
• 73,645 photos collected from Flickr.
• All photos were originally geo-tagged and were taken in 5 different cities in May 2009 (geo-tags were removed for 80% of the pictures in the provided dataset).
• The ground truth was generated by utilizing machine tags provided by event directories as well as an automatic cluster-based framework.
• Evaluation measures:
1. F-score
2. NMI
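For readers unfamiliar with the two measures, here is a minimal sketch of how they are commonly computed. It is not the official MediaEval evaluation script. It assumes that the F-score compares the set of photos retrieved for an event against the ground-truth set, and that NMI is normalized by the mean of the two entropies (other normalizations exist). All function names are illustrative.

```python
import math
from collections import Counter


def f_score(retrieved, relevant):
    """Harmonic mean of precision and recall over two sets of photo ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(predicted, truth):
    """Normalized mutual information between two labelings of the same items.

    Assumption: normalization by the arithmetic mean of the two entropies.
    """
    n = len(truth)
    joint = Counter(zip(predicted, truth))
    pred_counts, true_counts = Counter(predicted), Counter(truth)
    mi = 0.0
    for (p, t), c in joint.items():
        # p(p,t) * log( p(p,t) / (p(p) * p(t)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (pred_counts[p] * true_counts[t]))
    denom = (entropy(predicted) + entropy(truth)) / 2
    # Both labelings are a single cluster when denom is 0, i.e. identical.
    return mi / denom if denom > 0 else 1.0
```

NMI is invariant to renaming of cluster labels, which is why it suits clustering output where event ids are arbitrary.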
SED 2012: The task
• Three challenges, similar to those of the first year, were defined in 2012.
• Again, participants were provided with a set of images collected from Flickr and were asked to surface events of a particular type at particular locations:
1. Technical events (e.g. exhibitions and fairs) that took place in Germany.
2. Soccer events in Hamburg and Madrid.
3. Demonstration and protest events of the Indignados movement in Madrid.
• Characteristics of the challenges:
1. Theme and location of queries quite different.
2. Notion of “technical events” somewhat fuzzy.
3. Indignados events are spontaneously organized.
SED 2012: The dataset, ground truth and evaluation
• 167,332 photos collected from Flickr.
• Again, all photos were originally geo-tagged but geo-tags were removed for 80% of the pictures in the provided dataset.
• The ground truth was again generated by utilizing machine tags provided by event directories as well as an automatic cluster-based framework.
• Evaluation measures:
1. F-score
2. NMI
SED 2013: The task
Two completely new challenges were defined:
1. Produce a complete clustering of the image dataset according to events.
– Extension: Assign a set of videos to the clusters generated from the image dataset.
2. Classify media items as either representing a social event or not and, for those that do, identify the type of event (eight event types were defined).
SED 2013: The dataset, ground truth and evaluation
Separate dataset, ground truth and evaluation for each challenge:
• Challenge 1:
– Dataset: 427,370 pictures from Flickr and 1,327 videos from YouTube corresponding to 21,169 events.
– Ground truth: obtained from last.fm and upcoming machine tags.
– Evaluation: F-score and NMI.
• Challenge 2:
– Dataset: 27,754 training images and 29,411 test images collected from Instagram.
– Ground truth: obtained by manual annotation.
– Evaluation: NMI.
Evolution of the task
Two distinct eras of the task:
1. First two years. Datasets contained both event and non-event images, and the task was to retrieve the sets of images matching given criteria (event type and location).
2. Third year. Broken into two subtasks (no filtering subtask though):
1. Full clustering.
2. Detection of event type in individual images.
Also, datasets have become larger and richer.
Approaches: First era, 2011-2012 (1/4)
At a very high level there are two types of approaches:
1. Matching images to event descriptions retrieved from online event directories.
2. Applying a sequence of filtering or classification and clustering steps on the datasets.
Approaches: First era, 2011-2012 (2/4)
• Methods in the first class differ in the way that matching is carried out:
– Indexing and querying in Lucene.
– Probabilistic matching.
• Methods in the second class are much more popular:
– Some utilize external sources, e.g. DBPedia or the Google geocoding API, to enrich the matching criteria.
– For most of them, time and location (sometimes inferred from the textual metadata when geo-tags are not available) are the primary criteria for clustering.
– Alternatively, some approaches treat the problem as a multimodal clustering problem utilizing a learned similarity metric.
Approaches: First era, 2011-2012 (3/4)
– For challenge 1, the best approach first classifies images by city and then groups them into buckets of photos taken on the same day in the same city.
– For challenge 2, two matching-based approaches achieved the best results (most likely because the type of event makes it more likely to find relevant info in online directories).
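The same-day, same-city bucketing mentioned for challenge 1 can be sketched as follows. This is only an illustration, not the participants' code: the photo fields `id`, `city` and `taken` are hypothetical, and the city label is assumed to come from an earlier classification step.

```python
from collections import defaultdict
from datetime import datetime


def bucket_by_city_and_day(photos):
    """Group photo ids into buckets keyed by (city, calendar day).

    Each photo is a dict with hypothetical keys 'id', 'city' and 'taken';
    'city' is assumed to be the output of a prior city classifier.
    """
    buckets = defaultdict(list)
    for photo in photos:
        key = (photo["city"], photo["taken"].date())
        buckets[key].append(photo["id"])
    return dict(buckets)


# Made-up toy data, for illustration only.
photos = [
    {"id": "a", "city": "Barcelona", "taken": datetime(2009, 5, 2, 20, 15)},
    {"id": "b", "city": "Barcelona", "taken": datetime(2009, 5, 2, 21, 40)},
    {"id": "c", "city": "Rome", "taken": datetime(2009, 5, 2, 20, 30)},
    {"id": "d", "city": "Barcelona", "taken": datetime(2009, 5, 3, 9, 5)},
]
buckets = bucket_by_city_and_day(photos)
```

Each bucket is then a candidate event that later steps can filter by topic.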
2011                   Challenge 1        Challenge 2
                       F-score  NMI       F-score  NMI
Brenner et al.         68.70    0.410     33.00    0.500
Hintsa et al.          -        -         68.67    0.678
Liu et al.             59.13    0.247     68.95    0.617
Nguyen et al.          10.13    0.026     12.44    -0.01
Papadopoulos et al.    77.37    0.630     64.00    0.379
Ruocco et al.          58.65    0.475     66.05    0.644
Wang et al.            64.90    0.236     50.44    0.448
Approaches: First era, 2011-2012 (4/4)
The best approach, by Vavliakis et al., involves the following steps:
1. City classification.
2. For the images of each city, topic modeling using LDA is performed.
3. The topic model is used to match the photos that are relevant to the queries.
4. Events are identified by finding, for each topic and city of interest, the days for which the number of images is above some threshold.
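The final thresholding step can be sketched as below. This is not the code of Vavliakis et al.: it assumes that city classification and topic matching have already produced one (topic, city, day) triple per relevant photo, and the function name, the `threshold` value and the sample data are made up for illustration.

```python
from collections import Counter


def detect_events(assignments, threshold):
    """Return (topic, city, day) triples backed by more than `threshold` photos.

    `assignments` holds one (topic, city, day) triple per photo that matched
    a query topic; `threshold` is an assumed tuning parameter.
    """
    counts = Counter(assignments)
    return sorted(key for key, n in counts.items() if n > threshold)


# Made-up toy input: 4, 2 and 3 photos for three (topic, city, day) triples.
assignments = (
    [("soccer", "Madrid", "2012-05-01")] * 4
    + [("soccer", "Hamburg", "2012-05-01")] * 2
    + [("soccer", "Madrid", "2012-05-08")] * 3
)
events = detect_events(assignments, threshold=2)
```

With a threshold of 2, the Hamburg triple is dropped because it is backed by only two photos.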
2012                  Challenge 1        Challenge 2        Challenge 3
                      F-score  NMI       F-score  NMI       F-score  NMI
Zeppelzauer et al.    2.15     0.020     29.99    0.200     47.58    0.310
Vavliakis et al.      84.58    0.724     90.76    0.850     89.83    0.738
Schinas et al.        18.66    0.187     74.64    0.674     66.87    0.465
Brenner et al.        -        -         72.66    0.65      -        -
Dao et al.            70.15    0.601     -        -         60.96    0.446
Approaches: Second era, 2013 (1/4)
For the first challenge, there are two main types of approaches:
1. Sequence of unimodal clustering operations.
2. Multimodal clustering using a learned similarity measure.
However, there are also some rather distinct approaches, e.g.:
1. An approach that applies a Chinese Restaurant Process to perform a stochastic clustering of images.
2. An approach that utilizes WordNet to compute appropriate semantic similarity measures.
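As background on the Chinese Restaurant Process mentioned above, the sketch below samples cluster labels from the plain CRP prior: each item joins an existing cluster with probability proportional to its size, or opens a new one with probability proportional to a concentration parameter. It deliberately ignores image similarity, which an actual clustering approach would have to incorporate; `alpha` and `seed` are illustrative parameters, not values from the task.

```python
import random


def crp_assignments(n_items, alpha=1.0, seed=0):
    """Sample cluster labels for n_items from a Chinese Restaurant Process."""
    rng = random.Random(seed)
    sizes = []   # sizes[k] = number of items already placed in cluster k
    labels = []
    for _ in range(n_items):
        # Existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)      # open a new cluster
        else:
            sizes[k] += 1        # join an existing cluster
        labels.append(k)
    return labels


labels = crp_assignments(50, alpha=2.0)
```

The number of clusters is not fixed in advance, which is the property that makes the process attractive when the number of events is unknown.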
Approaches: Second era, 2013 (2/4)
• Results are better than in the 2 previous years (probably because a filtering step is not required).
• The best approach computes one affinity matrix per modality, averages them and uses the average for clustering as part of a DBSCAN or spectral clustering procedure.
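The affinity-averaging idea can be sketched as follows. This is not the participants' implementation: to stay dependency-free, the clustering step here is a simple threshold plus connected components, used as a stand-in for DBSCAN or spectral clustering. The modalities, the `min_affinity` value and the toy matrices are assumptions.

```python
def average_affinity(matrices):
    """Element-wise mean of equally sized square affinity matrices."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


def threshold_clusters(affinity, min_affinity=0.5):
    """Label items by connected components of the thresholded affinity graph.

    Stand-in for DBSCAN or spectral clustering, not either of them.
    """
    n = len(affinity)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and affinity[i][j] >= min_affinity:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy affinities for four photos in two hypothetical modalities.
time_affinity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
tag_affinity = [
    [1.0, 0.7, 0.0, 0.1],
    [0.7, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.6],
    [0.1, 0.0, 0.6, 1.0],
]
combined = average_affinity([time_affinity, tag_affinity])
labels = threshold_clusters(combined)
```

Averaging treats all modalities as equally informative; a weighted mean would be the natural variation if some modality is known to be more reliable.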
2013 – Challenge 1      F-score  NMI
Rafailidis et al.       0.570    0.873
Samangooei et al.       0.946    0.985
Schinas et al.          0.704    0.910
Vizuete et al.          0.883    0.973
Nguyen et al.           0.932    0.984
Zeppelzauer et al.      0.780    0.940
Sutanto et al.          0.812    0.954
Wistuba et al.          0.878    0.965
Papaoikonomou et al.    0.236    0.664
Gupta et al.            0.142    0.180
Brenner et al.          0.780    0.712
Approaches: Second era, 2013 (3/4)
• For the second challenge, all approaches adopt a classification procedure.
• They differ in the set of features that they utilize. For instance:
– One approach utilizes scalable Laplacian Eigenmaps to obtain, in a semi-supervised manner, an appropriate representation of the images.
– Another approach uses semantic similarity features based on WordNet.
Approaches: Second era, 2013 (4/4)
The best performing approach uses an SVM classifier and a very rich set of textual features, including a set of ontological features (visual features are not used).
2013 – Challenge 2    F-Score (per category)    F-Score (Event/Non-event)
Schinas               0.334                     0.716
Nguyen                0.449                     0.854
Sutanto               0.131                     0.537
Brenner               0.332                     0.721
Outlook for the SED task
• Remarkable number of participants in the last year and appearance of quite novel approaches.
• The SED task is organized in 2014 as well! Three challenges this year:
– Full clustering.
– Retrieval / filtering.
– Summarization / labelling of events.
• Registration opens soon!
Outlook for the problem of social event detection
• We haven’t seen any approach for dealing with the problem of social event detection “in the wild”:
– The image collections examined so far had a high ratio of event to non-event photos; applying the same methods to a random collection of images would most likely produce poor results.
– Classifying images as event-related or not is important for dealing with the more general scenario.
– Additionally, accurate event/non-event classifiers may help in building more focused crawling mechanisms.
• Combination of agnostic approaches (such as clustering) and approaches that utilize event directories.
• More extensive usage of visual content, rather than mostly of metadata.
Acknowledgments