Presentation of the SED2012 dataset @ MMSys 2013, Oslo, Norway
The 2012 Social Event Detection Dataset
Symeon Papadopoulos (1), Emmanouil Schinas (1), Vasileios Mezaris (1), Raphaël Troncy (2), Yiannis Kompatsiaris (1)
(1) CERTH-ITI, Thessaloniki, Greece
(2) EURECOM, Sophia Antipolis, France
Oslo, 28 Feb - 1 Mar 2013
2
SED2012 Overview
• Large collection (>160K) of CC-licensed Flickr photos and some of their metadata
• Event annotations for 149 target events (of specific categories and locations of interest)
• Primary use: Social event detection
  – Used in the context of MediaEval 2012 (SED task)
• Secondary uses: image geotagging, distractors in CBIR, city summarization
3
Dataset Overview
Flickr photo collection
• 167,332 photos
• 4,422 unique contributors
• Creative Commons licenses
Event Annotations
• Challenge 1: Technical events in Germany
• Challenge 2: Soccer events in Hamburg and Madrid
• Challenge 3: Indignados movement events in Madrid
4
Data Collection Process
• Flickr API: http://www.flickr.com/services/api/
• Used method flickr.photos.search with five geographical centres: Barcelona, Cologne, Hamburg, Hannover, Madrid
• Time period: Jan 2009 – Dec 2011
• All photos CC licensed
• 403 photos from the EventMedia collection
R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th Intern. Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
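A collection query of this kind could be sketched as follows. This is only an illustration, not the authors' collection script: the city coordinates, the search radius, the page size, the requested extras and the helper name build_search_url are all assumptions. The parameter names are those of the public flickr.photos.search method, and the sketch only builds the request URL, it does not call the API.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

# Approximate city-centre coordinates (assumed; the slides do not list them).
CITY_CENTRES = {
    "Barcelona": (41.39, 2.17),
    "Cologne": (50.94, 6.96),
    "Hamburg": (53.55, 9.99),
    "Hannover": (52.37, 9.73),
    "Madrid": (40.42, -3.70),
}


def build_search_url(city, api_key, page=1, radius_km=20):
    """Build a flickr.photos.search URL for one city (hypothetical helper)."""
    lat, lon = CITY_CENTRES[city]
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "radius": radius_km,          # assumed value, not from the slides
        "radius_units": "km",
        "min_taken_date": "2009-01-01 00:00:00",
        "max_taken_date": "2011-12-31 23:59:59",
        "license": "1,2,3,4,5,6",     # assumed to be the Creative Commons license ids
        "has_geo": 1,                 # geotagged photos only
        "extras": "geo,tags,date_taken,owner_name,license",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    for city in CITY_CENTRES:
        print(build_search_url(city, api_key="YOUR_API_KEY"))
```

Results are paged, so a real crawler would loop over the page parameter for each city until no more photos are returned.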
5
Photo Distribution
Place distribution
Yearly distribution
Language distribution
6
Dataset Collection Motivation
Selection of five cities (three German, two Spanish):
• Include large number of non-English text metadata (cf. language distribution table)
• Ensure existence of numerous events for the target types
• Include distractor images:
  – Challenge 2: Cologne, Hannover distractor for Hamburg, Barcelona distractor for Madrid
  – Challenge 3: Barcelona distractor for Madrid
Selection of only geotagged photos:
• Ease of annotation
Selection of only CC-licensed photos:
• Reuse of collection for research
7
Tag Statistics (1/2)
51,611 unique tags
prevalence of location-specific tags
event-specific tags
number of users using the tag
8
Tag Statistics (2/2)
• Tags labelled in the plot: barcelona, spain, madrid
• >20K photos have no tags
• 83.9% of photos have 10 or fewer tags
• >40K tags appear fewer than 10 times
• >57% of tags appear once or twice
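Statistics of this kind can be recomputed from the photo metadata with a few counters. The sketch below assumes the tags have already been loaded as one list of strings per photo; the function name tag_stats and the toy input are hypothetical, and a tag is counted once per photo, which may differ from how the figures on the slides were computed.

```python
from collections import Counter


def tag_stats(photo_tags):
    """Summarise tag usage; photo_tags holds one list of tags per photo."""
    # Count in how many photos each tag occurs (duplicates within a photo ignored).
    tag_counts = Counter(tag for tags in photo_tags for tag in set(tags))
    n_photos = len(photo_tags)
    n_tags = len(tag_counts)
    rare = sum(1 for c in tag_counts.values() if c <= 2)
    few = sum(1 for tags in photo_tags if len(set(tags)) <= 10)
    return {
        "unique_tags": n_tags,
        "photos_without_tags": sum(1 for tags in photo_tags if not tags),
        "share_photos_with_at_most_10_tags": few / n_photos if n_photos else 0.0,
        "share_tags_used_once_or_twice": rare / n_tags if n_tags else 0.0,
        "most_common": tag_counts.most_common(3),
    }


if __name__ == "__main__":
    toy = [["madrid", "spain"], ["madrid"], [], ["barcelona", "spain", "madrid"]]
    print(tag_stats(toy))
```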
9
User Statistics
The 30 most active users contribute ~30% of the dataset
60% of users contribute fewer than 10 photos
10
Ground Truth Creation
• Manual annotations by use of CrEve
  – web-based annotation
  – two-round annotation by five annotators (three in the first, two in the second)
  – interactive annotation (search & annotate)
  – each round terminated as soon as no new event-related photos were discovered
  – approximate effort: 100 person-hours
• Annotations for Challenge 1 enriched by EventMedia (403 photos featuring technical events in Germany)
C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012
11
Ground Truth Statistics (1/3)
10 events are associated with >100 photos
~27% of events are associated with 1 or 2 photos
12
Ground Truth Statistics (2/3)
106 events are captured by single users
9 events are captured by more than 10 people
erroneous timestamps in photos
The majority of events last for less than a day (typical for soccer)
13
Ground Truth Statistics (3/3)
Madrid events
• Vicente Calderon stadium
• Puerta del Sol
• Santiago Bernabeu stadium
• Stadium of Butarque
14
Technical Event Examples
• PHP Unconf. 2010
• Gamescom 2009
• CeBIT 2010
• Convention Camp 2011
15
Soccer Event Examples
• Real Madrid – Milan (2010)
• World Cup 2010
• St. Pauli – HSV (2010)
• Spain – Colombia (2011)
16
Indignados Event Examples
• Inaugural march, 15 May
• Large gathering, 20 May
• Gathering, 15 Oct
• Demonstration, 17 Nov
17
Evaluation
• F-measure (macro), Precision, Recall
  – measure the goodness of the retrieved photos, but not how well they were clustered into events
• Normalized Mutual Information (NMI)
  – compares the automatically extracted clustering of photos into events with the ground truth
• Evaluation script is made available together with the dataset.
• Implementation of event detection available: http://mklab.iti.gr/project/sed2012_certh
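The two kinds of measure can be illustrated with a small sketch. This is not the evaluation script distributed with the dataset, and it makes simplifying assumptions: predicted event ids are taken to be already aligned with the ground-truth ids, None marks a photo that belongs to no event, and NMI is normalized by the arithmetic mean of the two entropies (other normalizations exist and the official script may use a different one). The function names are hypothetical.

```python
import math
from collections import Counter


def macro_prf(labels_true, labels_pred):
    """Macro-averaged precision, recall and F-measure over ground-truth events."""
    events = sorted({t for t in labels_true if t is not None})
    precisions, recalls, f_scores = [], [], []
    for event in events:
        pairs = list(zip(labels_true, labels_pred))
        tp = sum(1 for t, p in pairs if t == event and p == event)
        n_pred = sum(1 for p in labels_pred if p == event)
        n_true = sum(1 for t in labels_true if t == event)
        prec = tp / n_pred if n_pred else 0.0
        rec = tp / n_true
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f_scores.append(f1)
    k = len(events)
    if k == 0:
        return 0.0, 0.0, 0.0
    return sum(precisions) / k, sum(recalls) / k, sum(f_scores) / k


def entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its cluster sizes."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two clusterings of the same photos."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    count_t, count_p = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(count_t, n) + entropy(count_p, n)) / 2
    return 1.0 if denom == 0 else mi / denom


if __name__ == "__main__":
    truth = ["a", "a", "b", None]
    predicted = ["a", "b", "b", None]
    print(macro_prf(truth, predicted))
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```

The toy example shows why both measures are reported: the F-measure depends on which event label each photo receives, whereas NMI only compares the grouping, so a clustering with permuted labels still scores as a perfect match.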
Questions
@sympapadopoulos www.slideshare.net/sympapadopoulos