
The MediaEval 2012 Affect Task: Violent Scenes Detection


Page 1: Affect Task: Violent Scenes Detection – Task Overview

MediaEval 2012, October 3, 2012

Guillaume, Mohammad, Cédric & Claire-Hélène

Page 2: Task definition

Second year! Derives from a Technicolor use case:

Helping users choose movies that are suitable for children in their family by proposing a preview of the most violent segments

Same definition as in 2011:

“Physical violence or accident resulting in human injury or pain”

Intended to be as objective as possible

But some consequences of the definition do not match the use case:

Dead people shown without showing how they died => not annotated

Somebody hurting himself while shaving => annotated


Page 3: Task definition

Two types of runs:

Primary and required run at shot level, i.e. a violent/non-violent decision should be provided for each movie shot (an illustrative sketch of such per-shot output follows below)

Optional run at segment level, i.e. violent segments (starting and ending times) should be extracted by the participants

Scores are required to compute the official measure

Rules:

Any features automatically extracted from the DVDs can be used; this includes audio, video and subtitles

No external additional data (e.g. from the internet)
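For concreteness, the following minimal sketch shows one way a shot-level run could be represented in code. The record fields (movie, shot index, score, binary decision) and the tab-separated output are illustrative assumptions, not the official submission format.

# Minimal sketch of a shot-level run: one violent/non-violent decision plus a
# confidence score per shot. Field names and file layout are illustrative
# assumptions, not the official MediaEval submission format.
from dataclasses import dataclass
from typing import List

@dataclass
class ShotDecision:
    movie: str       # movie identifier
    shot_index: int  # index of the shot within the movie
    score: float     # confidence that the shot is violent (used for ranking)
    violent: bool    # binary decision derived from the score

def write_run(decisions: List[ShotDecision], path: str) -> None:
    """Write one line per shot; scores are required for the ranking-based metric."""
    with open(path, "w") as f:
        for d in decisions:
            f.write(f"{d.movie}\t{d.shot_index}\t{d.score:.4f}\t{int(d.violent)}\n")

write_run([ShotDecision("Fight_Club", 42, 0.91, True),
           ShotDecision("Fight_Club", 43, 0.12, False)],
          "shot_level_run.txt")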


Page 4: Data set

18 Hollywood movies purchased by the participants, of different genres (from extremely violent to non-violent), both in the learning and test sets.


Page 5: Data set – development set

Movie | Duration (s) | Shot # | Violence duration (%) | Violent shots (%)
Armageddon | 8680.16 | 3562 | 14.03 | 14.6
Billy Elliot | 6349.44 | 1236 | 5.14 | 4.21
Eragon | 5985.44 | 1663 | 11.02 | 16.6
Harry Potter 5 | 7953.52 | 1891 | 10.46 | 13.43
I Am Legend | 5779.92 | 1547 | 12.75 | 20.43
Leon | 6344.56 | 1547 | 4.3 | 7.24
Midnight Express | 6961.04 | 1677 | 7.28 | 11.15
Pirates of the Caribbean | 8239.44 | 2534 | 11.3 | 12.47
Reservoir Dogs | 5712.96 | 856 | 11.55 | 12.38
Saving Private Ryan | 9751.0 | 2494 | 12.92 | 18.81
The Sixth Sense | 6178.04 | 963 | 1.34 | 2.80
The Wicker Man | 5870.44 | 1638 | 8.36 | 6.72
Kill Bill 1 | 5626.6 | 1597 | 17.4 | 24.8
The Bourne Identity | 5877.6 | 1995 | 7.5 | 9.3
The Wizard of Oz | 5415.7 | 908 | 5.5 | 5.0
TOTAL | 100725.8 (27h58min) | 26108 | 9.39 | 11.99

Page 6: Data set – test set

Movie | Duration (s) | Shot # | Violence duration (%) | Violent shots (%)
Dead Poets Society | 7413.24 | 1583 | 0.75 | 2.15
Fight Club | 8005.72 | 2335 | 7.61 | 13.28
Independence Day | 8834.32 | 2652 | 6.4 | 13.99
TOTAL | 24253.28 (6h44min) | 6570 | 4.92 | 9.80

Page 7: Annotations & additional data

Ground truth manually created by 7 human assessors:

Segments containing violent events according to the definition

One unique violent action per segment wherever possible, otherwise tagged ‘multiple_action_scenes’

7 high-level video concepts: presence of blood, presence of fire, presence of guns or assimilated weapons, presence of cold arms (knives or assimilated weapons), fights (1 against 1, small, large, distant attack), car chases, gory scenes (graphic images of bloodletting and/or tissue damage)

3 high-level audio concepts: gunshots and cannon fire, screams and effort noise, explosions

Automatically generated shot boundaries with keyframes
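To make the annotation structure concrete, here is a minimal sketch of how one violent segment and its concept labels could be represented; the class and field names are illustrative assumptions, not the actual annotation file format distributed with the data.

# Minimal sketch of one violent-segment annotation carrying the high-level
# concepts listed above. Names are illustrative assumptions, not the actual
# annotation format distributed for the task.
from dataclasses import dataclass, field
from typing import List

VIDEO_CONCEPTS = ["blood", "fire", "firearms", "cold_arms", "fights", "car_chase", "gore"]
AUDIO_CONCEPTS = ["gunshots", "screams", "explosions"]

@dataclass
class ViolentSegment:
    movie: str
    start: float                  # start time in seconds
    end: float                    # end time in seconds
    multiple_action_scenes: bool  # True when distinct actions could not be separated
    video_concepts: List[str] = field(default_factory=list)  # subset of VIDEO_CONCEPTS
    audio_concepts: List[str] = field(default_factory=list)  # subset of AUDIO_CONCEPTS

segment = ViolentSegment("Armageddon", 1250.4, 1263.0, False,
                         video_concepts=["fire", "fights"],
                         audio_concepts=["explosions"])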


Page 8: Results

Page 9: Evaluation metrics

Official measure: Mean Average Precision at 100 (MAP@100), i.e. the average precision at the 100 top-ranked violent shots, averaged over the 3 test movies

For comparison with 2011, the MediaEval cost:

$C = C_{fa} \cdot P_{fa} + C_{miss} \cdot P_{miss}$, with $C_{fa} = 1$ and $C_{miss} = 10$,

where $P_{fa}$ and $P_{miss}$ are the estimated probabilities of false alarm and missed detection

Additional metrics: false alarm rate, missed detection rate, precision, recall, F-measure, MAP@20, MAP, Detection Error Trade-off (DET) curves
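As an illustration of how these two measures can be computed from scored shot decisions, here is a minimal sketch; the input conventions (per-movie lists of (score, is_violent) pairs) and the particular average-precision normalization are assumptions for the example, not the official scoring tool.

# Minimal sketch of the two measures described above: average precision over
# the top-ranked shots (MAP@100 is its mean over the test movies) and the
# MediaEval cost C = Cfa*Pfa + Cmiss*Pmiss with Cfa = 1 and Cmiss = 10.
# Input conventions and the AP@k normalization are assumptions for the example.

def average_precision_at_k(scored_shots, k=100):
    """scored_shots: list of (score, is_violent) pairs for one movie.
    One common AP@k variant; the official tool may normalize differently."""
    ranked = sorted(scored_shots, key=lambda s: s[0], reverse=True)[:k]
    hits, precisions = 0, []
    for rank, (_, is_violent) in enumerate(ranked, start=1):
        if is_violent:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mediaeval_cost(decisions, c_fa=1.0, c_miss=10.0):
    """decisions: list of (predicted_violent, is_violent) pairs over all shots."""
    negatives = [pred for pred, truth in decisions if not truth]
    positives = [pred for pred, truth in decisions if truth]
    p_fa = sum(negatives) / max(len(negatives), 1)                    # false alarm rate
    p_miss = sum(1 for pred in positives if not pred) / max(len(positives), 1)  # miss rate
    return c_fa * p_fa + c_miss * p_miss

# toy example with two tiny "movies" of scored shots
test_movies = [
    [(0.9, True), (0.8, False), (0.7, True), (0.1, False)],
    [(0.6, False), (0.5, True), (0.4, False)],
]
map_at_100 = sum(average_precision_at_k(m) for m in test_movies) / len(test_movies)
cost = mediaeval_cost([(score >= 0.5, truth) for m in test_movies for score, truth in m])
print(map_at_100, cost)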

Page 10: Task participation


Survey: 35 teams expressed interest in the task (among which 12 were very interested); 2011: 13 teams

Registration: 11 teams = 6 core participants + 1 organizers' team + 4 additional teams

At least 3 joint submissions - 16 research teams - 9 countries

3 teams had already worked on the detection of violence in movies

2011: 6 teams = 4 + 2 organizers, 1 joint submission, 4 countries

Submission: 7 teams + 1 organizers' team

We lost 3 teams (corpus availability, economic issues, low performance)

Grand total of 36 runs: 35 at shot level and 1 brave submission at segment level!

2011: 29 runs at shot level, 4 teams + 2 organizers' teams

Workshop participation: 6 teams; 2011: 3 teams

Page 11: Task baseline – random classification


Movie | MAP@100
Dead Poets Society | 2.17
Fight Club | 13.27
Independence Day | 13.98
Total | 9.08
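A minimal sketch of how such a random baseline could be produced, assuming shots are simply given uniform random scores and ranked; this is an illustration only, not the organizers' actual baseline script, and it reuses the average_precision_at_k helper from the evaluation sketch above.

# Minimal sketch of a random-classification baseline: assign each shot a
# uniform random score, rank shots by that score and evaluate with AP@100.
# Illustration only; not the organizers' actual baseline implementation.
import random

def random_baseline(num_shots, violent_indices, k=100, seed=0):
    rng = random.Random(seed)
    scored = [(rng.random(), i in violent_indices) for i in range(num_shots)]
    return average_precision_at_k(scored, k)  # defined in the evaluation sketch above

# e.g. a movie with 2335 shots of which roughly 13% are violent
violent = set(random.Random(1).sample(range(2335), 310))
print(random_baseline(2335, violent))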

Page 12: Task participation


Registration | Country | Run submission | 2011 participation / workshop participation | MAP@100 | MediaEval cost
ARF | Austria | 1 (shot) | X | 65.05 | 3.56
ARF | Austria | 1 (segment) | | 54.82 | 5.13
DYNI – LSIS | France | 5 | X | 12.44 | 7.96
NII - Video Processing Lab | Japan | 5 | X | 30.82 | 1.28
Shanghai-Hongkong | China | 5 | X | 62.38 | 5.52
TUB - DAI | Germany | 5 | X / X | 18.53 | 4.20
TUM | Germany-Austria | 5 | X | 48.43 | 7.83
LIG - MRIM | France | 4 | X / X | 31.37 | 4.16
TEC* | France-UK | 5 | X / X | 61.82 | 3.56
Total | 8 teams (23%) | 36 | 5 / 6 (75%) | |
Rand. classification | | | | 9.8 |

*: task organizer. Best run per team according to MAP@100.


Page 15: Learned points

Features: mainly classic low-level features, either audio or video, mostly computed at frame level

Classification step: mainly supervised machine learning systems

Mostly SVM-based, 1 NN, 1 BN (a minimal pipeline sketch follows below)

Two systems based on similarity computation (KNN)

Multimodality: is audio, video, or audio and video together more informative?

No real convergence; no use of text features

Mid-level concepts: YES! This year they were largely used (4 teams out of 8)

Seems promising for some of them (except blood)

But how to use them? (as additional features, or as an intermediate step)

Test set: it seems that systems worked better on Independence Day, while Dead Poets Society was more difficult. Due to some similarity with other movies in the dev set? A generalization issue?
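As an illustration of the dominant approach reported above (supervised SVM classification of shot-level low-level features), here is a minimal sketch using scikit-learn; the feature extraction is stubbed with random numbers and nothing here corresponds to any particular team's system.

# Minimal sketch of the dominant approach: an SVM trained on per-shot
# audio/video descriptors, producing scores used to rank shots for MAP@100.
# Feature extraction is stubbed; this shows the pipeline shape only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def shot_features(num_shots, dim=64, seed=0):
    """Stand-in for per-shot descriptors (e.g. color, motion, MFCC statistics)."""
    return np.random.default_rng(seed).normal(size=(num_shots, dim))

# toy development data: features and violent/non-violent labels per shot
X_dev = shot_features(500)
y_dev = np.random.default_rng(1).integers(0, 2, size=500)

model = make_pipeline(StandardScaler(), SVC(probability=True))
model.fit(X_dev, y_dev)

# scores for the test shots give the ranking needed for MAP@100
X_test = shot_features(200, seed=2)
scores = model.predict_proba(X_test)[:, 1]
decisions = scores >= 0.5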


Page 16: DET curves (best run per participant – MAP@100)

Page 17: Recall vs. Precision (best run per participant – MAP@100)

Page 18: Conclusions & perspectives

Success of the task:

Increased number of participants

Attracted people from the domain

The quality of the results has increased considerably

MediaEval 2013:

Which task definition?

How to go one step further in multimodality? Text is still not used

Who will join the organizers' group for next year?
