



HUMAINE - WP5 Belfast04 1

Experience with emotion labelling in naturalistic data

L. Devillers, S. Abrilian, JC. Martin, LIMSI-CNRS, Orsay

E. Cowie, C. Cox, I. Sneddon, QUB, Belfast

HUMAINE - WP5 Belfast04 2

QUB - LIMSI

QUB and LIMSI are developing complementary approaches (coding scheme and tools) for annotating naturalistic emotional behaviour in English and French TV videos.

This cooperation will enable:
- the study of cross-cultural issues in naturalistic emotions
- the comparison, and eventual combination, of discrete/continuous coding schemes.

QUB and LIMSI have already exchanged some data and begun annotating it.

HUMAINE - WP5 Belfast04 3

Outline

1. Challenges in annotating naturalistic emotion
2. Experiments in emotion labelling on audio and audio-visual data: call centers, movies, TV
3. Experiment in emotion labelling on EmoTV1
4. On-going work: new emotions and metadata coding scheme
5. Illustrative examples (ANVIL + Feeltrace)
   1. EmoTV1
   2. Belfast Naturalistic database

HUMAINE - WP5 Belfast04 4

1 – Challenges in annotating naturalistic emotion

Goals:
- Detection of « real-life » emotion
- Simulation of « real-life » emotion with ECAs

Which emotions are modelled? Which context is annotated? Which representation?

Descriptors of emotional states:
- Verbal categories (Ekman, Plutchik)
- Abstract dimensions (Osgood, Cowie)
- Appraisal-based description (Scherer), OCC interaction model (Ortony)

HUMAINE - WP5 Belfast04 5

Categories and dimensions: Redundancy/Complementarity

Verbal categories:
- Applied to segments: speaker turns, sub-units
- Chosen from a finite, task-dependent label list: finance, emergency, TV interviews
- Limited in number to remain tractable

Dimensions:
- Continuous (Feeltrace: Cowie, Schröder) or scales (Craggs; Devillers & Vidrascu; Abrilian et al.)
- Segments: sequence, sub-units of the sequence
- 3 dimensions: Activation (Intensity) / Valence / Control

Dimensions map onto coarse categorical classes, but do not allow distinguishing between, for example, fear and anger.

Study redundancy and complementarity of verbal categories and dimensions
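The limitation above can be made concrete with a small sketch: hypothetical (activation, valence, control) scores on a 1-5 scale place fear and anger in the same coarse dimensional region, while the category labels still separate them (all values below are illustrative, not corpus data).

```python
# Hypothetical dimensional positions (activation, valence, control), scale 1-5.
DIMENSIONS = {
    "fear":  (4, 1, 1),   # high activation, negative valence, low control
    "anger": (4, 1, 2),
    "joy":   (4, 5, 4),
}

def coarse_region(dims):
    """Collapse 1-5 scores into a coarse (activation, valence) region."""
    activation, valence, _ = dims
    return ("high" if activation >= 3 else "low",
            "pos" if valence >= 3 else "neg")

# Fear and anger fall in the same coarse region: dimensions alone
# cannot separate them, while the category labels can.
print(coarse_region(DIMENSIONS["fear"]))   # ('high', 'neg')
print(coarse_region(DIMENSIONS["anger"]))  # ('high', 'neg')
```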

HUMAINE - WP5 Belfast04 6

Context annotation for naturalistic emotion

Contextual information must be taken into account for naturalistic emotion, at multiple levels. Different contextual information is needed for different types of applications and modalities. A tentative proposal of relevant contextual cues is in progress for the EmoTV corpus; this scheme can be refined through work with different databases.

HUMAINE - WP5 Belfast04 7

In practice

Iterative annotation protocol:
- Definition
  - Emotion labels and abstract dimensions: task-dependent labels, universal dimensions
  - Segmental units: overall sequence, utterance, sub-units, words, etc.
- Annotation
  - One label, or combined labels with abstract dimensions, per segment
  - Meta-annotation: context situation, appraisal-related descriptors
- Validation
  - Inter-annotator agreement
  - Perceptual tests
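For the validation step, inter-annotator agreement on categorical labels is typically measured with Cohen's kappa (as on the later slides); a minimal sketch on hypothetical labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two coders labelling the same segments."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - chance) / (1 - chance)

# Hypothetical labels for four speaker turns:
coder1 = ["anger", "anger", "joy", "neutral"]
coder2 = ["anger", "irritation", "joy", "neutral"]
print(round(cohens_kappa(coder1, coder2), 2))  # 0.67
```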

HUMAINE - WP5 Belfast04 8

2 – Naturalistic data: audio and audio-visual

Audio: call centers
- Pros: natural human-human interaction; cons: social aspects, phone channel
- Task-dependent emotion: financial matters, emergency, etc.

Audio-visual: TV, movies
- TV: more or less natural depending on the type of broadcast (games, reality shows, news, interviews, etc.), live/non-live, recording context, etc.
- EmoTV1: interviews, with very variable emotional content and behavior
- Realistic fiction: less naturalistic, but emotions (such as fear, distress) in abnormally dangerous situations are impossible to collect in real life

Goals:
- Call centers and movies: detection of emotion (audio-based)
- EmoTV: provocative corpus for studying ECA specification

HUMAINE - WP5 Belfast04 9

Task-dependent annotations (1): FP5 Amities Project

Collaboration LIMSI/ENST.

Call center 1: Stock Exchange Customer Service Center (fear of losing money!)
- Labels: fear/apprehension, anger/irritation, satisfaction, excuse, neutral
- 2 annotators; 12% of speaker turns with emotion; kappa 0.8
- 100 dialogs, 5000 speaker turns
[Devillers, Vasilescu: LREC 2004, Speech Prosody 2004, ICPhS 2003]

Call center 2: Capital Bank Service Center (fear of missing money!)
- Labels: fear/apprehension, anger/irritation, satisfaction, excuse, neutral
- 2 annotators on 1K randomly extracted speaker turns; 10% of speaker turns with emotion; kappa 0.5
- 250 dialogs, 5000 speaker turns extracted
[Devillers, Vasilescu, LREC 2004]

HUMAINE - WP5 Belfast04 10

Task-dependent annotations (2): Collaboration LIMSI - Emergency call center

Call center 3: Emergency Service Center (fear of being sick; real fear/panic; calls for help)
- Broader emotional behaviour than in the financial call centers: 18 classes obtained after label selection from the HUMAINE emotion list (R. Cowie), with 5 transcribers and majority voting: anxiety, stress, fear, panic, annoyance, cold anger, hot anger, disappointment, sadness, despair, hurt, dismay, embarrassment, relief, interest, amusement, compassion, surprise + negative, positive and neutral
- Annotation of 20h (on-going process with the Transcriber tool, refinement of the label list):
  1. manual segmentation (sub-speaker-turn segments)
  2. segment annotation with major/minor emotion and abstract dimensions (scales)
  3. meta-annotation: contextual information (motive for the call, relation to the patient (kin), etc.) and audio information (quality, accent, pathological voice, etc.)
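The majority voting used for label selection can be sketched as follows (the example labels and the strict-majority rule are illustrative assumptions):

```python
from collections import Counter

def majority_label(votes):
    """Majority vote over transcriber labels; None when no strict majority."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

# Five hypothetical transcribers labelling the same segment:
print(majority_label(["fear", "fear", "anxiety", "fear", "panic"]))   # fear
print(majority_label(["fear", "anxiety", "stress", "panic", "fear"])) # None
```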

PhD student: Laurence Vidrascu (LIMSI)

HUMAINE - WP5 Belfast04 11

Task-dependent annotations (3): collaboration ENST/LIMSI/THALES

Fiction: fear manifestations in realistic movies, for a video surveillance application (fear of aggression!)
- Fear vs. other negative emotions, other emotions, neutral
- Valence, Intensity, Control
- Video helps fear annotation by providing context (e.g. aggressor, victim)

POSTER: I. Vasilescu
[Clavel, Vasilescu, Devillers, Ehrette, ICSLP 2004]

PhD student: Chloé Clavel

HUMAINE - WP5 Belfast04 12

Task-dependent annotations (4): EmoTV (FP6 HUMAINE)
- EmoTV1: large number of topics; TV interviews from the news
- 51 clips, various contexts, 14 emotion labels, multimodal annotation
[Ref: S. Abrilian, L. Devillers, JC Martin, S. Buisine, Summer School WP5]

HUMAINE - WP5 Belfast04 13

3 – Experience of labelling on EmoTV1

Study the influence of the modalities on the perception of emotions.
- Two independent annotators, master's students in psychology: coder1 (male), coder2 (female)
- Annotations with the Anvil tool (Kipp 2001) for 3 conditions:
  - Audio without video
  - Video without audio
  - Audio with video

HUMAINE - WP5 Belfast04 14

Segmentation and annotation protocol for the 3 conditions

Instructions: detect emotional events

Segmentation (free), followed by an agreed segmentation.
Annotation scheme combining:
- Labels (free choice)
- Two abstract dimensions:
  - Intensity (from very low to very high)
  - Valence (from negative to positive)
- Context: theme, what-for, etc. (for the audio+video condition)

[Ref: Buisine, Abrilian, Devillers, Martin, poster WP3]

HUMAINE - WP5 Belfast04 15

Step 1: Audio-only and Video-only conditions
1. Segmenting the extracts: 2 independent coders; separate segmentation of the audio and video extracts
2. Unifying the segments: intersection for the videos, union for the audio corpus
3. Labeling the segments
Analysis of inter-coder reliability for categories of labels (Cohen's kappa) on the audio and video corpora

HUMAINE - WP5 Belfast04 16

Step 2: Audio+Video condition
1. Segmenting the extracts: 2 independent coders; separate segmentation of the audio-video extracts
2. Unifying the segments
3. Labeling the segments
Analysis of inter-coder reliability for categories of labels (Cohen's kappa) on the audio-video corpus

HUMAINE - WP5 Belfast04 17

Segmentation (free, by two annotators): Audio-only and Video-only
- Twice as many segments in the video as in the audio condition, for both annotators
- Automatic decision to obtain a common set of segments (semantically motivated):
  - Intersection for the video condition: 295 segments
  - Union for the audio condition: 181 segments

Audio+Video: agreement on a common set of 281 emotional segments
- Using the audio-only segmentation for audio+video is not straightforward: audio+video segments are included in the audio-only segments

Analysis: speech vs. audio-visual differences for segmentation
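The intersection rule used to unify two coders' free segmentations can be sketched like this (segment boundaries, in seconds, are hypothetical):

```python
def intersect_segments(segs_a, segs_b):
    """Keep only the time spans where both coders marked a segment."""
    out = []
    for a_start, a_end in segs_a:
        for b_start, b_end in segs_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # non-empty overlap
                out.append((start, end))
    return sorted(out)

# Hypothetical free segmentations of the same clip by two coders:
print(intersect_segments([(4, 9), (9, 20)], [(4, 24)]))  # [(4, 9), (9, 20)]
```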

HUMAINE - WP5 Belfast04 18

Emotional labels

The three annotation experiments produced a list of 176 different labels after normalization -> classified into a set of 14 labels:

anger, despair, disgust, doubt, exaltation, fear, irritation, joy, neutral, pain, sadness, serenity, surprise and worry.

HUMAINE - WP5 Belfast04 19

Analysis: speech vs. audio-visual differences for annotation

Inter-coder agreement: kappa values (on segments)
- Emotional labels (14 values): audio-video 0.37, video 0.43, audio 0.54
- 2 abstract dimensions:
  - Intensity: low inter-coder agreement except for audio (0.69); video and audio+video very low
  - Valence (Neg/?/Pos): high agreement for audio (0.75); audio+video 0.3, video 0.4
- Low kappa for valence reflects positive/negative confusion: audio+video 11%, video 7%, audio 3%

Real-life emotions -> blended, ambiguous, difficult to annotate

HUMAINE - WP5 Belfast04 20

Emotion annotation agreement for the 3 conditions (1)

[Pie charts not reproduced; agreed label proportions per condition:]

VIDEO condition: joy 35%, anger 20%, sadness 10%, worry 9%, despair 7%, irritation 5%, exaltation 5%, doubt 5%, surprise 2%, neutral 1%, disgust 1%, pain 0%, fear 0%

AUDIO condition: joy 40%, anger 17%, irritation 8%, neutral 8%, sadness 7%, despair 6%, exaltation 6%, doubt 4%, pain 2%, disgust 1%, worry 1%, fear 0%, surprise 0%, plus one unlabelled 0% slice

AUDIO-VIDEO condition: joy 36%, anger 19%, neutral 12%, exaltation 8%, pain 7%, irritation 6%, despair 5%, sadness 3%, serenity 3%, fear 1%, surprise 0%, worry 0%, doubt 0%, disgust 0%

HUMAINE - WP5 Belfast04 21

Emotion annotation agreement for the 3 conditions (2)

- Anger, irritation, joy, exaltation, sadness -> high agreement in the Video, Audio and Audio+Video conditions
- Surprise, worry -> agreement in the Video condition (visual cues)
- Doubt -> agreement in the Audio or Video conditions, not in Audio+Video
- Pain -> agreement in the Audio and Audio+Video conditions (acoustic cues)
- Serenity -> agreement only in the Audio+Video condition (much more subtle)
- Neutral -> 1% agreement in the Video condition

HUMAINE - WP5 Belfast04 22

[Figure not reproduced: valence per emotion label (anger, pain, doubt, disgust, despair, exaltation, irritation, joy, neutral, serenity, surprise, sadness) in the Audio-Video condition, one data series per coder]

HUMAINE - WP5 Belfast04 23

Clip 29: Joy/Disgust valence ?

HUMAINE - WP5 Belfast04 24

Emotion perception - high subjectivity: examples

Different perceptions between coder1 (male) and coder2 (female):
- Within the same valence class, e.g. clip 3, audio/video condition: anger/sadness, anger/despair -> blended emotion
- Between negative and positive classes, e.g. a woman crying for joy (relief), clip 4:
  - audio condition: sadness/sadness
  - video condition: sadness/don't know
  - audio-video condition: joy/sadness -> cause-effect conflict

HUMAINE - WP5 Belfast04 25

Clip 4: Joy (relief)/ Sadness: valence ?

HUMAINE - WP5 Belfast04 26

Assessment of annotations: Next steps

Inter-annotator agreement:
- Kappa is low (14 classes) -> ambiguously annotated, but also rich, data
- Study the disagreements in order to define the different types of complex or blended emotions: low-intensity, cause-effect, masked, sequential (transition), ambiguous, etc.
- Define hierarchical levels of annotation

Perceptual tests: multilingual cross-cultural perceptual tests
- for validating annotation labels and types of emotions
- for studying the emotional perceptual abilities of coders: personality, sensitivity to different emotional cues in audio, face, gesture, etc.

Collaboration WP3-WP5: Unige, QUB, LIMSI

HUMAINE - WP5 Belfast04 27

Emotion categories: fine to coarse grain

Hierarchical levels of annotation: fine- to coarse-grained labels

Fine (19 classes) -> Medium (10 classes) -> Coarse (7 classes):
- anger, irritation -> Anger -> Anger
- fear, worry -> Fear -> Fear
- disgust -> Disgust -> Disgust
- sadness, despair, disappointment -> Sadness -> Sadness
- joy, exaltation, pleased, serenity -> Joy -> Joy
- surprise -> Surprise -> Surprise
- pain -> Pain -> Pain
- shame, embarrassment -> Shame -> neutral/other
- doubt -> Doubt -> neutral/other
- pride -> Pride -> neutral/other
- neutral/other -> neutral/other -> neutral/other
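The fine-to-coarse mapping above can be applied programmatically; the dictionary below is reconstructed from the table, with unmapped fine labels (shame, embarrassment, doubt, pride) falling back to neutral/other:

```python
# Fine-to-coarse label mapping reconstructed from the hierarchy table.
FINE_TO_COARSE = {
    "anger": "Anger", "irritation": "Anger",
    "fear": "Fear", "worry": "Fear",
    "disgust": "Disgust",
    "sadness": "Sadness", "despair": "Sadness", "disappointment": "Sadness",
    "joy": "Joy", "exaltation": "Joy", "pleased": "Joy", "serenity": "Joy",
    "surprise": "Surprise",
    "pain": "Pain",
}

def to_coarse(label):
    """Map a fine-grained label to its coarse class."""
    return FINE_TO_COARSE.get(label.lower(), "neutral/other")

print(to_coarse("irritation"))  # Anger
print(to_coarse("shame"))       # neutral/other
```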

HUMAINE - WP5 Belfast04 28

4- New annotation scheme (on-going)

Annotation of the global sequence and emotional segments of the sequence with:

non-basic emotion patterns: blended, masked, sequential, etc.

two emotion labels (major/minor)

Activation, Intensity, Control, Valence (scale 1-5)

discrete temporal pattern: describing temporal evolution inside segments

Contextual annotations included, with derived appraisal-based descriptions (the event that causes the emotion, etc.)

Global multimodal descriptors: Audio, face (eyes, gaze), torso, gesture (free-text fields)

Emotions and Metadata Coding Scheme: Annotation guide -> WP5 exemplar

HUMAINE - WP5 Belfast04 29

Intra-emotional segment temporal evolution

Abstract dimensions are much more suitable than categorical descriptions for describing gradual and mixed emotional behavior.

In the ANVIL scheme, temporal evolution is given by the sequence of emotional segments (some of which are transitions), but intra-segment dynamics are lacking.

On-going study: temporal evolution + categorical labels

Feeltrace continuous dimension annotation (LIMSI/QUB)

Discrete temporal patterns describing intra-segment evolution (LIMSI/Paris 8) [example figure not reproduced]
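One plausible encoding of such a discrete intra-segment pattern, with made-up phase names and durations (this is a sketch, not the actual LIMSI/Paris 8 scheme):

```python
# Hypothetical discrete temporal pattern: (phase, relative duration, target
# intensity) triples summarising how emotion evolves inside one segment.
PATTERN = [
    ("onset", 0.2, 3),   # intensity rising
    ("apex", 0.5, 5),    # sustained peak
    ("offset", 0.3, 2),  # decay
]

def phase_at(t):
    """Return the phase active at relative time t in [0, 1)."""
    acc = 0.0
    for name, duration, _ in PATTERN:
        acc += duration
        if t < acc:
            return name
    return PATTERN[-1][0]

print(phase_at(0.1))  # onset
print(phase_at(0.5))  # apex
print(phase_at(0.9))  # offset
```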

HUMAINE - WP5 Belfast04 30

Context annotation for naturalistic emotion (on-going)

A tentative proposal of relevant contextual annotations:
- Emotion context (some derived from appraisal-related descriptors):
  - Cause of emotion: free text
  - Time of emotion: immediate, near past, past, future
  - Relation of the person to the emotion: societal subject, true story (by self, by kin)
  - Degree of implication: low, normal, high
- Overall communicative goal:
  - What for: to claim, to share a feeling, etc.
  - To whom: public, kin, society, etc.
- Scene descriptors: theme, type of interaction
- Character descriptors: age, gender, race
- Recording: quality, camera/person position, channel and time

HUMAINE - WP5 Belfast04 31

5- Example EMoTV Clip 3

HUMAINE - WP5 Belfast04 32

Global sequence annotation

HUMAINE - WP5 Belfast04 33

Illustration of segmentation problems

Segmentation/Annotation (clip 3) (Summer School Belfast)

Coder1: 4-9 | 9-20 | 20-29 | 29-31 -> anger + despair, anger + despair, anger + despair, anger + sadness
Coder2: 4-24 | 24-34 -> anger, despair + sadness
Coder3: 4-7 | 7-11 | 11-24 | 24-32 -> ? + despair, irritation, anger, anger
Coder4: 4-7 | 7-10 | 10-13 | 20-23 | 26-31 -> anger, anger, anger, disappointment, (label missing)

Final agreed segmentation: 4-7 | 7-20 | 20-33
On-going study to find adequate rules for segmenting an audio-video sequence into emotional units

HUMAINE - WP5 Belfast04 34

Emotional annotations per segment by several coders

French coders, new scheme: Major/Minor labels and (I, A, C, V) on a 1-5 scale.

Coder1:
- 4-7: anger (4,4,3,2)
- 7-20: anger + sadness, blended (4,4,2,1)
- 20-31: anger + despair, blended (5,5,2,1)

Coder2:
- 4-7: worry + sadness, blended (4,4,3,2)
- 7-20: anger (4,4,2,1)
- 20-31: despair + anger, blended (5,4,1,1)

Coder3:
- 4-7: worry (2,2,3,2)
- 7-20: anger + disgust, blended (4,4,2,1)
- 20-31: despair + anger, blended (5,4,1,1)

Combined weighted vectors:
- 4-7: worry 0.66, anger 0.34 (3,3,3,2)
- 7-20: anger 0.5, sadness 0.34, disgust 0.16 (4,4,2,1)
- 20-31: despair 0.5, anger 0.5 (5,4,1,1)

Instead of an a priori choice -> a weighted vector of categories could be kept
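Such a weighted vector can be built, for instance, by pooling each coder's major and minor labels; the 2:1 major/minor weighting below is an assumption for illustration:

```python
from collections import defaultdict

def soft_label_vector(annotations):
    """Combine per-coder (major, minor) labels into a normalized weighted
    vector. The 2:1 major/minor weighting is an illustrative assumption."""
    weights, total = defaultdict(float), 0.0
    for major, minor in annotations:
        weights[major] += 2.0
        total += 2.0
        if minor is not None:
            weights[minor] += 1.0
            total += 1.0
    return {label: w / total for label, w in weights.items()}

# Three hypothetical coders annotating one segment:
vector = soft_label_vector([("anger", None),
                            ("worry", "sadness"),
                            ("worry", "disgust")])
print(vector)  # {'anger': 0.25, 'worry': 0.5, 'sadness': 0.125, 'disgust': 0.125}
```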

HUMAINE - WP5 Belfast04 35

Feeltrace annotations combined with ANVIL labels

[Figures not reproduced: Feeltrace valence/activation curves over time for the French coders Coder1 and Coder2, with the ANVIL labels despair, anger and worry overlaid]

HUMAINE - WP5 Belfast04 36

Clip3 annotated by QUB team with Feeltrace

[Figures not reproduced: Feeltrace valence/activation curves over time for QUB coders Cate, Ellen and Ian]

High similarity between the Feeltrace annotations (QUB and LIMSI) for this clip

HUMAINE - WP5 Belfast04 37

Clip3: global sequence annotations from LIMSI/QUB

QUB:
- Cate: Angry (strong), Sad (medium), Hurt (medium)
- Ellen: Angry (strong), Hurt (strong), Despair (strong)
- Ian: Angry (medium), Resentful (medium), Despair (weak)

LIMSI:
- Sarkis: Anger (5, 5, 2, 1)
- Jean-Claude: Anger (5, 4, 1, 1)
- Laurence: Anger (5, 4, 2, 1)

High similarity between global label annotations (QUB and LIMSI) for this clip

HUMAINE - WP5 Belfast04 38

Belfast Naturalistic Corpus: Collaboration QUB/LIMSI

Example clip 61b A+

HUMAINE - WP5 Belfast04 39

Weighted vectors combining emotion annotations from coders

French coders (Clip 61): Major/Minor labels and (I, A, C, V) on a 1-5 scale.

3 coders: instead of an a priori choice -> weighted vector of categories per segment

- 0-2.84: joy 0.66, pleased 0.34 (4,4,3,4)
- 3.28-3.48: exaltation 0.5, joy 0.25, pride 0.25 (4,4,4,5)
- 3.64-4.08: joy 0.5, pleased 0.25, pride 0.25 (4,4,4,5)
- 4.48-6: joy 0.75, pride 0.25 (4,4,3,5)
- 6.16-7.52: exaltation 0.5, joy 0.25, pride 0.25 (5,5,5,5)
- 7.68-10.52: joy 0.2, serenity 0.2, pleased 0.2, doubt 0.2, worry 0.2 (3,3,3,3)
- 11.04-12.24: doubt 0.5, pleased 0.5 (3,3,4,4)
- 12.24-14.76: pride 0.5, pleased 0.25, serenity 0.25 (4,4,5,4)

HUMAINE - WP5 Belfast04 40

Clip 61b: Feeltrace

[Figures not reproduced: Feeltrace valence/activation curves over time for LIMSI coders Coder1 and Coder2 on clip 61b]

HUMAINE - WP5 Belfast04 41

Clip 61b: global annotation by QUB and LIMSI

QUB coders (intensity: strong, medium, weak):
- Coder 1: Happy, Affectionate, Agreeable
- Coder 2: Affectionate, Happy
- Coder 3: Affectionate, Happy
- Coder 4: Affectionate, Happy
- Coder 5: Happy, Excited
- Coder 6: Confident, Amused

LIMSI coders, (I, A, V, C) on a 1-5 scale:
- Coder 1: Joy (5, 4, 5, 3)
- Coder 2: Joy (4, 4, 4, 4)
- Coder 3: Joy (4, 5, 5, 4)

High similarity between global label annotations (QUB and LIMSI) for this clip

HUMAINE - WP5 Belfast04 42

Conclusion/Perspectives

Conclusions:
- Annotation of 2 verbal labels per segment for naturalistic emotions
- Combination of emotion annotations from coders (« soft categories »)
- Combination of categorical and dimensional emotion representations (QUB/LIMSI)

On-going work:
- Temporal emotion evolution for ECAs (Univ. P8/LIMSI/QUB)
- Validation of the new annotation scheme
- Re-annotation of EmoTV1 (other coders)
- Perceptual tests (UNIGE/QUB/LIMSI)

Perspectives:
- Correlation between multimodal and emotion annotations
- ECA with « real-life » emotion (Univ. P8/LIMSI)
- EmoTV2

HUMAINE - WP5 Belfast04 43

Next talk

Manual Annotation of Multimodal Behaviors in Emotional TV Interviews with ANVIL, by Jean-Claude Martin

Thank you for your attention