HUMAINE - WP5 Belfast04 1
Experience with emotion labelling in naturalistic data
L. Devillers, S. Abrilian, JC. Martin, LIMSI-CNRS, Orsay
E. Cowie, C. Cox, I. Sneddon, QUB, Belfast
QUB - LIMSI
QUB and LIMSI are developing complementary approaches (coding schemes and tools) for annotating naturalistic emotional behaviour in English and French TV videos.
This cooperation will enable:
- studying cross-cultural issues in naturalistic emotions
- comparing and eventually combining discrete/continuous coding schemes.
QUB and LIMSI have already exchanged some data and begun to annotate them.
Outline
1. Challenges in annotating naturalistic emotion
2. Experiments of emotion labelling on audio and audio-visual data: call centers, movies, TV
3. Experiment of emotion labelling on EmoTV1
4. On-going work: new emotions and metadata coding scheme
5. Illustrative examples (ANVIL + Feeltrace)
   1. EmoTV1
   2. Belfast Naturalistic database
1 – Challenges in annotating naturalistic emotion
Goals:
- Detection of "real-life" emotion
- Simulation of "real-life" emotion with ECAs

Which emotions are modelled? Which context is annotated? Which representation?

Descriptors of emotional states:
- Verbal categories (Ekman, Plutchik)
- Abstract dimensions (Osgood, Cowie)
- Appraisal-based description (Scherer), interaction model OCC (Ortony)
Categories and dimensions: Redundancy/Complementarity

Verbal categories:
- Applied to segments: speaker turns, sub-units
- Chosen from a finite, task-dependent label list: finance, emergency, TV interviews
- Limited in number to remain tractable

Dimensions:
- Continuous (Feeltrace: Cowie, Schröder) or on a scale (Craggs; Devillers & Vidrascu; Abrilian et al.)
- Segments: sequence, sub-units of the sequence
- 3 dimensions: Activation (Intensity) / Valence / Control
- Dimensions match coarse categories but do not allow distinguishing between, for example, fear and anger

=> Study the redundancy and complementarity of verbal categories and dimensions
Context annotation for naturalistic emotion
Contextual information for naturalistic emotion must be taken into account at multiple levels.
Different contextual information is needed for different types of application and modalities.
A tentative proposal of relevant contextual cues is in progress for the EmoTV corpus. This scheme can be refined through work with different databases.
In practice
Iterative annotation protocol:

Definition
- emotion labels and abstract dimensions: task-dependent labels, universal dimensions
- segmental units: overall sequence, utterance, sub-units, words, etc.

Annotation
- one label, or combined labels with abstract dimensions, per segment
- meta-annotation: context situation, appraisal-related descriptors

Validation
- inter-annotator agreement
- perceptual tests
2 – Naturalistic data: audio and audio-visual
Audio: call centers
- Pros: natural human-human interaction; Cons: social aspects, phone channel
- Task-dependent emotion: financial matters, emergency, etc.

Audio-visual: TV, movies
- TV: more or less natural depending on the type of broadcast (games, reality shows, news, interviews, etc.), live/non-live, recording context, etc.
- EmoTV1: interviews, highly variable emotional content and behavior
- Realistic fiction: less naturalistic, but emotions (such as fear, distress) in abnormal, dangerous situations are impossible to collect in real life.

Goals:
- Call centers and movies: detection of emotion (audio-based)
- EmoTV: provocative corpus for studying ECA specification
Task-dependent annotations (1): FP5 Amities project, collaboration LIMSI/ENST

Call center 1: Stock Exchange Customer Service Center – fear of losing money!
- Labels: fear/apprehension, anger/irritation, satisfaction, excuse, neutral
- 2 annotators; 12% of speaker turns with emotion; kappa 0.8
- 100 dialogs, 5000 speaker turns
[Devillers, Vasilescu, LREC 2004, Speech Prosody 2004, ICPhS 2003]

Call center 2: Capital Bank Service Center – fear of missing money!
- Labels: fear/apprehension, anger/irritation, satisfaction, excuse, neutral
- 2 annotators on 1K speaker turns randomly extracted; 10% of speaker turns with emotion; kappa 0.5
- 250 dialogs, 5000 speaker turns extracted
[Devillers, Vasilescu, LREC 2004]
Task-dependent annotations (2): collaboration LIMSI / emergency call center

Call center 3: Emergency Service Center – fear of being sick, real fear/panic, calls for help
- Wider range of emotional behaviour than in the financial call centers: 18 classes obtained after selecting labels from the HUMAINE emotion list (R. Cowie)
- 5 transcribers, majority voting: anxiety, stress, fear, panic, annoyance, cold anger, hot anger, disappointment, sadness, despair, hurt, dismay, embarrassment, relief, interest, amusement, compassion, surprise + negative, positive and neutral

Annotation of 20h (on-going process with the Transcriber tool, refinement of the label list):
1. manual segmentation (sub-speaker-turn segments)
2. segment annotation with major/minor emotion and abstract dimensions (scale)
3. meta-annotation: contextual information (motive for the call, ties to the patient (kin), etc.) and audio information (quality, accent, pathological voice, etc.)

PhD student: Laurence Vidrascu (LIMSI)
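The majority-voting step over the five transcribers can be sketched as follows; the tie-breaking rule (falling back to neutral) is an assumption, not a documented part of the protocol:

```python
from collections import Counter

def majority_label(labels):
    """Return the label chosen by most annotators.

    Ties are broken by falling back to 'neutral' -- an assumed
    rule, not necessarily the one used in the study.
    """
    counts = Counter(labels).most_common()
    top, top_n = counts[0]
    if len(counts) > 1 and counts[1][1] == top_n:
        return "neutral"  # no clear majority
    return top

# Five transcribers label one sub-speaker-turn segment
print(majority_label(["fear", "anxiety", "fear", "panic", "fear"]))  # fear
```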
Task-dependent annotations (3): collaboration ENST/LIMSI/THALES

Fiction: fear manifestations in realistic movies – video surveillance application – fear of aggression!
- Labels: fear vs. other negative emotions, other emotions, neutral
- Dimensions: Valence, Intensity, Control
- Video used to help annotate fear (providing context); context, e.g.: aggressor, victim

POSTER – I. Vasilescu
[Clavel, Vasilescu, Devillers, Ehrette, ICSLP 2004]
PhD student: Chloé Clavel
Task-dependent annotations (4): EmoTV – FP6 HUMAINE

EmoTV1 – large number of topics – TV interviews from the news
- 51 clips, various contexts, 14 emotion labels, multimodal annotation
[Ref: S. Abrilian, L. Devillers, JC Martin, S. Buisine, Summer School WP5]
3 – Experience of labelling on EmoTV1

Study the influence of the modalities on the perception of emotions.

Two independent annotators: master's students in psychology – coder 1 (male), coder 2 (female)

Annotations with the Anvil tool (Kipp 2001) for 3 conditions:
- Audio without video
- Video without audio
- Audio with video
Segmentation and annotation protocol for the 3 conditions

Instructions: detect emotional events

Segmentation (free), followed by an agreed segmentation

Annotation scheme combining:
- Labels (free choice)
- Two abstract dimensions:
  - Intensity (from very low to very high)
  - Valence (from negative to positive)
- Context: theme, what-for, etc. (for the audio+video condition)

[Ref: Buisine, Abrilian, Devillers, Martin, poster WP3]
Step 1: Audio-only and Video-only conditions

1. Segmenting the extracts: 2 independent coders, separate segmentation of audio and video extracts
2. Unifying the segments: intersection for videos, union for the audio corpus
3. Labeling the segments

Analyses: inter-coder reliability for categories of labels (Cohen's kappa) on the audio and video corpora
Step 2: Audio+Video condition

1. Segmenting the extracts: 2 independent coders, separate segmentation of audio-video extracts
2. Unifying the segments
3. Labeling the segments

Analyses: inter-coder reliability for categories of labels (Cohen's kappa) on the audio-video corpus
Segmentation (free, two annotators): Audio-only and Video-only
- Twice as many segments in the video condition as in the audio condition, for both annotators
- Automatic decision to obtain a common set of segments (semantically motivated):
  - Intersection for the video condition: 295 segments
  - Union for the audio condition: 181 segments

Audio+Video
- Agreed on a common set of 281 emotional segments
- Using the audio-only segmentation for audio+video is not straightforward: audio+video segments are included in audio-only segments

Analysis: speech vs. audio-visual differences in segmentation
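The two unification strategies, cutting at every boundary either coder proposed (intersection) and merging overlapping segments into one (union), can be sketched as interval operations on (start, end) pairs; this reading of "intersection" and "union", and the representation in seconds, are assumptions:

```python
def intersect_segments(seg_a, seg_b):
    """Cut at every boundary proposed by either coder; each resulting
    segment lies inside one segment of each coder (finer segmentation)."""
    bounds = sorted({b for seg in seg_a + seg_b for b in seg})
    return [(x, y) for x, y in zip(bounds, bounds[1:])
            if any(s <= x and y <= e for s, e in seg_a)
            and any(s <= x and y <= e for s, e in seg_b)]

def union_segments(seg_a, seg_b):
    """Merge overlapping segments from both coders (coarser segmentation)."""
    merged = []
    for s, e in sorted(seg_a + seg_b):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

a = [(0, 5), (5, 9)]   # coder 1's segments (seconds)
b = [(0, 3), (4, 9)]   # coder 2's segments
print(intersect_segments(a, b))  # [(0, 3), (4, 5), (5, 9)]
print(union_segments(a, b))      # [(0, 9)]
```

This matches the observed counts in spirit: intersecting yields more, shorter segments (295 for video), while taking the union yields fewer, longer ones (181 for audio).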
Emotional labels
The three annotation experiments produced a list of 176 different labels, which after normalization were classified into a set of 14 labels: anger, despair, disgust, doubt, exaltation, fear, irritation, joy, neutral, pain, sadness, serenity, surprise and worry.
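Operationally, this normalisation is a lookup table from free-choice labels to the 14 classes. A minimal sketch, in which the individual mappings (fury -> anger, etc.) and the neutral fallback are illustrative assumptions; only the 14 target classes come from the study:

```python
# Illustrative fragment of a free-label -> 14-class table (assumed entries)
NORMALISE = {
    "fury": "anger", "rage": "anger",
    "annoyed": "irritation",
    "happiness": "joy", "delight": "joy",
    "anxious": "worry", "concerned": "worry",
}
# The 14 classes reported for EmoTV1
CLASSES = {"anger", "despair", "disgust", "doubt", "exaltation",
           "fear", "irritation", "joy", "neutral", "pain",
           "sadness", "serenity", "surprise", "worry"}

def to_class(free_label):
    label = NORMALISE.get(free_label.lower(), free_label.lower())
    return label if label in CLASSES else "neutral"  # assumed fallback

print(to_class("Fury"))   # anger
print(to_class("doubt"))  # doubt
```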
Analysis: speech vs. audio-visual differences in annotation

Inter-coder agreement: kappa values (on segments)
- Emotional labels (14 values): audio-video 0.37, video 0.43, audio 0.54
- 2 abstract dimensions:
  - Intensity: low inter-coder agreement except for audio (video and audio+video very low, audio 0.69)
  - Valence (Neg/?/Pos): high agreement for audio only (audio+video 0.3, video 0.4, audio 0.75)
- Low kappa for valence: positive/negative confusion (audio+video 11%, video 7%, audio 3%)

Real-life emotions -> blended, ambiguous, difficult to annotate
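The agreement figures above are Cohen's kappa: observed agreement corrected for the agreement two coders would reach by chance given their label distributions. A minimal sketch on toy data:

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders labelling the same segments."""
    n = len(coder1)
    # observed proportion of agreement
    p_obs = sum(a == b for a, b in zip(coder1, coder2)) / n
    # chance agreement from each coder's marginal label distribution
    c1, c2 = Counter(coder1), Counter(coder2)
    p_exp = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

c1 = ["anger", "anger", "joy", "sadness", "joy", "anger"]
c2 = ["anger", "joy", "joy", "sadness", "joy", "anger"]
print(round(cohens_kappa(c1, c2), 2))  # 0.74
```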
Emotion annotation agreement for the 3 conditions (1)
Distribution of agreed emotion labels per condition:

Label        VIDEO   AUDIO   AUDIO-VIDEO
joy           35%     40%     36%
anger         20%     17%     19%
sadness       10%      7%      3%
worry          9%      1%      0%
despair        7%      6%      5%
irritation     5%      8%      6%
exaltation     5%      6%      8%
doubt          5%      4%      0%
surprise       2%      0%      0%
disgust        1%      1%      0%
neutral        1%      8%     12%
pain           0%      2%      7%
fear           0%      0%      1%
serenity       —       0%      3%
Emotion annotation agreement for the 3 conditions (2)
Anger, irritation, joy, exaltation, sadness -> high agreement in the Video, Audio and Audio-Video conditions

Surprise, worry -> agreement in the Video condition (visual cues)

Doubt -> agreement in the Audio or Video conditions, not in Audio+Video

Pain -> agreement in the Audio and Audio+Video conditions (acoustic cues)

Serenity -> agreement only in the Audio+Video condition (much more subtle)

Neutral -> 1% agreement in the Video condition
[Chart: valence (scale 0-7) per emotion category in the audio-video condition, two series (one per coder), for anger, pain, doubt, disgust, despair, exaltation, irritation, joy, neutral, serenity, surprise and sadness]
Emotion perception - high subjectivity: examples
Different perceptions between coder 1 (male) and coder 2 (female):

Within the same valence class:
- e.g. clip 3, audio/video conditions: anger/sadness, anger/despair -> blended emotion

Between the negative and positive classes:
- e.g. a woman cries for joy (relief), clip 4: audio condition sadness/sadness; video condition sadness/don't know; audio-video condition joy/sadness -> cause-effect conflicts
Assessment of annotations: next steps

Inter-annotation agreement
- Kappa low (14 classes) -> ambiguous annotated data, but also rich data
- Study of disagreements in order to define the different types of complex or blended emotions: low-intensity, cause-effect, masked, sequential (transition), ambiguous, etc.
- Define hierarchical levels of annotation

Perceptual tests: multilingual, cross-cultural
- To validate annotation labels and types of emotions
- To study the emotional perceptual abilities of coders: personality, sensitivity to different emotional cues in audio, face, gesture, etc.

Collaboration WP3-WP5: Unige, QUB, LIMSI
Emotion categories: fine to coarse grain
Hierarchical levels of annotation: fine- to coarse-grained labels

Fine (19 classes)                   Medium (10 classes)   Coarse (7 classes)
Anger, Irritation                   Anger                 Anger
Fear, Worry                         Fear                  Fear
Disgust                             Disgust               Disgust
Sadness, Despair, Disappointment    Sadness               Sadness
Joy, Exaltation, Pleased, Serenity  Joy                   Joy
Surprise                            Surprise              Surprise
Shame, Embarrassment                Shame                 neutral/other
Doubt                               Doubt                 neutral/other
Pride                               Pride                 neutral/other
Pain                                Pain                  neutral/other
neutral/other                       neutral/other         neutral/other
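Mapping an annotation from fine to coarse grain is then a chain of table lookups. A sketch following the hierarchy above, assuming that medium classes absent from the coarse level fold into neutral/other:

```python
# Fine -> medium (fine labels not listed pass through unchanged)
FINE_TO_MEDIUM = {
    "anger": "anger", "irritation": "anger",
    "fear": "fear", "worry": "fear",
    "sadness": "sadness", "despair": "sadness", "disappointment": "sadness",
    "joy": "joy", "exaltation": "joy", "pleased": "joy", "serenity": "joy",
    "shame": "shame", "embarrassment": "shame",
}
# Medium -> coarse; shame, doubt, pride and pain fold into neutral/other
MEDIUM_TO_COARSE = {
    "anger": "anger", "fear": "fear", "disgust": "disgust",
    "sadness": "sadness", "joy": "joy", "surprise": "surprise",
}

def coarse(fine_label):
    medium = FINE_TO_MEDIUM.get(fine_label, fine_label)
    return MEDIUM_TO_COARSE.get(medium, "neutral/other")

print(coarse("exaltation"))  # joy
print(coarse("pride"))       # neutral/other
```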
4 – New annotation scheme (on-going)

Annotation of the global sequence and of the emotional segments of the sequence with:
- non-basic emotion patterns: blended, masked, sequential, etc.
- two emotion labels (major/minor)
- Activation, Intensity, Control, Valence (scale 1-5)
- a discrete temporal pattern describing the temporal evolution inside segments
- contextual annotations, including derived appraisal-based descriptions: the event causing the emotion
- global multimodal descriptors: audio, face (eyes, gaze), torso, gesture (free-text fields)

Emotions and Metadata Coding Scheme: annotation guide -> WP5 exemplar
Intra-emotional-segment temporal evolution

- Abstract dimensions are much more suitable than categorical descriptions for describing gradual and mixed emotional behavior.
- In the ANVIL scheme, temporal evolution is given by the sequence of emotional segments (some of which are transitions), but intra-segment dynamics are lacking.

On-going study: temporal evolution + categorical labels
- Feeltrace continuous dimensional annotation (LIMSI/QUB)
- A discrete temporal pattern describing intra-segment evolution (LIMSI/Paris 8)
Context annotation for naturalistic emotion (on-going)
A tentative proposal of relevant contextual annotations:

Emotion context (some derived from appraisal-related descriptors)
- Cause of emotion: free text
- Time of emotion: immediate, near past, past, future
- Relation of person to emotion: societal subject, true story (by self, by kin)
- Degree of implication: low, normal, high

Overall communicative goal
- What for: to claim, to share a feeling, etc.
- To whom: public, kin, society, etc.

Scene descriptors: theme, type of interaction
Character descriptors: age, gender, race
Recording: quality, camera/person position, channel and time
Illustration of segmentation problems

Segmentation/annotation of clip 3 (Summer School Belfast), major/minor labels per segment:

Coder 1: 4-9 anger/despair; 9-20 anger/despair; 20-29 anger/despair; 29-31 anger/sadness
Coder 2: 4-24 anger; 24-34 despair/sadness
Coder 3: 4-7 ?; 7-11 despair; 11-24 irritation/anger; 24-32 anger
Coder 4: 4-7 anger; 7-10 anger; 10-13 anger; 20-23 disappointment; 26-31 —

Final segmentation: 4-7, 7-20, 20-33

On-going study to find adequate rules for segmenting audio-video sequences into emotional units
Emotional annotations per segment by several coders
French coders, new scheme: major/minor labels, (I, A, C, V) on a 1-5 scale

Coder 1:
  4-7:   anger           (4,4,3,2)
  7-20:  anger/sadness   (4,4,2,1)  blended
  20-31: anger/despair   (5,5,2,1)  blended

Coder 2:
  4-7:   worry/sadness   (4,4,3,2)  blended
  7-20:  anger           (4,4,2,1)
  20-31: despair/anger   (5,4,1,1)  blended

Coder 3:
  4-7:   worry/disgust   (2,2,3,2)  blended
  7-20:  anger           (4,4,2,1)
  20-31: despair/anger   (5,4,1,1)  blended

Combined weighted vectors:
  4-7:   worry 0.66, anger 0.34                 (3,3,3,2)
  7-20:  anger 0.5, sadness 0.34, disgust 0.16  (4,4,2,1)
  20-31: despair 0.5, anger 0.5                 (5,4,1,1)

Instead of an a priori choice, a weighted vector of categories could be kept.
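One way to build such a weighted vector is to score each coder's major label higher than the minor and normalise per segment. A sketch under an assumed 2:1 major/minor weighting (the actual weighting scheme used in the study is not spelled out here):

```python
from collections import defaultdict

def weighted_vector(annotations, major_w=2.0, minor_w=1.0):
    """Combine per-coder (major, minor) labels for one segment into a
    normalised category vector. The 2:1 weighting is an assumed choice,
    not necessarily the scheme used in the study."""
    scores = defaultdict(float)
    for major, minor in annotations:
        scores[major] += major_w
        if minor is not None:
            scores[minor] += minor_w
    total = sum(scores.values())
    return {k: round(v / total, 2) for k, v in scores.items()}

# Three coders' major/minor choices for one segment
print(weighted_vector([("anger", "despair"),
                       ("despair", "anger"),
                       ("despair", "anger")]))
# {'anger': 0.44, 'despair': 0.56}
```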
Feeltrace annotations combined with ANVIL labels
[Plots: Feeltrace valence and activation traces over time for the two French (LIMSI) coders on clip 3, with ANVIL labels worry, anger and despair marked on the curves]
Clip3 annotated by QUB team with Feeltrace
[Plots: Feeltrace valence and activation traces over time for QUB coders Cate, Ellen and Ian]

High similarity between the Feeltrace annotations (QUB and LIMSI) for this clip
Clip3: global sequence annotations from LIMSI/QUB
QUB coders (label + intensity):
  Cate:  Angry Strong; Sad Medium; Hurt Medium
  Ellen: Angry Strong; Hurt Strong; Despair Strong
  Ian:   Angry Medium; Resentful Medium; Despair Weak

LIMSI coders (label + dimensions):
  Sarkis:      Anger (5, 5, 2, 1)
  Jean-Claude: Anger (5, 4, 1, 1)
  Laurence:    Anger (5, 4, 2, 1)

High similarity between the global label annotations (QUB and LIMSI) for this clip
Weighted vectors combining emotion annotations from coders
French coders (clip 61), major/minor, (I, A, C, V) on a 1-5 scale.
3 coders: instead of an a priori choice, a weighted vector of categories per segment:

  0-2.84:      joy 0.66, pleased 0.34                                    (4,4,3,4)
  3.28-3.48:   exaltation 0.5, joy 0.25, pride 0.25                      (4,4,4,5)
  3.64-4.08:   joy 0.5, pleased 0.25, pride 0.25                         (4,4,4,5)
  4.48-6:      joy 0.75, pride 0.25                                      (4,4,3,5)
  6.16-7.52:   exaltation 0.5, joy 0.25, pride 0.25                      (5,5,5,5)
  7.68-10.52:  joy 0.2, serenity 0.2, pleased 0.2, doubt 0.2, worry 0.2  (3,3,3,3)
  11.04-12.24: doubt 0.5, pleased 0.5                                    (3,3,4,4)
  12.24-14.76: pride 0.5, pleased 0.25, serenity 0.25                    (4,4,5,4)
Clip 61b: Feeltrace
[Plots: Feeltrace valence and activation traces over time for LIMSI coders Coder1 and Coder2 on clip 61b]
Clip 61b: global annotation by QUB and LIMSI

QUB coders (intensity: strong, medium, weak):
  Coder 1: Happy, Affectionate, Agreeable
  Coder 2: Affectionate, Happy
  Coder 3: Affectionate, Happy
  Coder 4: Affectionate, Happy
  Coder 5: Happy, Excited
  Coder 6: Confident, Amused

LIMSI coders, (I, A, V, C) on a 1-5 scale:
  Coder 1: Joy (5, 4, 5, 3)
  Coder 2: Joy (4, 4, 4, 4)
  Coder 3: Joy (4, 5, 5, 4)

High similarity between the global label annotations (QUB and LIMSI) for this clip
Conclusion/Perspectives
Conclusions
- Annotation of 2 verbal labels per segment for naturalistic emotions
- Combination of emotion annotations from several coders ("soft categories")
- Combination of categorical and dimensional emotion representations (QUB/LIMSI)

On-going work
- Temporal emotion evolution for ECAs (Univ. P8/LIMSI/QUB)
- Validation of the new annotation scheme
- Re-annotation of EmoTV1 (other coders)
- Perceptual tests (UNIGE/QUB/LIMSI)

Perspectives
- Correlation between multimodal and emotion annotations
- ECA with "real-life" emotion (Univ. P8/LIMSI)
- EmoTV2