Exploiting Subjective Annotations
Dennis Reidsma and Rieks op den Akker
Human Media Interaction, University of Twente
http://hmi.ewi.utwente.nl
Types of content

Annotation as a task of subjective judgments?

Manifest content
Pattern latent content
Projective latent content

Cf. Potter and Levine-Donnerstein (1999)
Projective latent content

Why annotate data as projective latent content?
Because it cannot be defined exhaustively, whereas annotators have good 'mental schemas' for it.
Because the data should be annotated in terms that fit the understanding of 'naïve users'.
Inter-annotator agreement and projective content

Disagreements may be caused by:
Errors by annotators
An invalid scheme (no true label exists)
Different annotators having different 'truths' in their interpretation of behavior (subjectivity)
Subjective annotation

People communicate in different ways, and therefore, as observers, may also judge the behavior of others differently.
Projective content may be especially vulnerable to this problem.

How to work with subjectively annotated data? Unfortunately, subjectivity leads to low levels of agreement, and such data would therefore usually be set aside as 'unproductive material'.
I. Predicting agreement

One way to work with subjective data is to find out in which contexts annotators would agree, and to focus on those situations, as in the sketch below.

Result: a classifier that will not always classify every instance, but when it does, it does so with greater accuracy.
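A minimal sketch of this idea in Python, assuming scikit-learn; the data layout (numpy arrays X_context and X_features, and a boolean mask agreed marking instances where the annotators assigned the same label) is hypothetical, not from the talk:

    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    def train_cautious_classifier(X_context, agreed, X_features, y_labels):
        # Learn in which contexts annotators agree with each other.
        agreement_model = LogisticRegression().fit(X_context, agreed)
        # Learn the labels themselves, here from the agreed-upon part only.
        label_model = DecisionTreeClassifier().fit(X_features[agreed],
                                                   y_labels[agreed])
        return agreement_model, label_model

    def cautious_predict(agreement_model, label_model, x_context, x_features):
        # Only answer in contexts where annotators would likely have agreed.
        if agreement_model.predict([x_context])[0]:
            return label_model.predict([x_features])[0]
        return None  # abstain

The choice to train the label model on the agreed-upon subset is one option; the key point is that the agreement model decides when the classifier answers at all.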
II. Explicitly modeling intersubjectivity

A second way: model the different annotators separately, then find the cases where the models agree, and assume that those are the cases where the annotators would have agreed, too.

Result: a classifier that tells you for which instances other annotators would most probably agree with its classification.
Advantages

Both solutions lead to 'cautious classifiers' that only render a judgment in those cases where annotators would have been expected to agree.
This may carry over to users, too…
Neither solution needs to have all data multiply annotated for this.
Time?
Pressing questions so far?
(The remainder of the talk will give two case studies.)
Case studies
I. Predicting agreement from information in other (easier) modalities: The case of contextual addressing
II. Explicitly modeling intersubjectivity in dialog markup: The case of Voting Classifiers
Data used: The AMI Corpus

100 hours of recorded meetings, annotated with dialog acts, focus of attention, gestures, addressing, decision points, and other layers.
I. Contextual addressing

Addressing and focus of attention (FOA): agreement is highest in certain FOA contexts.
In those contexts, the classifier also performed better.
(More in the paper.)
II. Modeling intersubjectivity

Modeling single annotators, for 'yeah' utterances.
The data was annotated in non-overlapping parts by three annotators (d, s, v).
All data:

    Annotator   Train   Test
    d            3585   2289
    s            1753    528
    v            3500   1362
II. Modeling intersubjectivity

Cross-annotator training and testing (scores in %; classifier C_x is trained on annotator x's training data and tested on each annotator's test set):

            TST_d   TST_s   TST_v   TST_all
    C_d      69      64      52      63
    C_s      59      68      48      57
    C_v      63      57      66      63
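A sketch of how such a matrix could be computed, assuming scikit-learn, per-annotator train/test splits, and accuracy as the score (the metric and data layout are assumptions, not from the talk):

    from sklearn.metrics import accuracy_score
    from sklearn.tree import DecisionTreeClassifier

    def cross_annotator_matrix(splits):
        # splits: {"d": (X_trn, y_trn, X_tst, y_tst), "s": ..., "v": ...}
        # One classifier per annotator, trained on that annotator's data.
        clfs = {a: DecisionTreeClassifier().fit(X, y)
                for a, (X, y, _, _) in splits.items()}
        # Score every classifier C_a on every annotator b's test set.
        return {(a, b): accuracy_score(y_tst, clfs[a].predict(X_tst))
                for a in clfs
                for b, (_, _, X_tst, y_tst) in splits.items()}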
II. Modeling intersubjectivity

Building a voting classifier: only classify an instance when all three annotator-specific expert classifiers agree.
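A minimal sketch of such a unanimous-voting classifier (the classifiers could be the per-annotator ones from the previous sketch; None stands for abstaining):

    def unanimous_vote(classifiers, x):
        # Collect each annotator-specific expert's vote for instance x.
        votes = {clf.predict([x])[0] for clf in classifiers}
        # Answer only when all experts agree; otherwise abstain.
        return votes.pop() if len(votes) == 1 else None

Abstaining on contested instances trades coverage for precision.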
II. Modeling intersubjectivity

In the unanimous-voting context, performance is higher due to increased precision (6% on average).
Conclusions

Possible subjective aspects of annotation should be taken into account.
Agreement metrics are not designed to handle this.
We proposed two methods designed to cope with subjective data.
Thank you!
Questions?