Classification of Discourse Functions of Affirmative Words in Spoken Dialogue

Agustín Gravano, Stefan Benus, Julia Hirschberg, Shira Mitchell, Ilia Vovsha

INTERSPEECH, Antwerp, August 2007

Spoken Language Processing Group, Columbia University


Agustín Gravano INTERSPEECH 2007


Cue Words

• Ambiguous linguistic expressions used for:
    • making a semantic contribution, or
    • conveying a pragmatic function.
• Examples: now, well, so, alright, and, okay, first, by the way, on the other hand.
• Single affirmative cue words
    • Examples: alright, okay, mm-hm, right, uh-huh, yes.
    • May be used to convey acknowledgment or agreement, to change topic, to backchannel, etc.


Research Goals

• Learn which features best characterize the different functions of single affirmative cue words.
• Determine how these can be identified automatically.
• Important in Spoken Dialogue Systems:
    • Understand user input.
    • Produce output appropriately.


Previous Work

• Classification of cue words into discourse vs. sentential use.
    • Hirschberg & Litman ’87, ’93; Litman ’94; Heeman, Byron & Allen ’98; Zufferey & Popescu-Belis ’04.
• In our corpus:
    • right: 15% discourse, 85% sentential.
    • All other affirmative cue words: 99% discourse, 1% sentential.
• Discourse vs. sentential distinction is insufficient.
    • Need to define new classification tasks.


Talk Overview

• Columbia Games Corpus
• Classification tasks
• Experimental features
• Results


The Columbia Games Corpus

• 12 spontaneous task-oriented dyadic conversations in Standard American English.
• 2 subjects playing computer games; no eye contact.


The Columbia Games Corpus: Function of Affirmative Cue Words

Cue Words
• alright, gotcha, huh, mm-hm, okay, right, uh-huh, yeah, yep, yes, yup

Functions
• Acknowledgment / Agreement
• Backchannel
• Cue beginning discourse segment
• Cue ending discourse segment
• Check with the interlocutor
• Stall / Filler
• Back from a task
• Literal modifier
• Pivot beginning: Ack/Agree + Cue begin
• Pivot ending: Ack/Agree + Cue end

7.9% of the words in our corpus


The Columbia Games Corpus: Function of Affirmative Cue Words

• Literal Modifier
    that’s pretty much okay
• Backchannel
    Speaker 1: between the yellow mermaid and the whale
    Speaker 2: okay
    Speaker 1: and it is
• Cue beginning discourse segment
    okay we gonna be placing the blue moon


The Columbia Games Corpus: Function of Affirmative Cue Words

• 3 trained labelers.
• Inter-labeler agreement: Fleiss’ Kappa = 0.69 (Fleiss ’71).
• In this study we use the majority label for each affirmative cue word.
    • Majority label: label chosen by at least two of the three labelers.
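The majority-label rule and the agreement statistic can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy ratings are hypothetical, and returning None when no two labelers agree is an assumption about how such tokens would be handled.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by at least two of the three labelers, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: number of raters who assigned category j to item i.
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Share of all assignments that fall into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(len(categories))]
    # Observed agreement per item, then averaged over items.
    p_item = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_observed - p_expected) / (1 - p_expected)


# Toy ratings (hypothetical): three labelers, four tokens.
toy = [["Ack", "Ack", "Ack"], ["Ack", "BC", "Ack"],
       ["BC", "BC", "BC"], ["Ack", "BC", "CueBegin"]]
majorities = [majority_label(item) for item in toy]
```

Here the last toy token has three different labels, so no majority label exists for it.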


Method: Two new classification tasks

• Identification of a discourse segment boundary function
    • Segment beginning vs. Segment end vs. No discourse segment boundary function
• Identification of an acknowledgment function
    • Acknowledgment vs. No acknowledgment


Method: Machine Learning Experiments

• ML algorithm
    • JRip: Weka’s implementation of the propositional rule learner Ripper (Cohen ’95).
    • We also tried J4.8, Weka’s implementation of the decision tree learner C4.5 (Quinlan ’93, ’96), with similar results.
• 10-fold cross-validation in all experiments.
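The 10-fold cross-validation protocol can be sketched in plain Python. This is only an illustration of the evaluation loop: it does not implement Ripper or C4.5, and a trivial majority-class learner stands in for the Weka classifiers. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k disjoint folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_error(features, labels, train_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the errors."""
    errors = 0
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_x = [features[i] for i in range(len(labels)) if i not in held]
        train_y = [labels[i] for i in range(len(labels)) if i not in held]
        predict = train_fn(train_x, train_y)
        errors += sum(predict(features[i]) != labels[i] for i in held_out)
    return errors / len(labels)


def train_majority(train_x, train_y):
    """Stand-in learner: always predicts the most frequent training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda x: majority


# Toy data (hypothetical): 80 ACK tokens and 20 NO_ACK tokens, no real features.
toy_labels = ["ACK"] * 80 + ["NO_ACK"] * 20
toy_features = [None] * len(toy_labels)
toy_error = cross_validated_error(toy_features, toy_labels, train_majority)
```

Any learner that exposes the same train-then-predict shape could be passed as train_fn in place of the stand-in.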


Method: Experimental features

• IPU (inter-pausal unit)
    • Maximal sequence of words delimited by pauses longer than 50 ms.
• Conversational turn
    • Maximal sequence of IPUs by the same speaker, with no contribution from the other speaker.
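The two unit definitions can be sketched over time-aligned words. This is a minimal illustration under stated assumptions: words are given per speaker as (start, end, token) tuples in seconds, and a turn is closed as soon as the other speaker starts an IPU. How overlapping speech is treated is not specified on the slide, so that part is a simplification. All names are hypothetical.

```python
PAUSE_THRESHOLD = 0.050  # seconds; the 50 ms pause threshold from the slide


def split_into_ipus(words, threshold=PAUSE_THRESHOLD):
    """Group one speaker's (start, end, token) words into inter-pausal units."""
    ipus = []
    for word in words:
        previous_end = ipus[-1][-1][1] if ipus else None
        if previous_end is not None and word[0] - previous_end <= threshold:
            ipus[-1].append(word)   # pause too short: same IPU
        else:
            ipus.append([word])     # pause longer than threshold: new IPU
    return ipus


def group_into_turns(ipus_by_speaker):
    """Merge consecutive IPUs by the same speaker into turns, ordered by start time."""
    timeline = sorted(
        ((ipu[0][0], speaker, ipu)
         for speaker, ipus in ipus_by_speaker.items() for ipu in ipus),
        key=lambda item: item[0],
    )
    turns = []
    for _, speaker, ipu in timeline:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(ipu)
        else:
            turns.append((speaker, [ipu]))
    return turns


# Toy alignment (hypothetical times) based on the backchannel example.
speaker1 = [(0.00, 0.30, "between"), (0.32, 0.50, "the"),
            (0.90, 1.20, "whale"), (1.70, 1.90, "and")]
speaker2 = [(1.30, 1.50, "okay")]
turns = group_into_turns({"Speaker 1": split_into_ipus(speaker1),
                          "Speaker 2": split_into_ipus(speaker2)})
```

In the toy alignment, Speaker 1's first two IPUs form one turn, the backchannel "okay" is a turn of its own, and Speaker 1 then starts a new turn.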


Method: Experimental features

• Text-based features
    • Extracted from the text transcriptions.
    • Lexical id; POS tags; position of word in IPU / turn; etc.
• Timing features
    • Extracted from the time alignment of the transcriptions.
    • Word / IPU / turn duration; amount of overlap; etc.
• Acoustic features
    • {min, mean, max, stdev} × {pitch, intensity}
    • Slope of pitch, stylized pitch, and intensity, over the whole word, and over its last 100, 200, 300 ms.
    • Acoustic features from the end of the other speaker’s previous turn.
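The per-word acoustic features could be computed along these lines. This is a rough sketch under assumptions not stated on the slide: the pitch or intensity track is already available as sampled (time, value) lists for one word, stdev is the sample standard deviation, and slope is a least-squares fit. The feature names are hypothetical.

```python
import statistics


def summary_stats(values):
    """The {min, mean, max, stdev} summary for one track."""
    return {"min": min(values), "mean": statistics.fmean(values),
            "max": max(values), "stdev": statistics.stdev(values)}


def slope(times, values):
    """Least-squares slope of values against times."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    denominator = sum((t - t_mean) ** 2 for t in times)
    return sum((t - t_mean) * (v - v_mean)
               for t, v in zip(times, values)) / denominator


def word_features(times, track, name, windows=(0.100, 0.200, 0.300)):
    """Summary statistics plus slopes over the whole word and its final windows."""
    features = {f"{name}_{key}": value
                for key, value in summary_stats(track).items()}
    features[f"{name}_slope_word"] = slope(times, track)
    word_end = times[-1]
    for window in windows:
        points = [(t, v) for t, v in zip(times, track) if t >= word_end - window]
        if len(points) >= 2:  # a slope needs at least two samples
            key = f"{name}_slope_last_{int(round(window * 1000))}ms"
            features[key] = slope(*zip(*points))
    return features


# Toy track (hypothetical): pitch falling linearly from 200 Hz over 400 ms.
toy_times = [i * 0.01 for i in range(41)]
toy_pitch = [200.0 - 50.0 * t for t in toy_times]
toy_features = word_features(toy_times, toy_pitch, "pitch")
```

The same function can be called a second time with an intensity track and name="intensity" to cover the other half of the feature grid.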


Results: Discourse segment boundary function

Feature Set            Error Rate   F-Measure (Begin)   F-Measure (End)
Text-based             11.6 %       .77                 .30
Timing                 11.3 %       .73                 .52
Acoustic               14.2 %       .66                 .19
Text-based + Timing     9.8 %       .81                 .53
Full set                9.6 %       .81                 .57
Baseline (1)           19.0 %       .00                 .00
Human labelers (2)      5.7 %       .94                 .71

(1) Majority class baseline: NO BOUNDARY.
(2) Calculated with respect to each labeler’s agreement with the majority labels.
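The two reported metrics, and the reason a majority-class baseline scores .00 on both boundary classes, can be illustrated with a short sketch. The label names and the toy data below are hypothetical and are not drawn from the corpus.

```python
from collections import Counter


def error_rate(gold, predicted):
    """Fraction of tokens whose predicted label differs from the gold label."""
    return sum(g != p for g, p in zip(gold, predicted)) / len(gold)


def f_measure(gold, predicted, target):
    """Harmonic mean of precision and recall for one target class."""
    true_pos = sum(g == target and p == target for g, p in zip(gold, predicted))
    false_pos = sum(g != target and p == target for g, p in zip(gold, predicted))
    false_neg = sum(g == target and p != target for g, p in zip(gold, predicted))
    if true_pos == 0:
        return 0.0  # the class is never predicted correctly
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy gold labels (hypothetical): 7 NONE, 2 BEGIN, 1 END.
gold = ["NONE"] * 7 + ["BEGIN"] * 2 + ["END"]

# Majority-class baseline: always predict the most frequent gold label.
majority = Counter(gold).most_common(1)[0][0]
baseline = [majority] * len(gold)

# A hypothetical classifier that finds one of the two BEGIN tokens and the END token.
system = ["NONE"] * 7 + ["BEGIN", "NONE"] + ["END"]
```

Because the baseline never predicts BEGIN or END, its F-measure for both classes is zero regardless of its error rate.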


Results: Acknowledgment function

Feature Set            Error Rate   F-Measure
Text-based              8.3 %       .94
Timing                 11.0 %       .92
Acoustic               17.2 %       .87
Text-based + Timing     6.2 %       .95
Full set                6.5 %       .95
Baseline (1)           16.7 %       .88
Human labelers (2)      5.5 %       .98

(1) Baseline based on lexical identity: {huh, right} → no ACK; all other words → ACK.
(2) Calculated with respect to each labeler’s agreement with the majority labels.
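The lexical-identity baseline in note (1) amounts to a two-word lookup. A minimal sketch, with hypothetical label strings and function name:

```python
# Words the baseline maps to "no acknowledgment"; every other cue word maps to ACK.
NO_ACK_WORDS = {"huh", "right"}


def lexical_baseline(word):
    """Predict the acknowledgment class from the word's identity alone."""
    return "NO_ACK" if word.lower() in NO_ACK_WORDS else "ACK"


predictions = {word: lexical_baseline(word)
               for word in ["okay", "right", "mm-hm", "huh", "yeah"]}
```

The baseline uses no context at all, so every token of a given word receives the same prediction.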


Best-performing features

Discourse Segment Boundary Function
• Lexical identity
• POS tag of the following word
• Number and proportion of succeeding words in the turn
• Context-normalized mean intensity

Acknowledgment Function
• Lexical identity
• POS tag of preceding word
• Number and proportion of preceding words in the turn
• IPU and turn length


Results: Classification of individual words

Classification of each individual word into its most common functions.

alright   Ack/Agree, Cue Begin, Other
mm-hm     Ack/Agree, Backchannel
okay      Ack/Agree, Backchannel, Cue Begin, Ack+CueBegin, Ack+CueEnd, Other
right     Ack/Agree, Check, Literal Modifier
yeah      Ack/Agree, Backchannel


Results: Classification of the word ‘okay’

                                    F-Measure
Feature Set            Error Rate   Ack/Agree   Backchannel   Cue Begin   Ack/Agree + Cue Begin   Ack/Agree + Cue End
Text-based             31.7 %       .76         .16           .77         .09                     .33
Acoustic               40.2 %       .69         .24           .64         .03                     .25
Text-based + Timing    25.6 %       .79         .31           .82         .18                     .67
Full set               25.5 %       .80         .46           .83         .21                     .66
Baseline (1)           48.3 %       .68         .00           .00         .00                     .00
Human labelers (2)     14.0 %       .89         .78           .94         .56                     .73

(1) Majority class baseline: ACK/AGREE.
(2) Calculated with respect to each labeler’s agreement with the majority labels.


Summary

• Discourse/sentential distinction is insufficient for affirmative cue words in spoken dialogue.
• Two new classification tasks:
    • Detection of an acknowledgment function.
    • Detection of a discourse boundary function.
• Best performing ML models:
    • Based on textual and timing features.
    • Slight improvement when using acoustic features.


Further Work

• Gravano et al., 2007. On the role of context and prosody in the interpretation of ‘okay’. ACL 2007, Prague, Czech Republic, June 2007.
• Benus et al., 2007. The prosody of backchannels in American English. ICPhS 2007, Saarbrücken, Germany, August 2007.


Function           alright   mm-hm   okay   right   uh-huh   yeah   Other   Total
Ack / Agree             99      61   1137     114       18    808     133    2370
Backchannel              6     402    121      14      143     72       5     763
Cue Begin               89       0    548       2        0      2       0     641
Cue End                  8       0     10       0        0      0       0      18
Pivot Begin              5       0     68       0        0      0       0      73
Pivot End               13      12    232       2        0     22      17     298
Back from Task           9       1     33       0        0      0       0      43
Check                    0       0      6      53        0      1       8      68
Stall                    1       0     15       1        0      2       0      19
Literal Modifier         9       0     29    1079        0      0       1    1118
?                       56      27    235      10        3     65      11     407
Total                  295     503   2434    1275      164    972     175    5818
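Two figures quoted earlier in the talk can be cross-checked against these counts. The sketch below copies the "okay" and "right" columns from the table. Two readings are assumptions of this sketch rather than statements from the slides: that tokens in the "?" row are excluded before computing the majority-class baseline for "okay", and that Literal Modifier corresponds to the sentential use of "right".

```python
# Counts copied from the "okay" and "right" columns of the table.
okay = {"Ack / Agree": 1137, "Backchannel": 121, "Cue Begin": 548, "Cue End": 10,
        "Pivot Begin": 68, "Pivot End": 232, "Back from Task": 33, "Check": 6,
        "Stall": 15, "Literal Modifier": 29, "?": 235}
right = {"Ack / Agree": 114, "Backchannel": 14, "Cue Begin": 2, "Cue End": 0,
         "Pivot Begin": 0, "Pivot End": 2, "Back from Task": 0, "Check": 53,
         "Stall": 1, "Literal Modifier": 1079, "?": 10}

# Assumption: '?' tokens are dropped before evaluating 'okay'.
okay_labeled = sum(count for label, count in okay.items() if label != "?")
okay_baseline_error = 100 * (1 - okay["Ack / Agree"] / okay_labeled)

# Assumption: Literal Modifier is the sentential use of 'right'.
right_literal_share = 100 * right["Literal Modifier"] / sum(right.values())
```

Under these assumptions the baseline error for "okay" comes to about 48.3 %, in line with the majority-class baseline reported for that word, and literal modifiers make up about 84.6 % of "right" tokens, close to the 85 % sentential share quoted under Previous Work.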