Turn-Yielding Cues in Task-Oriented Dialogue

Agustín Gravano (1,2), Julia Hirschberg (1)

(1) Columbia University, New York, USA
(2) Universidad de Buenos Aires, Argentina


Agustín Gravano, SIGdial 2009

Interactive Voice Response Systems

• Quickly spreading.

• “Uncomfortable”, “awkward”.

• ASR+TTS account for most IVR problems.

• Other problems revealed.
  • Coordination of system-user exchanges.
  • Long pauses after user turns; interruptions.

• Modeling turn-taking behavior should lead to improved system-user coordination.

Introduction


Goal

• Learn when the speaker is likely to end her/his conversational turn.

• Find turn-yielding cues.
  • Cues displayed by the speaker when approaching a potential turn boundary.
• This should improve the coordination of IVRs:
  • Speech understanding: Detect the end of the user’s turn.
  • Speech generation: Display cues signalling the end of the system’s turn.


Talk Outline

• Previous work
• Material
• Method
• Results
• Conclusions


Previous Work on Turn-Taking

• Duncan 1972, 1973, 1974, inter alia.
  • Hypothesized 6 turn-yielding cues in face-to-face dialogue.
  • Conjectured a linear relation between the number of displayed cues and the likelihood of a turn-taking attempt.
• Studies formalized and verified some of Duncan’s hypotheses. [For&Tho96; Wen&Sie03; Cut&Pea86; Wic&Cas01]
• Implementations of turn-boundary detection.
  • Simulations [Fer&al.02,03; Edl&al.05; Sch06; Att&al.08; Bau08]
  • Actual systems: Let’s Go! [Rau&Esk08]

• Exploiting turn-yielding cues improves performance.


Columbia Games Corpus

• 12 task-oriented spontaneous dialogues.
• Standard American English.
• 13 subjects: 6 female, 7 male.
• Series of collaborative computer games.
• No eye contact. No speech restrictions.
• 9 hours of dialogue.
• Manual orthographic transcription, alignment.
• Manual prosodic annotations (ToBI).

Material


Columbia Games Corpus

[Figure: Player 1: Describer; Player 2: Follower.]


Turn-Yielding Cues

• Cues displayed by the speaker when approaching a potential turn boundary.


Method

• Smooth switch: Speaker A finishes her utterance; speaker B takes the turn with no overlapping speech.

• Trained annotators distinguished Smooth switches from Interruptions and Backchannels using a scheme based on Ferguson 1977, Beattie 1982.

Turn-Yielding Cues

• IPU (Inter Pausal Unit): Maximal sequence of words from the same speaker surrounded by silence ≥ 50ms.

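[Figure: timeline with Speaker A producing IPU1 and IPU2, and Speaker B producing IPU3. The transition after IPU1 is a Hold; the transition after IPU2 is a Smooth switch.]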
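The IPU definition above can be sketched in a few lines. This is only an illustration, assuming each word of a single speaker comes with start and end times in seconds (the tuple layout and the name `group_into_ipus` are hypothetical, not taken from the slides); the 50 ms threshold is the one given in the definition.

```python
from typing import List, Tuple

# Hypothetical layout: (start_sec, end_sec, token) for ONE speaker's words.
Word = Tuple[float, float, str]


def group_into_ipus(words: List[Word], min_silence: float = 0.050) -> List[List[Word]]:
    """Group one speaker's time-aligned words into inter-pausal units.

    Consecutive words stay in the same IPU while the silence between them
    is shorter than `min_silence`; a silence of at least `min_silence`
    seconds starts a new IPU.
    """
    ipus: List[List[Word]] = []
    for word in sorted(words):  # sort by start time
        if ipus and word[0] - ipus[-1][-1][1] < min_silence:
            ipus[-1].append(word)  # short gap: same IPU
        else:
            ipus.append([word])  # first word, or long enough silence
    return ipus


# Illustrative (made-up) alignment: a 400 ms pause separates two IPUs.
example = [(0.00, 0.30, "go"), (0.32, 0.60, "to"),
           (1.00, 1.40, "the"), (1.41, 1.90, "lion")]
example_ipus = group_into_ipus(example)
```

The sketch handles one speaker at a time; silence is measured only between that speaker's own words.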


• To find turn-yielding cues, we compare:
  • IPUs preceding Holds,
  • IPUs preceding Smooth switches.

• ~200 features: acoustic, prosodic, lexical, syntactic.

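[Figure: timeline with Speaker A producing IPU1 and IPU2, and Speaker B producing IPU3. The transition after IPU1 is a Hold; the transition after IPU2 is a Smooth switch.]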
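A rough sketch of how IPUs might be split into the two groups being compared. It is a simplification and an assumption on our part: in the study, trained annotators separated Smooth switches from Interruptions and Backchannels, which timing alone cannot do. Here a speaker change without overlap is only marked as a candidate. The `IPU` type and `label_transitions` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class IPU(NamedTuple):
    speaker: str
    start: float  # seconds
    end: float    # seconds


def label_transitions(ipus: List[IPU]) -> List[Optional[str]]:
    """Label each IPU (in start-time order) by what follows it.

    "hold"             -> the same speaker produces the next IPU
    "switch_candidate" -> the other speaker starts with no overlapping speech
    None               -> overlapping speech, or no following IPU
    """
    ordered = sorted(ipus, key=lambda u: u.start)
    labels: List[Optional[str]] = []
    for current, following in zip(ordered, ordered[1:]):
        if following.speaker == current.speaker:
            labels.append("hold")
        elif following.start >= current.end:
            labels.append("switch_candidate")
        else:
            labels.append(None)  # overlap: cannot be a smooth switch
    if ordered:
        labels.append(None)  # the last IPU has nothing after it
    return labels


# Mirrors the timeline on the slide: A holds after IPU1, B takes the turn after IPU2.
timeline = [IPU("A", 0.0, 1.0), IPU("A", 1.5, 2.5), IPU("B", 3.0, 4.0)]
timeline_labels = label_transitions(timeline)
```

Candidates would still need manual or automatic filtering of backchannels before any feature comparison.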


Individual Cues

1. Final intonation:
   • Falling (L-L%) or high-rising (H-H%).
2. Faster speaking rate.
   • Reduction of final lengthening.
3. Lower intensity level.
4. Lower pitch level.
5. Higher jitter, shimmer, NHR (noise-to-harmonics ratio).
   • Related to perception of voice quality.
6. Longer IPU duration (seconds and #words).


Individual Cues

7. Textual completion (independent of intonation).
   (1) Manually annotated a portion of the data. Labelers read up to the end of a target IPU (no right context) and judged whether it could constitute a ‘complete’ utterance. 400 tokens. K=0.81.
   (2) Trained an SVM classifier. 19 lexical + syntactic features. Accuracy: 80%. Maj-class baseline: 55%. Human agreement: 91%.
   (3) Labeled all IPUs in the corpus with the SVM model.

                            Incomplete   Complete
   Before smooth switches:  18%          82%
   Before holds:            47%          53%
   (χ² test, p ~ 0)
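For readers who want to see what a χ² test on such a 2×2 table involves, here is a minimal sketch of Pearson's statistic (no continuity correction) with its p-value for one degree of freedom. The slide reports percentages only, so the counts in the example are hypothetical (100 IPUs per row, chosen purely so that the rows match the reported percentages); they are not the corpus counts, and `chi_square_2x2` is an illustrative name.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 d.o.f.) for a 2x2 table.

    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# HYPOTHETICAL counts (rows: before smooth switches, before holds;
# columns: complete, incomplete). Real counts are not given on the slide.
hypothetical_counts = [[82, 18], [53, 47]]
stat, p_value = chi_square_2x2(hypothetical_counts)
```

With the real corpus the sample is far larger than in this toy table, which is consistent with the very small p-value reported.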


Individual Cues

1. Final intonation: L-L% or H-H%.
2. Faster speaking rate.
3. Lower intensity level.
4. Lower pitch level.
5. Higher jitter, shimmer, NHR.
6. Longer IPU duration.
7. Textual completion.


Defining Presence of a Cue

• 2-3 representative features for each cue:

  Cue                  Features
  Final intonation     Abs. pitch slope over final 200ms, 300ms.
  Speaking rate        Syllables/sec, phonemes/sec over IPU.
  Intensity level      Mean intensity over final 500ms, 1000ms.
  Pitch level          Mean pitch over final 500ms, 1000ms.
  Voice quality        Jitter, shimmer, NHR over final 500ms.
  IPU duration         Duration in ms, and in number of words.
  Textual completion   Complete vs. incomplete (binary).

• Define presence/absence based on whether the value is closer to the mean before Smooth switches (S) or to the mean before Holds (H).
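The presence rule can be written as a one-line comparison. This is a sketch for a single numeric feature; how the 2-3 features of one cue are combined is not stated on the slide, so that step is left out. The function names and the tie-breaking choice (a tie counts as absent) are assumptions.

```python
from statistics import mean
from typing import Sequence, Tuple


def class_means(values_before_s: Sequence[float],
                values_before_h: Sequence[float]) -> Tuple[float, float]:
    """Mean of one feature over IPUs before smooth switches (S) and before holds (H)."""
    return mean(values_before_s), mean(values_before_h)


def cue_present(value: float, mean_before_s: float, mean_before_h: float) -> bool:
    """True if `value` lies closer to the S mean than to the H mean.

    Assumption: an exact tie is treated as "cue absent".
    """
    return abs(value - mean_before_s) < abs(value - mean_before_h)


# Made-up feature values, for illustration only.
mean_s, mean_h = class_means([3.0, 5.0], [7.0, 9.0])  # 4.0 and 8.0
near_s = cue_present(5.0, mean_s, mean_h)
near_h = cue_present(7.5, mean_s, mean_h)
```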


Top Frequencies of Complex Cues

Turn-yielding cues:
1: Final intonation
2: Speaking rate
3: Intensity level
4: Pitch level
5: IPU duration
6: Voice quality
7: Completion

digit == cue present
dot == cue absent
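The digit/dot notation can be reproduced with a small helper, shown here as a sketch. It assumes each IPU is represented by the set of cue numbers found present, using the numbering listed on this slide; `encode_pattern` and `top_patterns` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Cue numbering as listed on the slide.
CUE_NAMES = {
    1: "Final intonation",
    2: "Speaking rate",
    3: "Intensity level",
    4: "Pitch level",
    5: "IPU duration",
    6: "Voice quality",
    7: "Completion",
}


def encode_pattern(present: Set[int]) -> str:
    """Write a cue combination as 7 characters: digit = present, dot = absent."""
    return "".join(str(i) if i in present else "." for i in sorted(CUE_NAMES))


def top_patterns(observations: Iterable[Set[int]], n: int = 10) -> List[Tuple[str, int]]:
    """Most frequent cue combinations with their counts."""
    return Counter(encode_pattern(p) for p in observations).most_common(n)


pattern = encode_pattern({1, 2, 7})
```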


Combined Cues

[Figure: percentage of turn-taking attempts (y-axis, 0% to 70%) vs. number of cues conjointly displayed (x-axis, 0 to 7); r² = 0.969.]
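An r² value like the one on this plot comes from a least-squares line through the (number of cues, percentage) points. The sketch below shows that computation with the standard formulas; the data points behind the plot are not recoverable from the slide text, so the example uses an artificial, perfectly linear series. `linear_fit_r2` is an illustrative name.

```python
from typing import Sequence, Tuple


def linear_fit_r2(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope, intercept and r^2 for y = slope * x + intercept.

    Assumes at least two points, with xs not all equal and ys not all equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# ARTIFICIAL data (not from the study): an exact line over 0..7 cues.
cue_counts = list(range(8))
artificial_pct = [2.0 * k + 1.0 for k in cue_counts]
slope, intercept, r2 = linear_fit_r2(cue_counts, artificial_pct)
```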


Turn-Yielding Cues

IVR Systems

• After each IPU from the user:
    if estimated likelihood > threshold
    then take the turn
• To signal the end of a system’s turn:
    Include as many cues as possible in the system’s final IPU.
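The decision rule for the user side could look roughly like this. The slide does not say how the likelihood is estimated; the sketch assumes, for illustration only, a linear function of the number of cues detected in the user's last IPU. The coefficients, the threshold, and both function names are hypothetical.

```python
def estimated_likelihood(num_cues: int, slope: float, intercept: float) -> float:
    """ASSUMED linear estimate of the turn-yield likelihood, clipped to [0, 1]."""
    return min(1.0, max(0.0, slope * num_cues + intercept))


def should_take_turn(num_cues: int, slope: float, intercept: float,
                     threshold: float) -> bool:
    """Rule from the slide: take the turn if the estimated likelihood exceeds the threshold."""
    return estimated_likelihood(num_cues, slope, intercept) > threshold


# Hypothetical parameters; a real system would fit them on annotated dialogues.
SLOPE, INTERCEPT, THRESHOLD = 0.1, 0.0, 0.5
take_after_many_cues = should_take_turn(7, SLOPE, INTERCEPT, THRESHOLD)
take_after_few_cues = should_take_turn(2, SLOPE, INTERCEPT, THRESHOLD)
```

Raising the threshold trades fewer interruptions for longer pauses after user turns, the two coordination problems named in the introduction.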


Summary

• Study of turn-yielding cues.
  • Objective, automatically computable.
  • Combined cues.
  • Improve turn-taking decisions of IVR systems.
• Results drawn from task-oriented dialogues.
  • Not necessarily generalizable.
  • Suitable for most IVR domains.

• Interspeech 2009: Study of backchannel-inviting cues.


Special thanks to…

• Julia Hirschberg
• Thesis Committee Members
  • Maxine Eskenazi, Kathy McKeown, Becky Passonneau, Amanda Stent.
• Speech Lab at Columbia University
  • Stefan Benus, Fadi Biadsy, Sasha Caskey, Bob Coyne, Frank Enos, Martin Jansche, Jackson Liscombe, Sameer Maskey, Andrew Rosenberg.
• Collaborators
  • Gregory Ward and Elisa Sneed German (Northwestern U); Ani Nenkova (UPenn); Héctor Chávez, David Elson, Michel Galley, Enrique Henestroza, Hanae Koiso, Shira Mitchell, Michael Mulley, Kristen Parton, Ilia Vovsha, Lauren Wilcox.