HIGGINS: A spoken dialogue system for investigating error handling techniques
Jens Edlund, Gabriel Skantze and Rolf Carlson
Scenario
User: I want to go to the closest subway station.
System: Ok, to the closest subway station. Can you describe where you are now?
User: I have an ATM to my left and a pedestrian crossing in front of me.
System: Can you see some trees to your right and a white building in front of you?
User: Yes
System: Ok, turn left after the large building on your left and follow the street until you reach a crossing.
User: Ok, there is a bus station here.
System: That’s right. Turn left again after the bus station.
Centre for Speech Technology
[Figure: error handling flow; labels: user utterance, non-understanding, assume understanding, user reaction/repair, no recovery]
Architecture
The Higgins Project
• Started in 2003
• Theoretical goal: Investigate error handling techniques for collaborative dialogue systems
• Practical goal: Build a system in which these techniques can be tested empirically
• This poster presents the current stage of the project.
Error recovery (non-understanding)
• Map-task-like studies of human-human conversation, using ASR in one direction
• Results show that humans tend not to signal non-understanding
• This leads to:
  • Higher perceived task success
  • Faster recovery from non-understanding
• Skantze, G. (2003). Exploring human error handling strategies: implications for spoken dialogue systems.
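The finding that humans rarely signal non-understanding suggests a recovery strategy a system could imitate, as the operator does in the wooden house example: after a failed interpretation, ask a new task-related question instead of saying "Sorry, I didn't understand". The sketch below is a hypothetical illustration of that idea, not the Higgins implementation; the function name and the landmark list are invented.

```python
def recover_from_non_understanding(landmarks: list[str], asked: set[str]) -> str:
    """Hypothetical recovery move after a non-understanding.

    landmarks: candidate landmarks near the user's assumed position (invented)
    asked:     landmarks the system has already asked about (updated in place)
    """
    for landmark in landmarks:
        if landmark not in asked:
            asked.add(landmark)
            # Move the task forward rather than signalling the failure
            return f"Can you see a {landmark}?"
    # Only signal non-understanding when there is nothing new to ask about
    return "Sorry, I didn't understand. Can you describe where you are?"


asked: set[str] = set()
print(recover_from_non_understanding(["restaurant sign"], asked))
```

The explicit fallback is kept so the dialogue cannot loop forever once all candidate questions are used up.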
[Figure: error handling stages: early error detection, grounding, late error detection, error recovery (misunderstanding)]
Late error detection
The need for late error detection is task dependent:
• Sometimes it is not necessary.
• Sometimes reference handling is sufficient.
• For slots with multiple possible values, late error detection is necessary.
(The corresponding examples also illustrate error recovery from misunderstanding.)
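The difference between single-valued and multi-valued slots can be sketched with a small frame. This is a hypothetical illustration, not the Higgins discourse model: the class and method names are invented, and the `is_correction` flag is simply passed in, whereas deciding it (for example from a cue such as "No, ...") is exactly the late error detection problem the poster describes.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical dialogue frame with single- and multi-valued slots."""
    single: dict[str, str] = field(default_factory=dict)
    multi: dict[str, list[str]] = field(default_factory=dict)

    def set_single(self, slot: str, value: str) -> None:
        # A single-valued slot (e.g. the destination) holds one value, so
        # "No, to Boston!" simply overwrites a misrecognised "London";
        # no explicit late error detection is needed.
        self.single[slot] = value

    def add_multi(self, slot: str, value: str, is_correction: bool) -> None:
        values = self.multi.setdefault(slot, [])
        if is_correction and values:
            # "No, on my left!": the latest value was a misunderstanding,
            # so retract it before adding the corrected value.
            values.pop()
        # Otherwise the new value is an addition and earlier values are kept.
        values.append(value)


frame = Frame()
frame.add_multi("landmark", "large building on the right", is_correction=False)
frame.add_multi("landmark", "large building on the left", is_correction=True)
print(frame.multi["landmark"])  # ['large building on the left']
```

Without the flag, the multi-valued slot would end up holding both sides, which is why such slots need late error detection while single-valued ones do not.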
Grounding
• The amount of feedback from the system should depend on at least:
  • Confidence of understanding
  • Consequence of misunderstanding
• The discourse modeller:
  • Unifies assertions and tracks referents
  • Resolves ellipsis
  • Resolves anaphora
  • Keeps track of who contributed which information
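One way to let feedback depend on both confidence and consequence is to weigh them into an expected cost of a misunderstanding. The sketch below assumes that combination (a product) and uses invented thresholds; it is an illustration of the principle, not the decision rule used in Higgins.

```python
from enum import Enum


class Feedback(Enum):
    NONE = "no feedback"
    IMPLICIT = "implicit verification"   # e.g. "Ok, to the closest subway station."
    EXPLICIT = "explicit verification"   # e.g. "Did you say the closest subway station?"
    REJECT = "signal non-understanding"


def choose_feedback(confidence: float, consequence: float,
                    reject_below: float = 0.2,
                    low_risk: float = 0.1,
                    high_risk: float = 0.4) -> Feedback:
    """Select a grounding action (hypothetical thresholds, not Higgins values).

    confidence:  how sure the system is of its interpretation (0..1)
    consequence: how costly acting on a misunderstanding would be (0..1)
    """
    if confidence < reject_below:
        return Feedback.REJECT
    # Expected cost of silently accepting a possibly wrong interpretation
    risk = (1.0 - confidence) * consequence
    if risk < low_risk:
        return Feedback.NONE
    if risk < high_risk:
        return Feedback.IMPLICIT
    return Feedback.EXPLICIT


print(choose_feedback(confidence=0.7, consequence=0.9))  # Feedback.IMPLICIT
```

With this rule, the same confidence can yield no feedback for a harmless detail but explicit verification for a destination, where a misunderstanding would send the user the wrong way.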
Early error detection
[Figure: early error detection components: KTH LVCSR (large-vocabulary probabilistic ASR), machine-learned error detection, rule-driven semantic/syntactic error detection, rule-driven discourse error detection]
• Which features could be used for detecting word-level errors?
• How are they operationalised?
• Initial tests with memory-based and transformation-based learning point to these features:
  • Utterance context
  • Lexical information
  • Word confidences
  • Discourse history
• Skantze, G. & Edlund, J. (2004). Early error detection on word level.
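Memory-based learning classifies a new instance by comparing it with stored training instances. The sketch below is a plain k-nearest-neighbour classifier that labels each recognised word as "correct" or "error". It assumes a hypothetical numeric encoding standing in for the feature groups above; the encoding and the training data are invented and are not the feature set or tool used in the study.

```python
import math
from collections import Counter

# Hypothetical per-word feature vector (not the study's encoding):
# [ASR word confidence, mean confidence of neighbouring words,
#  1.0 if content word else 0.0, 1.0 if mentioned earlier in the discourse else 0.0]
Example = tuple[list[float], str]


def knn_classify(train: list[Example], features: list[float], k: int = 3) -> str:
    """Memory-based (k-nearest-neighbour) classification: keep every training
    instance and label a new word by majority vote among its k closest ones."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented training data
TRAIN: list[Example] = [
    ([0.90, 0.80, 1.0, 1.0], "correct"),
    ([0.85, 0.70, 1.0, 0.0], "correct"),
    ([0.95, 0.90, 0.0, 0.0], "correct"),
    ([0.30, 0.40, 1.0, 0.0], "error"),
    ([0.20, 0.50, 0.0, 0.0], "error"),
    ([0.40, 0.30, 1.0, 0.0], "error"),
]

print(knn_classify(TRAIN, [0.25, 0.35, 1.0, 0.0]))  # prints: error
```

Because all instances are stored and compared at classification time, adding a new feature group (such as discourse history) only means extending the vector, which makes this family of learners convenient for comparing feature sets.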
ASR post-processing
PICKERING: Robust interpretation
• Rule-based semantic parsing
• Finds the partial results with the largest coverage
• Allows insertions inside phrases
• Allows agreement violations if necessary
• Evaluation results show robustness against inserted content words
• Skantze, G. & Edlund, J. (2004). Robust interpretation in the Higgins spoken dialogue system.
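Two of the properties above, tolerating insertions inside phrases and preferring partial results with the largest coverage, can be illustrated with a toy matcher. This is a swapped-in technique, not PICKERING's algorithm: phrase patterns are matched greedily with a bounded number of skipped words, and non-overlapping matches are then chosen by weighted interval scheduling. All names, rules and concept labels are hypothetical, and agreement handling is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    start: int    # index of the first matched token
    end: int      # index one past the last consumed token
    concept: str
    weight: int   # number of pattern words matched (its coverage)


def match_pattern(tokens: list[str], pattern: list[str], concept: str,
                  max_insertions: int) -> list[Match]:
    """Find occurrences of pattern in tokens, tolerating up to max_insertions
    unexpected words inside the phrase (matched greedily, left to right)."""
    matches = []
    for start, token in enumerate(tokens):
        if token != pattern[0]:
            continue
        i, p, skipped = start + 1, 1, 0
        while p < len(pattern) and i < len(tokens):
            if tokens[i] == pattern[p]:
                p += 1
            else:
                skipped += 1
                if skipped > max_insertions:
                    break
            i += 1
        if p == len(pattern):
            matches.append(Match(start, i, concept, len(pattern)))
    return matches


def best_cover(matches: list[Match]) -> list[Match]:
    """Pick non-overlapping matches with the largest total coverage
    (weighted interval scheduling by dynamic programming)."""
    ms = sorted(matches, key=lambda m: m.end)
    # best[j] = (coverage, chosen matches) using only the first j matches
    best: list[tuple[int, list[Match]]] = [(0, [])]
    for j, m in enumerate(ms):
        k = j
        while k > 0 and ms[k - 1].end > m.start:  # skip overlapping predecessors
            k -= 1
        take = (best[k][0] + m.weight, best[k][1] + [m])
        best.append(max(best[j], take, key=lambda t: t[0]))
    return best[-1][1]


def parse(tokens: list[str], rules: list[tuple[list[str], str]],
          max_insertions: int = 1) -> list[str]:
    matches = [m for pattern, concept in rules
               for m in match_pattern(tokens, pattern, concept, max_insertions)]
    return [m.concept for m in best_cover(matches)]


RULES = [(["red", "building"], "red building"),
         (["a", "large", "building"], "building(size=large)"),
         (["on", "my", "left"], "side=left")]
print(parse("i have a large red building on my left".split(), RULES))
# prints: ['building(size=large)', 'side=left']
```

Here the unexpected "red" inside "a large red building" is skipped as an insertion, and the longer match wins over the overlapping "red building" because it covers more pattern words.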
[Figure: system architecture with modules for ASR, utterance interpretation, discourse modelling, decision making, generation and TTS]
• Distributed modular system
• Goals:
  • A module for every reasonably well-defined task
  • Separation of the domain-specific (XML) from the domain-independent (module code)
• Incremental processing allows for:
  • Rapid feedback
  • Flexible turn-taking
  • Faster processing
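Incremental processing means that a module can act on partial input from the module before it. The sketch below illustrates this with in-process `asyncio` queues standing in for the distributed message passing between modules; the module names, the word-by-word ASR stand-in and the keyword "interpretation" are all hypothetical and say nothing about the actual Higgins protocol.

```python
import asyncio


async def asr_module(words: list[str], out: asyncio.Queue) -> None:
    # Hypothetical ASR stand-in: publishes the hypothesis one word at a time
    for word in words:
        await out.put(word)
        await asyncio.sleep(0)  # yield so downstream modules run in between
    await out.put(None)  # end-of-utterance marker


async def interpretation_module(inp: asyncio.Queue, out: asyncio.Queue,
                                concepts: set[str]) -> None:
    heard: list[str] = []
    while (word := await inp.get()) is not None:
        heard.append(word)
        if word in concepts:
            # Act on the partial hypothesis instead of waiting for the end
            await out.put(f"{word} (after {len(heard)} words)")
    await out.put(None)


async def run_pipeline(words: list[str], concepts: set[str]) -> list[str]:
    asr_out: asyncio.Queue = asyncio.Queue()
    interp_out: asyncio.Queue = asyncio.Queue()
    results: list[str] = []

    async def decision_module() -> None:
        # Downstream consumer that could give rapid feedback on each concept
        while (item := await interp_out.get()) is not None:
            results.append(item)

    await asyncio.gather(
        asr_module(words, asr_out),
        interpretation_module(asr_out, interp_out, concepts),
        decision_module(),
    )
    return results


print(asyncio.run(run_pipeline("i want to go to the subway station".split(), {"subway"})))
# prints: ['subway (after 7 words)']
```

Because each module only depends on the messages it receives, the same structure would work with modules running as separate processes, which is what a distributed design requires.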
U1: I want to go to Boston
S1: To London...
U2: No, to Boston!

U1: How much is the big apartment?
S1: The small apartment is […]
U2a: No, the big apartment!
U2b: And the big apartment?

U1: I have a large building on my left
S1: A large building on your right
U2a: No, on my left!
U2b: And on my left
Misunderstanding
U1: There is a large red building
S2: What material is the large building made of?
O1: Do you see a wooden house in front of you?
U1: YES CROSSING ADDRESS NOW
(ASR output; the user actually said "I pass the wooden house now")
O2: Can you see a restaurant sign?
[Figure: experimental setup. The user speaks, and the operator reads the ASR output; the operator speaks back through a vocoder, and the user listens.]
GALATEA: Discourse modelling
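GALATEA is the discourse modeller whose tasks are listed under Grounding. The sketch below is a hypothetical illustration of three of them (unifying assertions, tracking referents, and recording who contributed which information); the class names and the attribute-value representation are invented, and ellipsis resolution is not modelled. Indefinite descriptions introduce a new referent, while definite descriptions and pronouns are resolved to the most recent referent they can unify with.

```python
from dataclasses import dataclass, field


@dataclass
class Referent:
    # attribute -> (value, speaker who first contributed it)
    attrs: dict[str, tuple[str, str]] = field(default_factory=dict)

    def compatible(self, desc: dict[str, str]) -> bool:
        # Unification fails only if an attribute already has a different value
        return all(self.attrs[a][0] == v for a, v in desc.items() if a in self.attrs)

    def unify(self, desc: dict[str, str], speaker: str) -> None:
        for attr, value in desc.items():
            # Keep the original contributor for information already present
            self.attrs.setdefault(attr, (value, speaker))


class DiscourseModel:
    def __init__(self) -> None:
        self.referents: list[Referent] = []

    def add(self, desc: dict[str, str], speaker: str, definite: bool) -> Referent:
        """Resolve a definite description or pronoun (a partial or empty desc)
        to the most recent compatible referent; otherwise add a new one."""
        if definite:
            for ref in reversed(self.referents):
                if ref.compatible(desc):
                    ref.unify(desc, speaker)
                    return ref
        # Indefinite, or no compatible antecedent: introduce a new referent
        ref = Referent()
        ref.unify(desc, speaker)
        self.referents.append(ref)
        return ref


dm = DiscourseModel()
b = dm.add({"type": "building", "size": "large", "colour": "red"}, "user", definite=False)
dm.add({"type": "building", "size": "large"}, "system", definite=True)  # "the large building"
dm.add({"material": "brick"}, "user", definite=True)                     # "it is made of brick"
print(b.attrs["colour"], b.attrs["material"])  # prints: ('red', 'user') ('brick', 'user')
```

Recording the contributor per attribute lets the system distinguish information the user has asserted from information the system itself introduced, which matters when deciding what still needs grounding.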