
Spoken Language Understanding for Conversational Dialog Systems

Michael McTear, University of Ulster

IEEE/ACL 2006 Workshop on Spoken Language Technology, Aruba, December 10-13, 2006


Overview

- Introductory definitions
  - Task-based and conversational dialog systems
  - Spoken language understanding
- Issues for spoken language understanding
  - Coverage
  - Robustness
- Overview of spoken language understanding
  - Hand-crafted approaches
  - Data-driven methods
- Conclusions


Basic dialog system architecture

[Diagram: audio is passed to Speech Recognition (HMM acoustic model, n-gram language model), which outputs words to Spoken Language Understanding; the resulting semantic representation goes to the Dialogue Manager, which consults the Backend and passes concepts to Language Generation; the generated words are rendered as audio by Text-to-Speech Synthesis.]
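The same pipeline can be written out as function composition. The sketch below is only a schematic rendering of the diagram, with placeholder components and made-up type choices; it is not code from any of the systems discussed.

```python
from dataclasses import dataclass

@dataclass
class SemanticFrame:
    """Semantic representation handed from SLU to the dialogue manager."""
    slots: dict

def speech_recognition(audio: bytes) -> str:                      # audio -> words
    raise NotImplementedError                                     # HMM acoustic model + n-gram LM

def spoken_language_understanding(words: str) -> SemanticFrame:   # words -> semantic representation
    raise NotImplementedError

def dialogue_manager(frame: SemanticFrame) -> dict:               # decides next move, queries the backend
    raise NotImplementedError

def language_generation(concepts: dict) -> str:                   # concepts -> words
    raise NotImplementedError

def text_to_speech(words: str) -> bytes:                          # words -> audio
    raise NotImplementedError

def dialogue_turn(audio_in: bytes) -> bytes:
    """One user turn through the whole pipeline."""
    words = speech_recognition(audio_in)
    frame = spoken_language_understanding(words)
    concepts = dialogue_manager(frame)
    return text_to_speech(language_generation(concepts))
```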


Task-based Dialog Systems

- Mainly interact with databases to get information or support transactions
- SLU module creates a database query from the user's spoken input by extracting relevant concepts
- System initiative: constrains user input
  - Keyword / keyphrase extraction
- User initiative: less constrained input
  - Call routing: call classification with named entity extraction
  - Question answering


Conversational Dialog

- AI (agent-based) systems, e.g. TRIPS
  - User can take the initiative, e.g. raise a new topic, ask for clarification (TRIPS)
  - More complex interactions involving recognition of the user's intentions, goals, beliefs or plans
  - Deep understanding of the user's utterance, taking contextual information into account
  - Information State Theory, Planning Theory, User Modelling, Belief Modelling…
- Simulated conversation, e.g. CONVERSE
  - Conversational companions, chatbots, help desks
  - Does not require deep understanding
  - SLU involves identifying the utterance type and determining a suitable response


Defining Spoken Language Understanding

- extracting the meaning from speech utterances
- a transduction of the recognition result to an interpretable representation
- Meaning (in human–computer interactive systems): a representation that can be executed by an interpreter in order to change the state of the system

(Bangalore et al., 2006)


SLU for task-based systems

Example inputs:
- a flight from Belfast to Malaga
- uh I'd like uh um could you uh is there a flight from Bel- uh Belfast to um Gran- I mean Malaga
- I would like to find a flight from Pittsburgh to Boston on Wednesday and I have to be in Boston by one so I would like a flight out of here no later than 11 a.m.

Target semantic representation (for the Belfast-Malaga requests):
Topic: Flight
Origin: BFS
Destination: AGP
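For a slot-filling task like this, concept spotting can be as simple as a few patterns over the word string, ignoring fillers and false starts. The sketch below is a minimal illustration, not any deployed system's grammar; the city lexicon and airport codes are toy assumptions.

```python
import re

# Minimal concept-spotting sketch for the flight examples above.  The city
# lexicon and airport codes are a toy assumption for illustration only.
CITY_CODES = {"belfast": "BFS", "malaga": "AGP", "pittsburgh": "PIT", "boston": "BOS"}
CITY = "|".join(CITY_CODES)

def understand(utterance: str) -> dict:
    """Extract a flat Topic/Origin/Destination frame from a (noisy) word string."""
    text = utterance.lower()
    frame = {"Topic": "Flight"} if re.search(r"\bflights?\b", text) else {}
    if m := re.search(rf"\bfrom\b.*?\b({CITY})\b", text):
        frame["Origin"] = CITY_CODES[m.group(1)]
    if m := re.search(rf"\bto\b.*?\b({CITY})\b", text):
        frame["Destination"] = CITY_CODES[m.group(1)]
    return frame

print(understand("uh I'd like uh um could you uh is there a flight "
                 "from Bel- uh Belfast to um Gran- I mean Malaga"))
# -> {'Topic': 'Flight', 'Origin': 'BFS', 'Destination': 'AGP'}
```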


SLU for advanced conversational systems (TRIPS)

- Interpretation requires intention recognition
- "can we use a helicopter to get the people from:"
  - Abyss (request to modify plan)
  - Barnacle (include sub-goal and suggest solution)
  - Delta (extension of a solution)
- Six possible interpretations with only a change of city name
- Requires reasoning about the task and current context to identify the most plausible interpretation
- Requires more than concept spotting to identify the structure and meaning of the utterance as a basis for reasoning


Role of syntax in deep understanding

- List all employees of the companies who/which are based in the city centre
  - employees based in city centre
  - companies based in city centre
- I would like to know where to mail the check.  ->  I mail check to ?where
- I would like you to mail the check to me  ->  You mail check to me
- Stop word removal will transform both strings to "mail check" (see the sketch below)
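To make the last point concrete, here is the stop-word effect in a few lines of Python; the stop-word list is a small illustrative subset, not a standard one.

```python
# Removing stop words collapses the two requests onto the same content words,
# losing the syntactic information that distinguishes them.
STOP_WORDS = {"i", "you", "would", "like", "to", "know", "where", "the", "me"}

def content_words(sentence: str) -> list:
    return [w for w in sentence.lower().replace(".", "").split()
            if w not in STOP_WORDS]

print(content_words("I would like to know where to mail the check."))  # ['mail', 'check']
print(content_words("I would like you to mail the check to me"))       # ['mail', 'check']
```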


SLU for simulated dialog

C: Are you still a friend of XXX?
H: I am not sure if he has any real friends. He has achieved a lot and has left a lot of people behind.
C: You really don't like the guy - well, at least that's definite - I thought most people were just indifferent or numb by now.

?


Coverage

- Possible requirement: "The system should be able to understand everything the user might say"
- Predicting user input
  - Analysis of corpora and iterative design of hand-crafted grammars
  - Use of carefully designed prompts to constrain user input
  - Learning grammar from data


Robustness

- Characteristics of spontaneous spoken language
  - Disfluencies and filled pauses - not just errors, they reflect cognitive aspects of speech production and interaction management
- Output from the speech recognition component
  - Words and word boundaries not known with certainty
  - Recognition errors
- Approaches
  - Use of semantic grammars and robust parsing for concept spotting
  - Data-driven approaches - learn mappings between input strings and output structures


Developing the SLU component

- Hand-crafted approaches
  - Grammar development
  - Parsing
- Data-driven approaches
  - Learning from data
  - Statistical models rather than grammars
  - Efficient decoding


Hand-crafting grammars

- Traditional software engineering approach of design and iterative refinement
- Decisions about the type of grammar required
  - Chomsky hierarchy
  - Flat vs hierarchical representations
- Processing issues (parsing)
  - Dealing with ambiguity
  - Efficiency

[Diagram: ASR output (n-best list, word lattice, …) -> Parsing -> parse tree -> Frame Generation -> semantic frame -> Discourse Processing -> frame in context -> DB Query -> SQL query]


Semantic Grammar and Robust Parsing: PHOENIX (CMU/CU)

[Diagram: ASR -> word string -> Semantic Parser -> meaning representation]

- The Phoenix parser maps input word strings onto a sequence of semantic frames
  - A frame is a named set of slots, where the slots represent related pieces of information
  - Each slot has an associated context-free grammar that specifies the word string patterns that match the slot
  - Chart parsing with path pruning: e.g. a path that accounts for fewer words is pruned (see the sketch below)
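The sketch below gives the flavour of this style of robust parsing, not the Phoenix implementation itself: a frame is a dictionary of slots, regular expressions stand in for the per-slot CFGs, and overlapping candidate matches are resolved in favour of the one that accounts for more words (a crude stand-in for path pruning). The frame, slot names and patterns are toy assumptions.

```python
import re

# Toy flight frame: slot name -> word-string patterns (regexes stand in for
# the per-slot CFGs of a real semantic grammar).
CITIES = r"(?:pittsburgh|boston|denver|burbank|belfast|malaga)"
FLIGHT_FRAME = {
    "depart_loc":  [rf"\bfrom ({CITIES})\b"],
    "arrive_loc":  [rf"\b(?:to|into|in) ({CITIES})\b"],
    "depart_date": [r"\bon (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"],
    "arrive_time": [r"\b(?:by|no later than) ((?:\d{1,2}|one|two|noon)(?: ?[ap]\.?m\.?)?)\b"],
}

def parse(utterance: str) -> dict:
    text = utterance.lower()
    candidates = []                              # (span, words covered, slot, value)
    for slot, patterns in FLIGHT_FRAME.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text):
                candidates.append((m.span(), len(m.group(0).split()), slot, m.group(1)))
    # Prefer matches that account for more words; drop overlapping losers.
    frame, used = {}, []
    for span, _, slot, value in sorted(candidates, key=lambda c: -c[1]):
        if all(span[1] <= s or span[0] >= e for s, e in used):
            frame.setdefault(slot, value)
            used.append(span)
    return frame

print(parse("I would like to find a flight from Pittsburgh to Boston on Wednesday "
            "and I have to be in Boston by one so I would like a flight out of "
            "here no later than 11 a.m."))
# -> depart_loc, arrive_loc, depart_date and arrive_time are all filled;
#    "no later than 11 a.m." beats "by one" because it accounts for more words.
```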


Deriving Meaning directly from ASR output: VoiceXML

- ASR uses finite state grammars as language models for recognition, and semantic tags in the grammars for semantic parsing

"I would like a coca cola and three large pizzas with pepperoni and mushrooms"

Meaning representation:
{ drink: "coke",
  pizza: { number: "3",
           size: "large",
           topping: [ "pepperoni", "mushrooms" ] } }


Deep understanding

- Requirements for deep understanding
  - advanced grammatical formalisms
  - syntax-semantics issues
  - parsing technologies
- Example: TRIPS
  - Uses a feature-based augmented CFG with an agenda-driven best-first chart parser
  - Combined strategy: combining shallow and deep parsing (Swift et al.)


Combined strategies: TINA (MIT)

- Grammar rules include a mix of syntactic and semantic categories
- Context-free grammar using probabilities trained from user utterances to estimate the likelihood of a parse
- Parse tree converted to a semantic frame that encapsulates the meaning
- Robust parsing strategy
  - Sentences that fail to parse are parsed using fragments that are combined into a full semantic frame
  - When all else fails, word spotting is used


Problems with hand-crafted approaches

Hand-crafted grammars
- are not robust to spoken language input
- require linguistic and engineering expertise to develop if the grammar is to have good coverage and optimised performance
- are time-consuming to develop
- are error-prone
- are subject to designer bias
- are difficult to maintain


Statistical modelling for SLU

- SLU as a pattern matching problem
- Given word sequence W, find the semantic representation of meaning M that has maximum a posteriori probability P(M|W):

  M̂ = argmax_M P(M|W) = argmax_M P(W|M) P(M)

- P(M): semantic prior model - assigns a probability to the underlying semantic structure
- P(W|M): lexicalisation model - assigns a probability to word sequence W given the semantic structure
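A toy rendering of that decision rule, with a hand-enumerated candidate set: the candidate meanings, the prior P(M) and the crude lexicalisation score standing in for P(W|M) are all illustrative assumptions, not a trained model.

```python
import math

# Candidate meanings M with toy priors P(M) and the key phrases their
# lexicalisation model expects to see in W.
CANDIDATES = {
    "FLIGHT(origin=BFS, dest=AGP)": (0.45, ["from belfast", "to malaga"]),
    "FLIGHT(origin=AGP, dest=BFS)": (0.45, ["from malaga", "to belfast"]),
    "GROUND_TRANSPORT(city=AGP)":   (0.10, ["taxi", "bus", "malaga"]),
}

def log_p_w_given_m(utterance: str, expected_phrases: list) -> float:
    """Crude stand-in for the lexicalisation model P(W|M)."""
    hit, miss = 0.6, 0.05
    return sum(math.log(hit if phrase in utterance else miss)
               for phrase in expected_phrases)

def understand(utterance: str) -> str:
    """Return argmax_M P(W|M) P(M), the decision rule on the slide."""
    return max(CANDIDATES,
               key=lambda m: math.log(CANDIDATES[m][0]) +
                             log_p_w_given_m(utterance, CANDIDATES[m][1]))

print(understand("a flight from belfast to malaga"))
# -> FLIGHT(origin=BFS, dest=AGP)
```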


Early Examples

- CHRONUS (AT&T: Pieraccini et al., 1992; Levin & Pieraccini, 1995)
  - Finite state semantic tagger
  - 'Flat-concept' model: simple to train but does not represent hierarchical structure (a toy sketch follows below)
- HUM, the Hidden Understanding Model (BBN: Miller et al., 1995)
  - Probabilistic CFG using tree-structured meaning representations
  - Grammatical constraints represented in networks rather than rules
  - Ordering of constituents unconstrained - increases robustness
  - Transition probabilities constrain over-generation
  - Requires fully annotated treebank data for training
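To illustrate the flat-concept idea (and nothing more), here is a toy finite-state semantic tagger: one concept state per word, Viterbi decoding, and hand-picked probabilities. It is not the CHRONUS model or its training procedure.

```python
import math

CONCEPTS = ["DUMMY", "ORIGIN", "DEST"]

# Hand-picked toy probabilities; a real tagger would estimate these from data.
TRANS = {  # P(concept_t | concept_{t-1}); "<s>" is the start of the utterance
    "<s>":    {"DUMMY": 0.8, "ORIGIN": 0.1, "DEST": 0.1},
    "DUMMY":  {"DUMMY": 0.5, "ORIGIN": 0.3, "DEST": 0.2},
    "ORIGIN": {"DUMMY": 0.3, "ORIGIN": 0.4, "DEST": 0.3},
    "DEST":   {"DUMMY": 0.4, "ORIGIN": 0.1, "DEST": 0.5},
}
EMIT = {   # P(word | concept), with a small floor for unseen words
    "DUMMY":  {"a": 0.3, "flight": 0.3, "i": 0.1, "want": 0.1},
    "ORIGIN": {"from": 0.4, "belfast": 0.4},
    "DEST":   {"to": 0.4, "malaga": 0.4},
}
FLOOR = 1e-4

def viterbi(words):
    """Most likely concept sequence for the word string (flat, no hierarchy)."""
    best = {c: (math.log(TRANS["<s>"][c]) + math.log(EMIT[c].get(words[0], FLOOR)), [c])
            for c in CONCEPTS}
    for w in words[1:]:
        new = {}
        for c in CONCEPTS:
            score, prev = max((best[p][0] + math.log(TRANS[p][c]), p) for p in CONCEPTS)
            new[c] = (score + math.log(EMIT[c].get(w, FLOOR)), best[prev][1] + [c])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

words = "i want a flight from belfast to malaga".split()
print(list(zip(words, viterbi(words))))
# tags the filler words DUMMY and the city mentions ORIGIN / DEST
```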


Using Hidden Vector States (He & Young)

- Extends the 'flat-concept' HMM model
- Represents hierarchical (right-branching) structure using hidden vector states
  - Each state is expanded to encode the stack of a push-down automaton
  - Avoids the computational tractability issues associated with hierarchical HMMs
- Can be trained using lightly annotated data
- Compared with an FST model and with hand-crafted SLU systems using ATIS test sets and reference parse results


"Which flights arrive in Burbank from Denver on Saturday?"

- Problem with the long-distance dependency between 'Saturday' and 'arrive': 'Saturday' gets associated with 'FROMLOC'
- The hierarchical model allows 'Saturday' to be associated with 'ARRIVE'
- Also: more expressive, allows sharing of sub-structures
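As a toy illustration of the vector-state idea for this utterance, each word can be pictured as carrying a stack of semantic concepts. The stacks below are hand-written for illustration and use loosely ATIS-style labels; they are not He and Young's actual annotation or model output.

```python
# Hand-written toy stacks: each word's hidden state is a stack of concepts,
# so 'saturday' can sit under ARRIVE even though FROMLOC material intervenes.
utterance_stacks = [
    ("which",    ["SS"]),
    ("flights",  ["SS", "FLIGHT"]),
    ("arrive",   ["SS", "FLIGHT", "ARRIVE"]),
    ("in",       ["SS", "FLIGHT", "ARRIVE", "TOLOC"]),
    ("burbank",  ["SS", "FLIGHT", "ARRIVE", "TOLOC", "CITY"]),
    ("from",     ["SS", "FLIGHT", "FROMLOC"]),
    ("denver",   ["SS", "FLIGHT", "FROMLOC", "CITY"]),
    ("on",       ["SS", "FLIGHT", "ARRIVE"]),
    ("saturday", ["SS", "FLIGHT", "ARRIVE", "DATE"]),   # attached to ARRIVE, not FROMLOC
]

for word, stack in utterance_stacks:
    print(f"{word:>10}  {'/'.join(stack)}")
```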


SLU Evaluation: Performance

- Statistical models are competitive with approaches based on hand-crafted rules
- Hand-crafted grammars are better for full understanding and for users familiar with the system's coverage; statistical models are better for shallow and more robust understanding for naïve users
- Statistical systems are more robust to noise and more portable


SLU Evaluation: Software Development

“Cost of producing training data should be less than cost of hand-crafting a semantic grammar” (Young, 2002)

- Issues
  - Availability of training data
  - Maintainability
  - Portability
  - Objective metrics? e.g. time, resources, lines of code, …
  - Subjective issues, e.g. designer bias, designer control over the system
- Few concrete results, except …
  - The HVS model (He & Young) can be robustly trained from only minimally annotated corpus data
  - The model is robust to noise and portable to other domains


Additional technologies

- Named entity extraction
  - Rule-based methods: e.g. grammars in the form of regular expressions compiled into finite state acceptors (AT&T SLU system) - higher precision
  - Statistical methods: e.g. HMIHY, learn mappings between strings and NEs - higher recall, as more robust
- Call routing (a classification sketch follows below)
- Question answering
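Call routing is usually treated as classification of the recognised word string into a destination. The sketch below is a generic illustration using scikit-learn with made-up destinations and training utterances; it is not the HMIHY or AT&T classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy routing data: recognised utterances and their destinations (illustrative only).
train_utterances = [
    "I want to check my account balance",
    "how much money is in my account",
    "I'd like to pay my bill",
    "can I pay the outstanding balance on my bill",
    "my phone line is not working",
    "there is no dial tone on my line",
]
train_labels = [
    "AccountBalance", "AccountBalance",
    "BillPayment", "BillPayment",
    "RepairService", "RepairService",
]

# Bag of words/bigrams + naive Bayes: one simple way to map a noisy word
# string to a call destination.
router = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
router.fit(train_utterances, train_labels)

print(router.predict(["uh yeah I need to pay my bill please"]))  # -> ['BillPayment']
```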


Additional Issues 1

- ASR/SLU coupling
  - Post-processing results from ASR: noisy channel model of ASR errors (Ringger & Allen)
- Combining shallow and deep parsing
  - Major gains in speed, slight gains in accuracy (Swift et al.)
- Use of context, discourse history, prosodic information
  - Re-ordering n-best hypotheses (a small sketch follows below)
  - Determining the dialog act based on combinations of features at various levels: ASR and parse probabilities, semantic and contextual features (Purver et al., Lemon)
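The re-ordering point can be made concrete with a tiny example: combine the ASR score with a parse score and pick the hypothesis with the best combined score. The hypotheses, log scores and weights below are illustrative assumptions.

```python
# n-best list: (hypothesis, ASR log score, parse log score) -- toy numbers.
nbest = [
    ("a white from belfast to malaga",  -4.1, -9.5),  # top ASR hypothesis, poor parse
    ("a flight from belfast to malaga", -4.3, -2.2),  # slightly worse ASR, good parse
    ("a flight from belfast to manage", -5.0, -6.0),
]

ASR_WEIGHT, PARSE_WEIGHT = 1.0, 2.0   # interpolation weights (tuned on data in practice)

def combined_score(hyp):
    _, asr_lp, parse_lp = hyp
    return ASR_WEIGHT * asr_lp + PARSE_WEIGHT * parse_lp

best_hypothesis = max(nbest, key=combined_score)[0]
print(best_hypothesis)   # -> "a flight from belfast to malaga"
```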


Additional Issues 2

- Methods for learning from sparse data or without annotation
  - e.g. the AT&T system uses 'active learning' (Tur et al., 2005) to reduce the effort of human data labelling - uses only those data items that improve classifier performance the most (see the sketch below)
- Development tools
  - e.g. SGStudio (Wang & Acero) - build a semantic grammar with little linguistic knowledge


Additional Issues 3

- Some issues addressed in the poster session
- Using SLU for:
  - Dialog act tagging
  - Prosody labelling
  - User satisfaction analysis
  - Topic segmentation and labelling
  - Emotion prediction


Conclusions 1

SLU approach is determined by the type of application:
- finite state dialog with single-word recognition
- frame-based dialog with topic classification and named entity extraction
- advanced dialog requiring deep understanding
- simulated conversation, …


Conclusions 2

SLU approach is determined by the type of output required:
- syntactic / semantic parse trees
- semantic frames
- speech / dialog acts, …
- intentions, beliefs, emotions, …


Conclusions 3

SLU approach is determined by deployment and usability issues:
- applications requiring accurate extraction of information
- applications involving complex processing of content
- applications involving shallow processing of content (e.g. conversational companions, interactive games)


Selected References

Bangalore, S., Hakkani-Tür, D., Tur, G. (eds) (2006) Special Issue on Spoken Language Understanding in Conversational Systems. Speech Communication 48.

Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore, S., Riccardi, G., Gilbert, M. (2006) The AT&T Spoken Language Understanding System. IEEE Transactions on Speech and Audio Processing 14(1), 213-222.

Allen, J. F., Byron, D. K., Dzikovska, O., Ferguson, G., Galescu, L., Stent, A. (2001) Towards conversational human-computer interaction. AI Magazine 22(4), 27–35.

Jurafsky, D. & Martin, J. (2000) Speech and Language Processing. Prentice-Hall.

Huang, X., Acero, A., Hon, H.-W. (2001) Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice-Hall.