Spoken Language Understanding for Conversational Dialog Systems
Michael McTear, University of Ulster
IEEE/ACL 2006 Workshop on Spoken Language Technology, Aruba, December 10-13, 2006
Overview
Introductory definitions: task-based and conversational dialog systems; spoken language understanding
Issues for spoken language understanding: coverage; robustness
Overview of spoken language understanding: hand-crafted approaches; data-driven methods
Conclusions
Basic dialog system architecture
[Figure: Audio → Speech Recognition (HMM Acoustic Model, N-Gram Language Model) → Words → Spoken Language Understanding → Semantic representation → Dialogue Manager ↔ Backend; Dialogue Manager → Concepts → Language Generation → Words → Text to Speech Synthesis → Audio]
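To make the data flow between the components concrete, here is a minimal Python sketch of one dialog turn through the pipeline. Every function, the toy backend and the dictionary-based semantic representation are hypothetical placeholders; in particular, the recognizer stand-in simply decodes text instead of running an HMM acoustic model and an n-gram language model.

```python
from typing import Callable, Dict, List

Frame = Dict[str, str]

def speech_recognition(audio: bytes) -> List[str]:
    # Placeholder: a real recognizer decodes audio with an HMM acoustic model
    # and an n-gram language model. Here the "audio" is already text.
    return audio.decode("utf-8").lower().split()

def spoken_language_understanding(words: List[str]) -> Frame:
    # Words -> semantic representation (a flat slot/value frame).
    frame: Frame = {"topic": "flight" if "flight" in words else "unknown"}
    if "to" in words and words.index("to") + 1 < len(words):
        frame["destination"] = words[words.index("to") + 1]
    return frame

def toy_backend(frame: Frame) -> int:
    # Hypothetical database lookup: number of matching flights.
    return 2 if frame.get("destination") == "malaga" else 0

def dialogue_manager(frame: Frame, backend: Callable[[Frame], int]) -> Dict[str, object]:
    # Consults the backend and decides which concepts to convey.
    return {"destination": frame.get("destination"), "count": backend(frame)}

def language_generation(concepts: Dict[str, object]) -> str:
    # Concepts -> words.
    return f"I found {concepts['count']} flights to {concepts['destination']}."

def text_to_speech(text: str) -> bytes:
    # Placeholder: a real synthesiser would return audio samples.
    return text.encode("utf-8")

def run_turn(audio: bytes) -> bytes:
    words = speech_recognition(audio)
    frame = spoken_language_understanding(words)
    concepts = dialogue_manager(frame, toy_backend)
    return text_to_speech(language_generation(concepts))

print(run_turn(b"Is there a flight to Malaga").decode())  # I found 2 flights to malaga.
```

Keeping each box as a separate function mirrors the modular architecture: any stage (for example the SLU component) can be swapped without touching the others, as long as the interface between stages stays the same.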
Task-based Dialog Systems
Mainly interact with databases to get information or support transactions
SLU module creates a database query from the user’s spoken input by extracting relevant concepts
System initiative: constrains user input; keyword / keyphrase extraction
User initiative: less constrained input; call routing (call classification with named entity extraction); question answering
Conversational Dialog
AI (agent-based) systems, e.g. TRIPS
User can take the initiative, e.g. raise a new topic or ask for clarification (TRIPS)
More complex interactions involving recognition of the user’s intentions, goals, beliefs or plans
Deep understanding of the user’s utterance, taking into account contextual information
Information State Theory, Planning Theory, User Modelling, Belief Modelling…
Simulated conversation, e.g. CONVERSE: conversational companions, chatbots, help desk
Does not require deep understanding
SLU involves identifying system utterance type and determining a suitable response
Defining Spoken Language Understanding
Extracting the meaning from speech utterances
A transduction of the recognition result to an interpretable representation
Meaning (in human–computer interactive systems): a representation that can be executed by an interpreter in order to change the state of the system
(Bangalore et al. 2006)
SLU for task based systems
a flight from Belfast to Malaga
uh I’d like uh um could you uh is there a flight from Bel- uh Belfast to um Gran- I mean Malaga
I would like to find a flight from Pittsburgh to Boston on Wednesday and I have to be in Boston by one so I would like a flight out of here no later than 11 a.m.
Topic: Flight
Origin: BFS
Destination: AGP
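A crude keyword-based concept spotter is enough to recover this frame from the disfluent utterance. The sketch below is a hypothetical illustration, not a published system: the filler list, the city-to-airport-code lexicon and the rule "a city fills whichever slot the most recent from/to marker opened" are all assumptions.

```python
import re
from typing import Dict, Optional

FILLERS = {"uh", "um", "er"}                             # filled pauses to ignore
CITY_CODES = {"belfast": "BFS", "malaga": "AGP"}         # hypothetical lexicon
SLOT_MARKERS = {"from": "origin", "to": "destination"}   # keyword -> slot it opens

def spot_concepts(utterance: str) -> Dict[str, str]:
    frame: Dict[str, str] = {}
    current_slot: Optional[str] = None
    for token in re.findall(r"[a-z'-]+", utterance.lower()):
        if token in FILLERS or token.endswith("-"):
            continue  # skip filled pauses and truncated words such as "bel-"
        if token == "flight":
            frame["topic"] = "Flight"
        elif token in SLOT_MARKERS:
            current_slot = SLOT_MARKERS[token]
        elif token in CITY_CODES and current_slot:
            # If several known cities follow one marker, the last one wins:
            # a crude way to honour self-repairs such as "... I mean Malaga".
            frame[current_slot] = CITY_CODES[token]
    return frame

print(spot_concepts("uh I'd like uh um could you uh is there a flight "
                    "from Bel- uh Belfast to um Gran- I mean Malaga"))
# {'topic': 'Flight', 'origin': 'BFS', 'destination': 'AGP'}
```

Everything the spotter does not recognise is simply skipped, which is what makes this style of SLU tolerant of disfluencies; the price is that it has no notion of sentence structure.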
SLU for advanced conversational systems (TRIPS)
Interpretation requires intention recognition
can we use a helicopter to get the people from:
Abyss (request to modify plan)
Barnacle (include sub-goal and suggest solution)
Delta (extension of a solution)
Six possible interpretations, although only the city name changes
Requires reasoning about task and current context to identify most plausible interpretation
Requires more than concept spotting to identify structure and meaning of utterance as basis for reasoning
Role of syntax in deep understanding
List all employees of the companies who/which are based in the city centre
who: employees based in city centre
which: companies based in city centre
I would like to know where to mail the check. → I mail check to ?where
I would like you to mail the check to me → You mail check to me
Stop word removal will transform both strings to “mail check”
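The information loss is easy to reproduce. In this sketch the stop-word list is hypothetical, chosen to contain the function words in the two sentences; note that it includes "where", "you" and "me", exactly the words that distinguish the two meanings.

```python
# Hypothetical stop-word list covering the function words in both sentences.
STOP_WORDS = {"i", "would", "like", "to", "know", "where", "the", "you", "me"}

def remove_stop_words(sentence: str) -> str:
    words = sentence.lower().rstrip(".").split()
    return " ".join(w for w in words if w not in STOP_WORDS)

a = remove_stop_words("I would like to know where to mail the check.")
b = remove_stop_words("I would like you to mail the check to me")
print(a, "|", b)  # mail check | mail check
```

Both inputs collapse to the same string, so any downstream component has lost who mails the check and whether the destination is known.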
SLU for simulated dialog
C: Are you still a friend of XXX?
H: I am not sure if he has any real friends. He has achieved a lot and has left a lot of people behind.
C: You really don’t like the guy - well, at least that’s definite - I thought most people were just indifferent or numb by now.
?
Coverage
Possible requirement: “The system should be able to understand everything the user might say”
Predicting user input:
Analysis of corpora and iterative design of hand-crafted grammars
Use of carefully designed prompts to constrain user input
Learning grammar from data
Robustness
Characteristics of spontaneous spoken language: disfluencies and filled pauses are not just errors; they reflect cognitive aspects of speech production and interaction management
Output from speech recognition component: words and word boundaries not known with certainty; recognition errors
Approaches: use of semantic grammars and robust parsing for concept spotting; data-driven approaches that learn mappings between input strings and output structures
Developing the SLU component
Hand-crafted approaches: grammar development; parsing
Data-driven approaches: learning from data; statistical models rather than grammars; efficient decoding
Hand-crafting grammars
Traditional software engineering approach of design and iterative refinement
Decisions about type of grammar required: Chomsky hierarchy; flat v. hierarchical representations
Processing issues (parsing): dealing with ambiguity; efficiency
[Figure: ASR → n-best list, word lattice, … → Parsing → parse tree → Frame Generation → semantic frame → Discourse Processing → frame in context → DB Query → SQL query]
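The last two stages of this pipeline can be sketched as follows. Two simplifications are assumed: discourse processing is reduced to inheriting unrepeated slots from the previous frame, and the `flights` table with `origin`, `destination` and `day` columns is a hypothetical schema.

```python
from typing import Dict, Optional, Tuple

Frame = Dict[str, str]
ALLOWED_SLOTS = ("origin", "destination", "day")  # hypothetical column names

def frame_in_context(frame: Frame, previous: Optional[Frame]) -> Frame:
    # Simplified discourse processing: slots the user did not repeat are
    # inherited from the previous frame; new values override old ones.
    merged = dict(previous or {})
    merged.update(frame)
    return merged

def frame_to_sql(frame: Frame) -> Tuple[str, Tuple[str, ...]]:
    # Build a parameterised query so slot values are never pasted into SQL text.
    slots = [s for s in ALLOWED_SLOTS if s in frame]
    where = " AND ".join(f"{s} = ?" for s in slots) or "1 = 1"
    return f"SELECT * FROM flights WHERE {where}", tuple(frame[s] for s in slots)

previous = {"origin": "PIT", "destination": "BOS"}
current = {"day": "wednesday"}  # e.g. a follow-up such as "what about Wednesday?"
sql, params = frame_to_sql(frame_in_context(current, previous))
print(sql, params)
```

Restricting slots to a fixed whitelist keeps unexpected frame keys out of the generated SQL, while the `?` placeholders leave value quoting to the database driver.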
Semantic Grammar and Robust Parsing: PHOENIX (CMU/CU)
[Figure: ASR → word string → Semantic Parser → meaning representation]
The Phoenix parser maps input word strings onto a sequence of semantic frames. A frame is a named set of slots, where the slots represent related pieces of information.
Each slot has an associated Context-Free Grammar that specifies word string patterns that match the slot
Chart parsing with path pruning: e.g. a path that accounts for fewer words is pruned
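The pruning criterion (prefer the analysis that accounts for more words, skip words no slot can use) can be sketched with dynamic programming. This is a simplified stand-in, not Phoenix itself: each slot's context-free grammar is replaced by a hypothetical list of fixed phrases, and a right-to-left table replaces the chart.

```python
from typing import Dict, List, Tuple

Match = Tuple[str, Tuple[str, ...]]

# Hypothetical slot "grammars": flat phrase lists standing in for per-slot CFGs.
SLOT_PATTERNS: Dict[str, List[Tuple[str, ...]]] = {
    "[depart_loc]": [("from", "pittsburgh"), ("from", "boston"), ("out", "of", "pittsburgh")],
    "[arrive_loc]": [("to", "boston"), ("to", "pittsburgh"), ("in", "boston")],
    "[date]": [("on", "wednesday")],
    "[time_range]": [("no", "later", "than", "11", "a.m.")],
}

def parse(words: List[str]) -> List[Match]:
    n = len(words)
    # best[i] = (number of words covered, slot sequence) for the best analysis of words[i:]
    best: List[Tuple[int, List[Match]]] = [(0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = best[i + 1]  # option 1: skip words[i] (robustness to unparsable words)
        for slot, patterns in SLOT_PATTERNS.items():
            for pat in patterns:
                j = i + len(pat)
                if tuple(words[i:j]) == pat:
                    covered = len(pat) + best[j][0]
                    if covered > best[i][0]:  # keep the path that accounts for more words
                        best[i] = (covered, [(slot, pat)] + best[j][1])
    return best[0][1]

words = "i would like a flight from pittsburgh to boston on wednesday".split()
print(parse(words))
```

Words such as "i would like a flight" are simply left uncovered, so the parse still succeeds on input the slot grammars only partly anticipate.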
Deriving Meaning directly from ASR output: VoiceXML
ASR uses finite state grammars as language models for recognition, and semantic tags in the grammars for semantic parsing
I would like a coca cola and three large pizzas with pepperoni and mushrooms
Meaning representation:
{ drink: "coke", pizza: { number: "3", size: "large", topping: [ "pepperoni", "mushrooms" ] }}
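In a VoiceXML application the semantic tags live inside the recognition grammar itself. The Python sketch below only imitates the effect of such tags, using a hypothetical lexicon and one regular expression, to show how the example utterance maps onto the nested object; it is not SRGS or SISR syntax.

```python
import re
from typing import Any, Dict

# Hypothetical phrase -> value mappings standing in for semantic tags.
DRINKS = {"coca cola": "coke", "coke": "coke"}
NUMBERS = {"a": "1", "one": "1", "two": "2", "three": "3"}
TOPPINGS = {"pepperoni", "mushrooms", "olives"}

def interpret(utterance: str) -> Dict[str, Any]:
    text = utterance.lower()
    result: Dict[str, Any] = {}
    for phrase, value in DRINKS.items():
        if phrase in text:
            result["drink"] = value  # normalise "coca cola" to the value "coke"
            break
    m = re.search(r"\b(a|one|two|three) (small|medium|large) pizzas? with (.+)", text)
    if m:
        toppings = [t for t in re.split(r",\s*|\s+and\s+", m.group(3)) if t in TOPPINGS]
        result["pizza"] = {"number": NUMBERS[m.group(1)], "size": m.group(2), "topping": toppings}
    return result

print(interpret("I would like a coca cola and three large pizzas with pepperoni and mushrooms"))
```

The output matches the slide's meaning representation; in VoiceXML the same object would be assembled by the tags attached to the grammar rules during recognition rather than by a separate pass over the word string.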
Deep understanding
Requirements for deep understanding: advanced grammatical formalisms; syntax-semantics issues; parsing technologies
Example: TRIPS uses a feature-based augmented CFG with an agenda-driven best-first chart parser
Combined strategy: combining shallow and deep parsing (Swift et al.)
Combined strategies: TINA (MIT)
Grammar rules include mix of syntactic and semantic categories
Context free grammar using probabilities trained from user utterances to estimate likelihood of a parse
Parse tree converted to a semantic frame that encapsulates the meaning
Robust parsing strategy: sentences that fail to parse are parsed using fragments that are combined into a full semantic frame
When all else fails, word spotting is used
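This cascade of fallbacks can be sketched as an ordered list of strategies, each tried only if the previous one fails. The three parsers below are hypothetical toys (a single full-sentence pattern, a fragment combiner and a city spotter); TINA itself uses a probabilistic grammar mixing syntactic and semantic categories, which is not modelled here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Frame = Dict[str, str]
CITIES = {"boston", "denver", "pittsburgh"}  # hypothetical lexicon

def full_parse(words: List[str]) -> Optional[Frame]:
    # Stand-in for a complete parse: accepts one sentence pattern only.
    if len(words) == 4 and words[:3] == ["show", "flights", "to"] and words[3] in CITIES:
        return {"topic": "flight", "destination": words[3]}
    return None

def fragment_parse(words: List[str]) -> Optional[Frame]:
    # Parse "from CITY" / "to CITY" fragments and combine them into one frame.
    frame: Frame = {}
    for prev, word in zip(words, words[1:]):
        if word in CITIES and prev in ("from", "to"):
            frame["origin" if prev == "from" else "destination"] = word
    return {"topic": "flight", **frame} if frame else None

def word_spot(words: List[str]) -> Optional[Frame]:
    # Last resort: report a known city without assigning it a role.
    cities = [w for w in words if w in CITIES]
    return {"city": cities[0]} if cities else None

STRATEGIES: List[Tuple[str, Callable[[List[str]], Optional[Frame]]]] = [
    ("full", full_parse), ("fragments", fragment_parse), ("word spotting", word_spot),
]

def understand(utterance: str) -> Tuple[str, Frame]:
    words = utterance.lower().split()
    for name, strategy in STRATEGIES:
        frame = strategy(words)
        if frame is not None:
            return name, frame
    return "failed", {}

print(understand("uh flights from denver um to boston please"))
```

Ordering the strategies from most to least structured means the richest available analysis is always used, and the system degrades gracefully instead of rejecting the input.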
Problems with hand-crafted approaches
Hand-crafted grammars:
are not robust to spoken language input
require linguistic and engineering expertise to develop if the grammar is to have good coverage and optimised performance
are time consuming to develop, error prone, subject to designer bias and difficult to maintain
Statistical modelling for SLU
SLU as pattern matching problem
Given word sequence W, find semantic representation of meaning M that has maximum a posteriori probability P(M|W)
$$\hat{M} = \arg\max_{M} P(M \mid W) = \arg\max_{M} P(W \mid M)\, P(M)$$
P(M): semantic prior model – assigns probability to underlying semantic structure
P(W|M): lexicalisation model – assigns probability to word sequence W given the semantic structure
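A toy numerical instance of the decision rule: the candidate meanings and all probabilities below are hypothetical. Because P(W) is the same for every candidate M, the argmax can be taken over P(W|M) P(M) directly; normalising by their sum (over this candidate set) recovers P(M|W).

```python
from typing import Dict, Tuple

# Hypothetical toy numbers for W = "flights to boston" and two candidate meanings M.
PRIOR = {"LIST_FLIGHTS(dest=BOS)": 0.6, "FARE(dest=BOS)": 0.4}            # P(M)
LEXICALISATION = {"LIST_FLIGHTS(dest=BOS)": 0.05, "FARE(dest=BOS)": 0.01}  # P(W|M)

def map_decode(prior: Dict[str, float],
               likelihood: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    joint = {m: likelihood[m] * prior[m] for m in prior}     # P(W|M) P(M)
    evidence = sum(joint.values())                           # P(W) over this candidate set
    posterior = {m: p / evidence for m, p in joint.items()}  # P(M|W)
    best = max(joint, key=joint.get)                         # argmax is unaffected by P(W)
    return best, posterior

best, posterior = map_decode(PRIOR, LEXICALISATION)
print(best, round(posterior[best], 3))  # LIST_FLIGHTS(dest=BOS) 0.882
```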
Early Examples
CHRONUS (AT&T: Pieraccini et al., 1992; Levin & Pieraccini, 1995)
Finite state semantic tagger
‘Flat-concept’ model: simple to train but does not represent hierarchical structure
HUM (Hidden Understanding Model) (BBN: Miller et al., 1995)
Probabilistic CFG using tree-structured meaning representations
Grammatical constraints represented in networks rather than rules
Ordering of constituents unconstrained, which increases robustness
Transition probabilities constrain over-generation
Requires fully annotated treebank data for training
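A flat-concept model of the CHRONUS kind treats each concept as an HMM state and recovers the concept sequence with Viterbi decoding. The sketch below is hypothetical throughout: three concepts, hand-set transition probabilities and unnormalised toy emission tables with a floor for unseen words.

```python
import math
from typing import Dict, List

STATES = ["DUMMY", "ORIGIN", "DEST"]
START = {"DUMMY": 0.6, "ORIGIN": 0.2, "DEST": 0.2}
TRANS = {s: {t: (0.7 if s == t else 0.15) for t in STATES} for s in STATES}
EMIT = {  # toy emission tables, not normalised
    "DUMMY": {"flights": 0.5, "from": 0.1, "to": 0.1},
    "ORIGIN": {"from": 0.4, "boston": 0.3, "denver": 0.2},
    "DEST": {"to": 0.4, "boston": 0.2, "denver": 0.3},
}
FLOOR = 1e-3  # probability for words a concept never emits in the toy tables

def emit(state: str, word: str) -> float:
    return math.log(EMIT[state].get(word, FLOOR))

def viterbi(words: List[str]) -> List[str]:
    # Log-space Viterbi: best concept sequence for the word sequence.
    score = {s: math.log(START[s]) + emit(s, words[0]) for s in STATES}
    backptrs: List[Dict[str, str]] = []
    for word in words[1:]:
        prev_score, score, ptr = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev_score[s] + math.log(TRANS[s][t]))
            score[t] = prev_score[best] + math.log(TRANS[best][t]) + emit(t, word)
            ptr[t] = best
        backptrs.append(ptr)
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

print(viterbi(["from", "boston", "to", "denver"]))  # ['ORIGIN', 'ORIGIN', 'DEST', 'DEST']
```

The output is a flat sequence of concept labels: there is no way for one concept to contain another, which is the limitation the hierarchical models address.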
Using Hidden State Vectors (He & Young)
Extends the ‘flat-concept’ HMM model
Represents hierarchical structure (right-branching) using hidden state vectors
Each state expanded to encode the stack of a push-down automaton
Avoids computational tractability issues associated with hierarchical HMMs
Can be trained using lightly annotated data
Compared with an FST model and with hand-crafted SLU systems using ATIS test sets and reference parse results
Which flights arrive in Burbank from Denver on Saturday?
Flat model: problem with the long-distance dependency between ‘Saturday’ and ‘arrive’; ‘Saturday’ is associated with ‘FROMLOC’
Hierarchical model allows ‘Saturday’ to be associated with ‘ARRIVE’
Also: more expressive, and allows sharing of sub-structures
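The following sketch shows how per-word stack vectors encode a tree, so that ‘Saturday’ ends up under ARRIVE rather than FROMLOC. The concept labels and the tag sequence are hypothetical, and they ignore any restrictions the HVS model places on stack operations between consecutive words; only the stack-to-tree reading is illustrated.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical hidden state vectors: each word carries the stack of concepts
# (root first) that is active when it is emitted.
TAGGED: List[Tuple[str, Tuple[str, ...]]] = [
    ("which", ("SS",)),
    ("flights", ("SS", "FLIGHT")),
    ("arrive", ("SS", "FLIGHT", "ARRIVE")),
    ("in", ("SS", "FLIGHT", "ARRIVE", "TOLOC")),
    ("burbank", ("SS", "FLIGHT", "ARRIVE", "TOLOC", "CITY")),
    ("from", ("SS", "FLIGHT", "FROMLOC")),
    ("denver", ("SS", "FLIGHT", "FROMLOC", "CITY")),
    ("on", ("SS", "FLIGHT", "ARRIVE", "DATE")),
    ("saturday", ("SS", "FLIGHT", "ARRIVE", "DATE")),
]

def build_tree(tagged: List[Tuple[str, Tuple[str, ...]]]) -> Dict[str, Any]:
    # Words whose stacks share a prefix end up under the same concept node.
    root: Dict[str, Any] = {}
    for word, stack in tagged:
        node = root
        for concept in stack:
            node = node.setdefault(concept, {})
        node.setdefault("words", []).append(word)
    return root

tree = build_tree(TAGGED)
flight = tree["SS"]["FLIGHT"]
print(flight["ARRIVE"]["DATE"]["words"])  # ['on', 'saturday']
print("DATE" in flight["FROMLOC"])        # False
```

Although ‘from Denver’ intervenes between ‘arrive’ and ‘Saturday’, the shared stack prefix places the date under ARRIVE, which a flat concept sequence cannot express.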
SLU Evaluation: Performance
Statistical models competitive with approaches based on handcrafted rules
Hand-crafted grammars are better for full understanding and for users familiar with the system’s coverage; statistical models are better for shallow, more robust understanding for naïve users
Statistical systems more robust to noise and more portable
SLU Evaluation: Software Development
“Cost of producing training data should be less than cost of hand-crafting a semantic grammar” (Young, 2002)
Issues: availability of training data; maintainability; portability
Objective metrics? e.g. time, resources, lines of code, …
Subjective issues, e.g. designer bias, designer control over system
Few concrete results, except …
HVS model (He & Young) can be robustly trained from only minimally annotated corpus data
Model is robust to noise and portable to other domains
Additional technologies
Named entity extraction
Rule-based methods: e.g. using grammars in form of regular expressions compiled into finite state acceptors (AT&T SLU system) – higher precision
Statistical methods e.g. HMIHY, learn mappings between strings and NEs – higher recall as more robust
Call routing
Question Answering
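The rule-based style of named entity extraction can be illustrated with a regular expression for a spoken ten-digit phone number. This is a hypothetical example: Python's `re` is a backtracking matcher rather than a compiled finite state acceptor, but a pattern of this shape is a regular language and could be compiled into one.

```python
import re
from typing import List

DIGIT_WORDS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
               "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}
DIGIT = "(?:" + "|".join(DIGIT_WORDS) + ")"
# Ten spoken digits separated by whitespace (hypothetical phone-number rule).
PHONE = re.compile(rf"\b{DIGIT}(?:\s+{DIGIT}){{9}}\b")

def extract_phone_numbers(utterance: str) -> List[str]:
    return ["".join(DIGIT_WORDS[w] for w in m.group(0).split())
            for m in PHONE.finditer(utterance.lower())]

print(extract_phone_numbers("my number is nine seven three five five five one two three four"))
# ['9735551234']
print(extract_phone_numbers("nine seven three, five five five one two three four"))
# [] : the comma breaks the pattern
```

The rule never fires on anything that is not a clean ten-digit sequence (high precision), but any deviation such as punctuation or a missed digit makes it fail, which is the recall gap that statistical NE models aim to close.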
Additional Issues 1
ASR/SLU coupling: post-processing results from ASR with a noisy channel model of ASR errors (Ringger & Allen)
Combining shallow and deep parsing: major gains in speed, slight gains in accuracy (Swift et al.)
Use of context, discourse history and prosodic information: re-ordering n-best hypotheses; determining the dialog act based on combinations of features at various levels, i.e. ASR and parse probabilities, semantic and contextual features (Purver et al., Lemon)
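Re-ordering an n-best list with features from several levels can be sketched as a weighted linear score. The feature set, the weights and the hypotheses below are hypothetical; the cited work uses learned classifiers over richer feature sets rather than this hand-weighted sum.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Hypothesis:
    text: str
    asr_logprob: float    # recogniser score
    parse_logprob: float  # parser score
    fits_context: float   # 1.0 if consistent with the expected dialog act, else 0.0

WEIGHTS: Dict[str, float] = {"asr": 1.0, "parse": 0.5, "context": 2.0}  # hypothetical

def score(h: Hypothesis) -> float:
    return (WEIGHTS["asr"] * h.asr_logprob
            + WEIGHTS["parse"] * h.parse_logprob
            + WEIGHTS["context"] * h.fits_context)

def rerank(nbest: List[Hypothesis]) -> List[Hypothesis]:
    return sorted(nbest, key=score, reverse=True)

nbest = [
    Hypothesis("two boston", -3.5, -6.0, 0.0),  # recogniser's first choice
    Hypothesis("to boston", -4.0, -2.0, 1.0),
]
print([h.text for h in rerank(nbest)])  # ['to boston', 'two boston']
```

Here the parse and context features outweigh the recogniser's small preference (-6.5 vs -3.0 in combined score), so the hypothesis that makes sense as an answer about a destination moves to the top.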
Additional Issues 2
Methods for learning from sparse data or without annotation, e.g. the AT&T system uses ‘active learning’ (Tur et al., 2005) to reduce the effort of human data labelling: it uses only those data items that improve classifier performance the most
Development tools, e.g. SGStudio (Wang & Acero): build a semantic grammar with little linguistic knowledge
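One common way to pick the data items expected to help a classifier most is certainty-based selection: send to the labellers the utterances the current classifier is least confident about. The sketch below assumes that criterion; the call types and classifier scores are hypothetical.

```python
from typing import Callable, Dict, List

def select_for_labelling(pool: List[str],
                         predict_proba: Callable[[str], Dict[str, float]],
                         k: int) -> List[str]:
    # Rank unlabelled utterances by the classifier's confidence in its top
    # call type and return the k least confident ones.
    confidence = {u: max(predict_proba(u).values()) for u in pool}
    return sorted(pool, key=lambda u: confidence[u])[:k]

# Hypothetical classifier outputs over call types.
TOY_SCORES = {
    "i want to pay my bill": {"Billing": 0.95, "Other": 0.05},
    "there is a charge i do not recognise": {"Billing": 0.55, "Dispute": 0.45},
    "my line is dead": {"Repair": 0.9, "Other": 0.1},
    "hmm i am not sure": {"Other": 0.4, "Billing": 0.35, "Repair": 0.25},
}
print(select_for_labelling(list(TOY_SCORES), TOY_SCORES.__getitem__, k=2))
# ['hmm i am not sure', 'there is a charge i do not recognise']
```

Confidently classified utterances add little new information, so skipping them is where the labelling effort is saved.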
Additional Issues 3
Some issues addressed in poster session
Using SLU for: dialog act tagging; prosody labelling; user satisfaction analysis; topic segmentation and labelling; emotion prediction
Conclusions 1
SLU approach is determined by
type of application:
finite state dialog with single word recognition
frame-based dialog with topic classification and named entity extraction
advanced dialog requiring deep understanding
simulated conversation, …
Conclusions 2
SLU approach is determined by
type of output required:
syntactic / semantic parse trees
semantic frames
speech / dialog acts, …
intentions, beliefs, emotions, …
Conclusions 3
SLU approach is determined by
deployment and usability issues:
applications requiring accurate extraction of information
applications involving complex processing of content
applications involving shallow processing of content (e.g. conversational companions, interactive games)
Selected References
Bangalore, S., Hakkani-Tür, D., Tur, G. (eds) (2006) Special Issue on Spoken Language Understanding in Conversational Systems. Speech Communication 48.
Gupta, N., Tur, G., Hakkani-Tür, D., Bangalore, S., Riccardi, G., Gilbert, M. (2006) The AT&T Spoken Language Understanding System. IEEE Transactions on Speech and Audio Processing 14(1), 213-222.
Allen, J.F., Byron, D.K., Dzikovska, O., Ferguson, G., Galescu, L., Stent, A. (2001) Towards conversational human-computer interaction. AI Magazine 22(4), 27-35.
Jurafsky, D., Martin, J. (2000) Speech and Language Processing. Prentice-Hall.
Huang, X., Acero, A., Hon, H.-W. (2001) Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice-Hall.