Natural Language Processing
August 23, 2007
SpeechTEK University
Deborah Dahl
Conversational Technologies
Description of the Tutorial
An introduction to the principles of natural language processing and the role of natural language processing in current and future speech applications
9:00-9:15 Introduction: what is natural language
9:15-10:15 Part 1: Overview and Principles
10:15-10:45 (30 minute break)
10:45-12:00 Part 2: Detailed Examples
Attendees
Backgrounds and goals
Audience and Background
A general technical background.
No natural language processing background will be assumed, but experience developing speech applications would be helpful.
What is Natural Language?
Natural language is the kind of language that’s used to communicate between people
Can be spoken, written or gestural (in the case of Sign Languages)
There are several thousand currently spoken human languages
Why are We Interested in Natural Language?
Support for more natural and effective computer-human interactions by accommodating the ways that people already communicate
Natural Language Processing
Natural language understanding
Natural language generation
Machine translation
Part 1: Overview and Principles
Goals
Understand what natural language is
Learn about the most common techniques for processing natural language
Their strengths and weaknesses
Understand where natural language processing technology is headed in the future
Focus is on commercial applications
Topics
What is natural language?
Issues in spoken natural language and how to handle them
Statistical Language Models (SLM's)
Speech grammars with semantic tags
Variability in expression, pronouns, and filling multiple slots from a single utterance
How emerging standards such as EMMA will contribute to more sophisticated future applications
Recent topics in natural language research and how this research may eventually be used in future applications
Natural Language Understanding
The task of automatically assigning meaning to language
What natural language processing isn’t
Speech recognition, which turns the sounds of spoken language into the words of written language
Dialog management, which manages a natural language interaction between a user and a computer
Artificial intelligence, which studies how to provide intelligent capabilities to computers
Assigning Meaning to Language
In most applications, the developer decides what the set of possible meanings is
Meanings can be simple or complex
Language can be simple or complex
Current commercial techniques can:
Assign simple meanings to simple language
Assign simple meanings to complex language
Research systems can handle more complex meanings and language, but no existing system can handle all meanings and all language for even one human language
Examples of Complex Language
Shakespeare
Religious texts
The United States Constitution
We don’t have to worry about assigning meaning to these texts!
Simple to Slightly More Complex Language
“yes”
“New York”
“call home”
“a red t-shirt, size large”
“I want to go from Philadelphia to New York on Sunday, August 19”
The more complex the language becomes, the more we need special techniques to process it
Human Communication Process?
[Diagram: Person A's Thought → language → Person B's Thought (the same thought)]
More Realistic Communication Process
[Diagram: Person A's Thought 1 → language → Person B's thought, somewhat similar to Thought 1]
Questions for Person A:
How should I express this?
Is this something I really need to say?
What does B already know?
Why do I want to express this thought?
Do I want to impress B?
Might I offend B by saying this?
What language should I use?
Questions for Person B:
Should I believe this?
Could A be lying or lacking credibility?
If I think A is lying should I say so?
Did I hear it right?
Did I understand it?
Why did Person A say that?
Issues in Natural Language
Variability of expression
Infinite number of meanings that can be expressed
Infinite number of possible sentences in a language
Many ways to say the same thing
The same thing can have different meanings in different contexts
What is a Meaning?
Many approaches to representing meanings in traditional linguistics and philosophy of language
Most widely used commercial representation is as a token or as a set of slot/value pairs (also called “key/value” or “attribute/value” pairs)
Often structured into a set of related slot/value pairs (for example, the fields of a VoiceXML <form>, or a traditional frame)
Tokens
“my printer is printing horizontal bands and everything is printing in blue” → “printer problem”
“I can’t connect to the internet” → “internet problem”
What is a Meaning? Slot/Value Pairs
I want to go from Chicago to New York on August 19 midafternoon on United
Form/frame – airline reservation
Destination: New York
Departure city: Chicago
Departure date: August 19
Departure time: midafternoon
Airline: United
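As a rough illustration, the airline reservation frame above could be held in an ordinary dictionary of slot/value pairs. The slot names and the `missing_slots` helper are hypothetical and chosen only for this sketch; they are not taken from VoiceXML or any other standard.

```python
# Hypothetical slot names for the airline reservation frame (illustrative only).
REQUIRED_SLOTS = [
    "destination",
    "departure_city",
    "departure_date",
    "departure_time",
    "airline",
]

# The frame filled from:
# "I want to go from Chicago to New York on August 19 midafternoon on United"
frame = {
    "destination": "New York",
    "departure_city": "Chicago",
    "departure_date": "August 19",
    "departure_time": "midafternoon",
    "airline": "United",
}


def missing_slots(frame, required=REQUIRED_SLOTS):
    """Return the required slots that are still empty, in their declared order."""
    return [slot for slot in required if not frame.get(slot)]


print(missing_slots(frame))                        # nothing left to ask for
print(missing_slots({"destination": "New York"}))  # slots a dialog would still need
```

A dialog manager could use the list of missing slots to decide which question to ask next, which is roughly how the fields of a form get filled one at a time.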
Information Available for Extracting Meaning
Used by today’s commercial systems:
Words of the utterance
Word order
Grammatical endings
Specific grammar for the application
Information about what previous instances of that utterance have meant
Used by research systems and people:
Prosody (intonation, pauses, loudness, stress, timing)
General information about the language itself (dictionaries, grammars, thesauri)
Context of the utterance
Information about the topic
Facial expressions, gestures
Traditional Tasks in Natural Language Understanding
(Recognition – speech, handwriting, OCR…)
Lexical lookup
Part of speech tagging
Sense disambiguation
Syntactic parsing
Semantic analysis
Pragmatic analysis
Problems with Traditional Approaches
Try to describe the full language and a broad set of meanings
For practical applications, it’s much easier to just write a small grammar for a specific application
Natural Language Tasks in Commercial Speech Systems
(Recognition – speech, handwriting, OCR…)
Lexical lookup (part of recognition)
Part of speech tagging – parts of speech not used
Sense disambiguation – not needed, constrained application
Syntactic parsing – syntactic structure used indirectly
Semantic analysis
Pragmatic analysis
[Bracket on the slide: done in parallel]
Extracting Meaning in Commercial Applications
Filling slots by using semantically tagged grammars (CFG’s)
Mapping complex utterances to categories (SLM’s)
Semantically Tagged Grammars
A grammar defines what the recognizer can recognize (recognized strings)
Tags define return values for different recognized strings
Information used: words of the utterance and a special-purpose grammar
Context-Free Grammar Formats
Represent what a speech recognizer can recognize
Example: Request → PoliteWord + Action + Item (please open the door)
Speech Recognition Grammar Specification (SRGS) (ABNF and XML formats)
Java Speech Grammar Format (JSGF)
Nuance GSL
Microsoft Speech Application Programming Interface (SAPI)
Semantic Tags
Reduce variability of expression
Assign return values to recognized strings
W3C Semantic Interpretation for Speech Recognition (SISR)
JSGF tags
SAPI tags
IBM ECMAScript tags
Nuance GSL
Capabilities of Tag Formats
Assign tokens to strings (JSGF)
“yeah” → “yes”
Create key-value pairs (SAPI)
“to chicago” → <destination>ord</destination>
Perform computations (SISR, IBM, GSL)
“three days from now” → August 26, 2007
“two medium and three large pizzas” → 5 pizzas
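The two computation examples above can be sketched in a few lines of Python. This only illustrates the kind of arithmetic a tag script performs; it is not SISR, IBM, or GSL syntax, and the `NUMBER_WORDS` table and both function names are assumptions made for the example. The reference date is the date of the tutorial.

```python
from datetime import date, timedelta

# Hypothetical number-word table; a real grammar would cover far more values.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def days_from_now(phrase, today):
    """Resolve '<number word> days from now' relative to a reference date."""
    count = NUMBER_WORDS[phrase.split()[0]]
    return today + timedelta(days=count)


def total_pizzas(phrase):
    """Add up every number word in an order such as 'two medium and three large pizzas'."""
    return sum(NUMBER_WORDS[word] for word in phrase.split() if word in NUMBER_WORDS)


print(days_from_now("three days from now", date(2007, 8, 23)))  # 2007-08-26
print(total_pizzas("two medium and three large pizzas"))        # 5
```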
SISR Tags for “yes” and “no”
<rule id="yes">
  <one-of>
    <item>yes</item>
    <item>yeah<tag>yes</tag></item>
    <item><token>you bet</token><tag>yes</tag></item>
    <item xml:lang="fr-CA">oui<tag>yes</tag></item>
  </one-of>
</rule>
<rule id="no">
  <one-of>
    <item>no</item>
    <item>nope</item>
    <item><token>no way</token></item>
  </one-of>
  <tag>no</tag>
</rule>
GSL Token
DigitValue [ ([zero oh] one) { return (01) } ...]
“oh one” → 01
SISR Slot/Value
“I would like a small coca cola and three large pizzas with pepperoni and mushrooms.”
<rule id="order">
  I would like a
  <ruleref uri="#drink"/>
  <tag>out.drink = new Object();
       out.drink.liquid=rules.drink.type;
       out.drink.drinksize=rules.drink.drinksize;</tag>
  and
  <ruleref uri="#pizza"/>
  <tag>out.pizza=rules.pizza;</tag>
</rule>
GSL Slot/Value
;GSL 2.0;
ColoredObject:public (Color Object)
Color [
  [red pink] { <color red> }
  [yellow canary] { <color yellow> }
  [green khaki] { <color green> }
]
Object [
  [truck car] { <object vehicle> }
  [ball block] { <object toy> }
  [shirt blouse] { <object clothing> }
]
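To make the effect of the GSL grammar above concrete, here is a minimal Python sketch of the same mapping: a two-word utterance (a color followed by an object) is reduced to canonical `color` and `object` slot values. The tables mirror the grammar, while the `interpret` function is a hypothetical stand-in for what the recognizer returns, not a Nuance API.

```python
# Word-to-value tables copied from the Color and Object rules of the grammar.
COLOR = {
    "red": "red", "pink": "red",
    "yellow": "yellow", "canary": "yellow",
    "green": "green", "khaki": "green",
}
OBJECT = {
    "truck": "vehicle", "car": "vehicle",
    "ball": "toy", "block": "toy",
    "shirt": "clothing", "blouse": "clothing",
}


def interpret(utterance):
    """Return slot/value pairs for '<color> <object>', or None if out of grammar."""
    words = utterance.lower().split()
    if len(words) != 2 or words[0] not in COLOR or words[1] not in OBJECT:
        return None
    return {"color": COLOR[words[0]], "object": OBJECT[words[1]]}


print(interpret("pink blouse"))  # {'color': 'red', 'object': 'clothing'}
print(interpret("blue car"))     # None: "blue" is not in the Color rule
```

Several surface words collapse onto one value, which is how tags reduce variability of expression.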
SAPI Slot-Value
<RULE name="elvis">
  <L PROPNAME="artist">
    <P VALSTR="elvis_presley">elvis <O>presley</O></P>
    <P VALSTR="elvis_presley">the king</P>
  </L>
</RULE>
Problems with Tagged Grammars
Hard to maintain when complex
Hard to anticipate all the variations in how someone might say something
Can use wildcards/garbage to ignore parts of utterance
Speech recognition suffers when grammars are too complex
Speech recognition suffers when wildcards are used
Statistical Language Models (SLM’s)
Speech recognition is based on statistical models, not grammars
In commercial systems, natural language processing is a process of classification, relatively coarse meaning extraction
Works well if goal is to extract very simple meanings
Stages in SLM Processing
Ngram speech recognition: probabilities of word sequences, usually 2-3 words
Much more flexible (but less accurate) than a grammar
However, accuracy is not as critical with SLM’s because you don’t have to get every single word right
Text classification: given a text, assign it to categories based on training from previous texts
There are many algorithms for classification
Problems with SLM’s
Less accurate than CFG’s
Expensive to implement and maintain
Require a lot of data for good performance
Tagged Grammars or SLM’s?
Deeply nested menus → SLM’s
Complex applications with many slots to fill and precise meanings needed → grammars
Can combine both approaches in one application
Front-end SLM followed by grammar
Prompt asks specific question to catch most common tasks but has “other” category
Other Combination Approaches
Use SLM technology to recognize but grammar to interpret
Rules combined with SLM’s
Robust parsing
Rules combined with wildcard
I want um make that a large pizza with pepperoni and onions
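The pizza utterance above shows why robustness matters: the hesitation and restart ("um make that") would defeat a strict grammar. The sketch below uses plain keyword spotting as a simple stand-in for robust parsing. It skips every word it does not know and fills slots from the words it does. The `SIZES` and `TOPPINGS` vocabularies and the `spot_slots` function are assumptions made for the example.

```python
import re

# Hypothetical slot vocabularies for a pizza-ordering application.
SIZES = ("small", "medium", "large")
TOPPINGS = ("pepperoni", "mushrooms", "onions")


def spot_slots(utterance):
    """Fill slots from known words and ignore everything else in the utterance."""
    words = re.findall(r"[a-z]+", utterance.lower())
    size = next((word for word in words if word in SIZES), None)
    toppings = [word for word in words if word in TOPPINGS]
    return {"size": size, "toppings": toppings}


print(spot_slots("I want um make that a large pizza with pepperoni and onions"))
# {'size': 'large', 'toppings': ['pepperoni', 'onions']}
```

Ignoring unknown words has a cost: the sketch cannot tell "with onions" from "no onions", which is one reason rules are combined with wildcards rather than replaced by them.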
Emerging Standards: EMMA
EMMA (Extensible Multi-Modal Annotation)
Developed by the World Wide Web Consortium Multimodal Interaction Working Group
An XML format for representing users’ inputs and the results of processing them
How does EMMA relate to natural language understanding?
EMMA represents the results of a natural language understanding process
EMMA Benefits (1)
EMMA’s standard format lets all kinds of EMMA producers (multimodal modality components) exchange results
handwriting recognizers
speech recognizers
text classifiers
face recognizers
speaker identification and verification
…
EMMA Benefits (2)
Through “<derived-from>”, provides a way for “specialist” processing components to cooperate in processing a single input
[Diagram: Speech recognition → Lexical lookup → Part of speech tagging → Parsing → Semantic analysis; Ngram speech recognition → Classification]
EMMA Example – (1) Annotation Elements
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:info>
    <application>airline</application>
  </emma:info>
  <emma:model>
    <model class="airline">
      <source></source>
      <destination></destination>
      <days></days>
      <meals></meals>
    </model>
  </emma:model>
Utterance: from philadelphia to boston and i want a vegetarian meal
EMMA Example – (2) Annotation Attributes
<emma:interpretation
  id="interp5"
  emma:start="1186519245101"
  emma:mode="speech"
  emma:end="1186519248391"
  emma:confidence="0.03"
  emma:function="dialog"
  emma:duration="3290"
  emma:uninterpreted="false"
  emma:lang="en-US"
  emma:verbal="true"
  emma:dialog-turn="1"
  emma:tokens="from philadelphia to boston and i want a vegetarian meal "
  emma:medium="acoustic"
  emma:process="file://Microsoft Speech Recognizer 8.0 for Windows (English - US), SAPI5, Microsoft">
EMMA Example (3) Application Semantics
    <source>philadelphia</source>
    <destination>boston</destination>
    <meal>vegetarian</meal>
  </emma:interpretation>
</emma:emma>
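An application that receives an EMMA document like the one built up in this example only needs an XML parser to read it. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the example (most annotation attributes are left out for brevity). The `read_interpretation` helper is a hypothetical name, and the namespace string is copied from the slides as written.

```python
import xml.etree.ElementTree as ET

# Namespace URI exactly as it appears in the slide example.
EMMA_NS = "http://www.w3.org/2003/04/emma/"

# Trimmed copy of the example: two annotation attributes plus the application semantics.
DOC = """<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp5"
      emma:confidence="0.03"
      emma:tokens="from philadelphia to boston and i want a vegetarian meal">
    <source>philadelphia</source>
    <destination>boston</destination>
    <meal>vegetarian</meal>
  </emma:interpretation>
</emma:emma>"""


def read_interpretation(xml_text):
    """Return (slots, confidence) for the first emma:interpretation element."""
    root = ET.fromstring(xml_text)
    interp = root.find(f"{{{EMMA_NS}}}interpretation")
    # Child elements carry the application semantics (the slot/value pairs).
    slots = {child.tag: child.text.strip() for child in interp}
    # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
    confidence = float(interp.get(f"{{{EMMA_NS}}}confidence"))
    return slots, confidence


slots, confidence = read_interpretation(DOC)
print(slots)       # {'source': 'philadelphia', 'destination': 'boston', 'meal': 'vegetarian'}
print(confidence)  # 0.03
```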
Part 2: Detailed Examples
SAPI XML Grammar Examples
Windows Speech Recognition (Vista)
Office 2003 Speech Recognition
Example – music player interface
I’d like to hear Beethoven’s 5th
Please play Brandenburg Concertos by Bach
Play something by Elvis
Canonicalizing Forms
<RULE name="elvis">
  <L PROPNAME="artist">
    <P VALSTR="elvis_presley">elvis <O>presley</O></P>
    <P VALSTR="elvis_presley">the king</P>
  </L>
</RULE>
Canonicalizing Forms (2)
<RULE name="name">
  <L PROPNAME="name">
    <P VALSTR="opus 125">ninth <O>symphony</O></P>
    <P VALSTR="opus 92">seventh <O>symphony</O></P>
    <P VALSTR="opus 67">fifth <O>symphony</O></P>
    <P VALSTR="brandenburg_concertos">Brandenburg Concertos</P>
    <P VALSTR="opus 55">third symphony</P>
    <P VALSTR="you ain't nothing but a hound dog">hound dog</P>
    <P VALSTR="anything">something</P>
    <P VALSTR="anything">anything</P>
    <P VALSTR="opus 3">symphony in d major <O>opus 3</O></P>
  </L>
</RULE>
Disambiguating
<RULE name="jsbach">
  <P PROPNAME="composer" VALSTR="johann_sebastian_bach">
    <O>
      <L>
        <P>J S </P>
        <P>Johann Sebastian</P>
      </L>
    </O>
    <P>Bach</P>
  </P>
</RULE>
<RULE name="jcbach">
  <P PROPNAME="composer" VALSTR="johann_christian_bach">
    <L>
      <P>J C </P>
      <P>Johann Christian</P>
    </L>
    <P>Bach</P>
  </P>
</RULE>
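The point of the two rules is that the first names are optional for J. S. Bach but required for J. C. Bach, so a bare "Bach" resolves to the better-known composer. The same logic can be sketched with two regular expressions in Python. The `composer` function and pattern names are hypothetical, and this only illustrates the disambiguation idea, not how SAPI evaluates the grammar.

```python
import re

# First names are optional here, mirroring the <O> element in the "jsbach" rule.
JS_BACH = re.compile(r"^(?:(?:j s|johann sebastian) )?bach$")
# First names are required here, mirroring the "jcbach" rule.
JC_BACH = re.compile(r"^(?:j c|johann christian) bach$")


def composer(phrase):
    """Map a spoken composer name to its canonical value, or None if unmatched."""
    text = " ".join(phrase.lower().split())  # normalize case and spacing
    if JC_BACH.match(text):
        return "johann_christian_bach"
    if JS_BACH.match(text):
        return "johann_sebastian_bach"
    return None


print(composer("Bach"))                   # johann_sebastian_bach
print(composer("Johann Christian Bach"))  # johann_christian_bach
```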
SLM Examples
Meta-utterances for channel control
I’m confused
Speak louder please
Could you say that again?
Training Data
Find out how people ask these questions
Manually tag them with their categories
Category: repeat
could you say that again please
i didn't catch that
sorry
pardon me?
repeat that please
say that again
what?
Category: operator
I need to speak to a human
are there any humans I can talk to?
please get me an operator
I want an operator
operator please
I need an agent
Use NGram Speech Grammar
Ngrams are sequences of two or three words and the probabilities that they’ll occur together in that order
Much less constrained than CFG’s
Less accurate
Used in “How may I help you?” applications, dictation systems, and research
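To show what "probabilities of word sequences" means in practice, here is a minimal Python sketch that estimates bigram (two-word) probabilities from a few of the "repeat" training utterances. It uses unsmoothed maximum-likelihood counts, which is an assumption for illustration; production recognizers train on far more data and apply smoothing. The `bigram_model` function and the `<s>` and `</s>` boundary markers are choices made for this example.

```python
from collections import Counter


def bigram_model(sentences):
    """Estimate P(next word | previous word) by counting adjacent word pairs."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(words[:-1])         # every word that has a successor
        pair_counts.update(zip(words, words[1:]))  # adjacent pairs, in order
    return {pair: count / context_counts[pair[0]] for pair, count in pair_counts.items()}


corpus = [
    "say that again",
    "repeat that please",
    "could you say that again please",
]
model = bigram_model(corpus)

print(model[("say", "that")])    # 1.0: "say" is always followed by "that"
print(model[("that", "again")])  # 2 of the 3 occurrences of "that" precede "again"
```

Any word sequence receives some score under a model like this, which is why an Ngram recognizer is much less constrained than a CFG.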
Use Text Classification Software
Uses training data to develop probabilities that a new text is in one of the training categories
Many algorithms and approaches to text classification
Similar to the technology used in spam filters, but input is speech
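As one possible illustration, the sketch below trains a small naive Bayes classifier, a common technique in spam filters, on a subset of the tagged "repeat" and "operator" utterances. This is not necessarily the algorithm used in the tutorial's own demonstration. The function names, the add-one smoothing, and the uniform prior over categories are all assumptions made for the example.

```python
import math
from collections import Counter

# A subset of the manually tagged training utterances, lowercased.
TRAINING = {
    "repeat": [
        "could you say that again please",
        "i didn't catch that",
        "repeat that please",
        "say that again",
    ],
    "operator": [
        "i need to speak to a human",
        "please get me an operator",
        "i want an operator",
        "operator please",
        "i need an agent",
    ],
}


def train(data):
    """Count the words seen in each category and collect the overall vocabulary."""
    counts = {cat: Counter(w for text in texts for w in text.split())
              for cat, texts in data.items()}
    vocab = {word for counter in counts.values() for word in counter}
    return counts, vocab


def classify(text, counts, vocab):
    """Return the category with the highest smoothed log-likelihood for the text."""
    scores = {}
    for cat, counter in counts.items():
        total = sum(counter.values())
        # Add-one smoothing; words outside the training vocabulary are skipped.
        scores[cat] = sum(math.log((counter[w] + 1) / (total + len(vocab)))
                          for w in text.split() if w in vocab)
    return max(scores, key=scores.get)


counts, vocab = train(TRAINING)
# A misrecognized utterance still lands in the right category.
print(classify("party may i didn't catch that", counts, vocab))  # repeat
```

The misrecognized words "party" and "may" are simply ignored, which shows why word-level accuracy matters less for classification than for slot filling.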
Example
User says: Pardon me, I didn’t catch that
Speech recognizer hears: party may i didn't catch that
Classifier classifies:
increase_volume 0.4595725150090289
decrease_volume 0.0
slower 0.0
faster 0.0
confused 0.4447495899966607
repeat 0.567774973957669
operator 0.5163977794943222
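Once the classifier has produced a score for each category, the application still has to choose one. A minimal sketch of that last step, using the scores from this example, is shown below. The `rank` helper is hypothetical.

```python
# Category scores copied from the example above.
scores = {
    "increase_volume": 0.4595725150090289,
    "decrease_volume": 0.0,
    "slower": 0.0,
    "faster": 0.0,
    "confused": 0.4447495899966607,
    "repeat": 0.567774973957669,
    "operator": 0.5163977794943222,
}


def rank(scores):
    """Return (category, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


best_category, best_score = rank(scores)[0]
print(best_category)  # repeat
```

The top two categories are close here ("repeat" and "operator"), so looking at the full ranking rather than only the winner can be useful.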
EMMA Text Input Example
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp4"
    emma:duration="3038"
    emma:confidence="1.0"
    emma:process="file://Microsoft Speech Recognizer 8.0 for Windows (English - US), SAPI5, Microsoft"
    emma:medium="tactile"
    emma:verbal="true"
    emma:mode="keys"
    emma:start="1187040519583"
    emma:uninterpreted="false"
    emma:function="dialog"
    emma:dialog-turn="4"
    emma:end="1187040737446"
    emma:lang="en-US"
    emma:tokens="i'd like to go from boston to philadelphia on tuesday ">
    <source>boston</source>
    <destination>philadelphia</destination>
    <day>Tuesday</day>
  </emma:interpretation>
</emma:emma>
EMMA: Classification Example
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma/">
  <emma:interpretation id="interp4"
    emma:duration="3038"
    emma:confidence=".5"
    emma:process="tech-support-slm"
    emma:medium="acoustic"
    emma:verbal="true"
    emma:mode="voice"
    emma:start="1187040519583"
    emma:uninterpreted="false"
    emma:function="dialog"
    emma:dialog-turn="4"
    emma:end="1187040737446"
    emma:lang="en-US"
    emma:tokens="my internet connection keeps going off ">
    <problem>internet connectivity</problem>
  </emma:interpretation>
</emma:emma>
Natural Language Research
Natural language processing is an active area of academic and industrial research
Topics studied include spoken dialog processing, text understanding, natural language generation, automatic translation, acquisition of natural language information such as words and grammars, information extraction, summarization and support for search
Natural Language Research
Most interesting to this audience are topics such as
Broadening domains (sense disambiguation and parsing disambiguation)
Handling spoken dialog phenomena such as pronouns and ellipses
Handling speech errors such as hesitations, false starts
Multimodal communication, such as integrating speech and gestures
Extracting information provided by prosody and other suprasegmentals
The main academic organization is The Association for Computational Linguistics (www.aclweb.org)
More Information: Websites
W3C Voice Browser WG SISR http://www.w3.org/TR/semantic-interpretation/
W3C Multimodal Interaction WG (EMMA) http://www.w3.org/TR/emma/
Association for Computational Linguistics (www.aclweb.org)
Loquendo Café (for testing SISR grammars) http://www.loquendocafe.com
Voxeo Prophecy Platform (for testing Nuance grammars) www.voxeo.com
SAPI XML grammars (test with Windows Speech Recognition or Office 2003 Microsoft 6.1 recognizer) http://www.microsoft.com/speech/SDK/51/sapi.chm
Conversational Technologies http://www.conversational-technologies.com
More Information: Books, Journals, Articles
“Natural Language Processing: the Next Steps” (September 2006) http://www.speechtechmag.com/Articles/ReadArticle.aspx?ArticleID=29474
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition by Daniel Jurafsky and James H. Martin (2000)
Computational Linguistics
Natural Language Engineering