Telecom Innovators’ Web Seminar Series
Real-World Experience Adding Speech to IVR Solutions with MRCP
A webinar by NMS, ScanSoft and Capital One
Slide 2
Telecom Innovators’ Web Seminar Series
Agenda
Introduction to speech technology
  Dr. Rob Kassel, Senior Product Manager, ScanSoft, Inc.
MRCP and Natural Access
  Jack Chase, Director, Product Marketing, NMS
MRCP integration on the TelBert IVR Platform using NMS and ScanSoft
  Eric Cunningham, Enterprise Architect, Capital One
Slide 3
Introduction to Speech Technology
Rob Kassel, Senior Product Manager, ScanSoft
Slide 4
The Need For Speech Recognition
Automation is less costly than live agents
  Increases call handling capacity / reduces hold times
DTMF often is pressed into service
  Numeric entry is easy… unless you are reading
  Spelling entry is more difficult
  Menus need to be enumerated, can’t be too long
  Deep menu structure becomes tiresome
  Assignment inconsistent between vendors (e.g., voicemail)
  How do you enter “5 ½%” or “Albuquerque”?
With speech, questions are answered naturally
  Caller satisfaction is higher
  Fewer zero-outs lead to additional cost savings
Slide 5
Speech Recognition Process
[Diagram components: Speech (input), Speech Detector, Feature Extraction, Phoneme Classifier, Acoustic Models, Search, Grammar, Grammar Compiler, System Dictionary, Pronunciation Rules, Confidence Scoring, Results (output)]
Slide 6
Speech Recognition Challenges
Speech can be difficult to decode, even for humans
  Fixed, confusable vocabularies: “B-C-D-E-G-P-T-V-Z”
  Ambiguous boundaries: “It’s hard to wreck a nice beach!”
Speaker variability: dialect, volume, gender, etc.
Noise rejection: hands-free, mobile, telematics
Out-of-vocabulary rejection & confidence measures
Processor and memory demands
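The role of confidence measures in rejecting out-of-vocabulary input can be sketched as a simple threshold policy. This is a minimal illustration only: the 0.0 to 1.0 scale, the two threshold values, and the names `Action`, `Thresholds`, and `decide` are assumptions made for this example and are not taken from any ScanSoft product.

```cpp
#include <cassert>

// Possible dialog actions after a recognition result comes back.
enum class Action { Accept, Confirm, Reject };

// Hypothetical thresholds on a 0.0-1.0 confidence scale. Real engines use
// their own scales and tuning, so treat these numbers as placeholders.
struct Thresholds {
    double accept = 0.80;  // at or above: use the result directly
    double reject = 0.40;  // below: treat as out-of-vocabulary or noise
};

// Map a confidence score to a dialog action.
Action decide(double confidence, const Thresholds& t = Thresholds{}) {
    assert(t.reject <= t.accept);  // thresholds must be ordered
    if (confidence >= t.accept) return Action::Accept;
    if (confidence >= t.reject) return Action::Confirm;  // "Did you say ...?"
    return Action::Reject;                               // re-prompt the caller
}
```

With these placeholder thresholds, a score in the middle band triggers a confirmation prompt, so raising recognition accuracy directly reduces how often callers are asked to confirm.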
Slide 7
Speech Recognition: State of the Art
Callers speak naturally in directed dialogs
Million-word vocabularies: stocks, names, addresses
Open-ended responses, coupled with language understanding: “How may I help you?”
High accuracy, infrequent confirmation
Transaction completion rate over 90% is typical
Automatically adapt to caller population and channel characteristics
Slide 8
The Need For Text-To-Speech
Professional recordings costly and time-consuming
Large output vocabularies common (e.g. city names)
Word concatenation is difficult to do well
  Often used for numeric output
  Can sound mechanical; irritating when frequent
Some applications defy recordings (e.g. messaging)
Slide 9
Text-To-Speech Process
[Diagram components: Text (input), Text Normalization, Pronunciation Generation, System Dictionary, Pronunciation Rules, Prosody Generation, Unit Selection, Voice Database, Concatenate and Smooth, Speech (output)]
Slide 10
Text-To-Speech Challenges
Text Normalization
  Numerics: “12535” (number / zip code), “2x4”
  Abbreviations: “OR” (or / Oregon), “Dr. Jones on Elm Dr.”
  Acronyms: “IBM is listed on NASDAQ”
  Evolving usage: “CUL8R”
Pronunciation Generation
  Homographs: “minute” (60 seconds / tiny)
  Vowel reduction: “he came to town” vs. “he came to”
Prosody Generation
  Phrasing: “he is physically and mentally exhausted”
  Emphasis: “Are you flying tomorrow?”
  Emotion: upbeat vs. serious, calming vs. urgency
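The abbreviation example above (“Dr. Jones on Elm Dr.”) can be made concrete with a toy normalizer. The single context rule used here (a following capitalized word means a title) and the function name `expandDr` are assumptions chosen for illustration; production text normalization relies on far richer context than this.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Expand the ambiguous abbreviation "Dr." using one simple context rule:
// followed by a capitalized word -> "Doctor" (title), otherwise -> "Drive".
// The rule is an illustrative assumption, not how any real TTS engine works.
std::string expandDr(const std::string& text) {
    // Split the input on whitespace.
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::string w = words[i];
        if (w == "Dr.") {
            bool nextIsName =
                i + 1 < words.size() &&
                std::isupper(static_cast<unsigned char>(words[i + 1][0]));
            w = nextIsName ? "Doctor" : "Drive";
        }
        if (!out.empty()) out += ' ';
        out += w;
    }
    return out;
}
```

Calling `expandDr("Dr. Jones on Elm Dr.")` yields "Doctor Jones on Elm Drive". The rule breaks as soon as a street abbreviation is followed by a capitalized word (for example at a sentence boundary), which is exactly why the slide lists abbreviations as a challenge.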
Slide 11
Text-to-Speech: State of the Art
Natural sounding output, no more “drunken Swede”
Seamlessly mix dynamic data with recorded prompts
Accurate pronunciation, including proper names
A variety of voices to choose from
Custom voices to maintain brand identity
Listen here… http://www.scansoft.com/speechworks/realspeak/teleco/
Slide 12
MRCP and Natural Access
Jack Chase, Director, Product Marketing, NMS
Slide 13
What Is MRCP v1?
Speech servers are connected by VoIP to IVR servers
Standard API for ASR and TTS
Easy to reconfigure system as needs change
Easy to implement redundancy
Control: MRCP / RTSP / TCP / IP
Speech: G.711 / RTP / UDP / IP
[Diagram labels: PSTN, IVR Servers, IP, Speech Servers, MRCP Server]
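The control stack listed above (MRCP over RTSP over TCP) means that in MRCP v1 each MRCP message travels as the body of an RTSP ANNOUNCE message with content type application/mrcp. The sketch below builds such a message for a SPEAK request. The function name `buildSpeakAnnounce` and all argument values are hypothetical, only a plain-text body is handled, and a real client would add further MRCP headers.

```cpp
#include <string>

// Build an MRCP v1 SPEAK request and wrap it in an RTSP ANNOUNCE message.
// URL, CSeq, session ID, and request ID come from the caller; any values
// used in an example call are placeholders, not a real deployment.
std::string buildSpeakAnnounce(const std::string& url, int cseq,
                               const std::string& session, int requestId,
                               const std::string& text) {
    // Inner MRCP message: request line, headers, blank line, body.
    std::string mrcp = "SPEAK " + std::to_string(requestId) + " MRCP/1.0\r\n"
                       "Content-Type: text/plain\r\n"
                       "Content-Length: " + std::to_string(text.size()) + "\r\n"
                       "\r\n" + text;

    // Outer RTSP message carrying the MRCP message as its body.
    return "ANNOUNCE " + url + " RTSP/1.0\r\n"
           "CSeq: " + std::to_string(cseq) + "\r\n"
           "Session: " + session + "\r\n"
           "Content-Type: application/mrcp\r\n"
           "Content-Length: " + std::to_string(mrcp.size()) + "\r\n"
           "\r\n" + mrcp;
}
```

The nesting mirrors the slide: the RTSP layer addresses the resource and session, while the MRCP layer inside it names the speech operation. The audio itself never appears here because it flows separately over RTP.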
Slide 14
Natural Access and MRCP
[Diagram labels: IVR Services, PSTN Trunking, Fusion (VoIP), Conferencing, Fax Services, Universal Speech Access (MRCP), Video Access, Call Control, OAM, SNMP; Service Managers, Libraries; Drivers, IPC; CX Boards, AG Boards, CG Boards (PCI); PacketMedia HMP (IP)]
Slide 15
Universal Speech Access Makes Speech Integration Easy
Slide 16
Current Support for Universal Speech Access
Vendor    Type  Universal Speech Access 1.0   Universal Speech Access 1.1
Nuance    ASR   MRCP Server SP5, Nuance 8.5   MRCP Server SP7, Nuance 8.5
Nuance    TTS   Vocalizer 3.0                 Vocalizer 3.0.8
Telisma   ASR   Philsoft 3.2                  teliSpeech 1.0 SP4
ScanSoft  TTS   OMS 2.0.1, Speechify 2.0      SWMS 3.1, RealSpeak 4.0
ScanSoft  ASR   OMS 2.0.1, OSR 2.0            SWMS 3.1, OSR 3.0
Loquendo  ASR   N/A                           Loquendo ASR LSS 6.0
Slide 17
What’s Next for MRCP?
MRCP v2
  draft-ietf-speechsc-mrcpv2-06, Feb 20, 2005
  Adds SIP / SDP for session setup
    Replaces RTSP
  Adds support for speaker verification
  Little deployment yet
  NMS will update Universal Speech Access when deployments occur
Slide 18
MRCP Integration on the TelBert IVR Platform using NMS and ScanSoft
Eric Cunningham, Enterprise Architect, Capital One
Slide 19
Agenda
Why use MRCP
Main business drivers for voice enablement
Overview of architecture
Lessons learned
Slide 20
Why Use MRCP
Capital One has built its own IVR system (TelBert)
  Internally built and maintained
  Linux based C/C++ system
  5000+ ports in production
  Handles nearly 100% of all in-bound credit card calls
Business wants to have speech enabled applications
Leading speech vendors are embracing MRCP for integration
  Centralizes automated speech recognition (ASR) and text-to-speech (TTS) resources in the network
  Standards based protocol, allowing multi-vendor interoperability
continued
Slide 21
Why Use MRCP (cont'd)
Benefits to Capital One
  MRCP allows integration with leading vendors and avoids vendor lock-in
  NMS APIs simplify learning the MRCP and RTP protocols and integrating them, which accelerated the adoption of MRCP into TelBert
Migration from AG 4000 to CG 6000 – clean evolution
  CG 6000 provides on-board Ethernet and T1 terminations; eliminates host based processing of RTP data
  Current AG 4000 code compatible with CG 6000; quick upgrade to existing platform
Slide 22
Overview of TelBert Architecture
Where applications run. They control which grammars are used, how results are processed, and user prompting.
Where NMS libraries are integrated. A single state-machine model handles 184 ISDN callers, voice processing commands, and the new ASR/TTS commands via Universal Speech Access.
ScanSoft has its MRCP server (SWMS) co-located on the same machine as the OSR and RealSpeak servers.
Note: This means that load balancing and failover are done by TelBert, not the MRCP server.
Private network (100MB switch) to encapsulate the RTP traffic.
Slide 23
Main Business Drivers for Voice Enablement
Improve customer experience
  Provide both touch-tone and speech-enabled handling
  Switch between modes
Provide additional automated customer servicing
  Automating time-consuming call center activities
  Frees call center representatives for more complex tasks
Basically, all of the standard reasons a business wants to start using voice recognition technologies
Slide 24
Lessons Learned
NMS Universal Speech Access and Fusion APIs front-end the complexity of RTSP, MRCP, and RTP protocols
You still need to read the specifications to troubleshoot problems
You need to understand the specifications in order to talk to vendors you are integrating with (ScanSoft)
continued
Slide 25
Lessons Learned (cont'd)
Example: NMS code
    if ((nRtn = saiCreateSynthesizer(m_cta_context_handle, m_stRtpEndpointTts,
                                     m_ob_locate.get_server(),
                                     TELBERT_CONTEXT_TTS, &m_stTtsHd)) != SUCCESS)
    {
        ……
    }
RTSP/MRCP sniffer trace (what the MRCP server sees)
Request
    SETUP rtsp://NEWBOX36/synthesizer/ RTSP/1.0
    CSeq: 7
    Transport: RTP/AVP;unicast;destination=10.87.204.8;client_port=3000-3001
    Content-Type: application/sdp
    Content-Length: 167

    v=0
    o=139112752 0 127.0.0.1
    s=nms speech
    c=IN IP4 0.0.0.0
    t=0 0
    m=audio 3000 RTP/AVP 0 96
    a=rtpmap:0 pcmu/8000
    a=rtpmap:96 telephone-event/8000
Response
    RTSP/1.0 200 OK
    CSeq: 7
    Session: RQKCRCSPWX0000000368fgJiuWPnxz
    Transport: RTP/AVP;unicast;client_port=3000-3001
    Content-Length: 215
    Content-Type: application/sdp

    v=0
    o=- RQKCRCSPWX0000000368fgJiuWPnxz RQKCRCSPWX0000000368fgJiuWPnxz IN IP4 10.87.204.36
    s=SpeechWorks OpenSpeech Media Server version 2.0
    c=IN IP4 0.0.0.0
    t=0 0
    m=audio 3000 RTP/AVP 0
    a=rtpmap: 0 pcmu/8000
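When reading a trace like the one above, the fields that matter first are the status code, the CSeq that pairs the response with its request, and the Session identifier that later requests on the same session must carry. The sketch below pulls those out of a raw response. It is a troubleshooting aid under stated assumptions, not part of the NMS API: the names `RtspResponse` and `parseRtspResponse` are hypothetical, header names are matched case-sensitively, and folded headers are not handled.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Parsed view of an RTSP response: status code plus header name/value pairs.
struct RtspResponse {
    int status = 0;
    std::map<std::string, std::string> headers;
};

// Parse the status line and headers of an RTSP response, stopping at the
// blank line that precedes the SDP body. The body itself is ignored.
RtspResponse parseRtspResponse(const std::string& raw) {
    RtspResponse r;
    std::istringstream in(raw);
    std::string line;
    bool first = true;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR
        if (line.empty()) break;                                    // end of headers
        if (first) {
            // Status line looks like "RTSP/1.0 200 OK".
            std::istringstream statusLine(line);
            std::string version;
            statusLine >> version >> r.status;
            first = false;
            continue;
        }
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // skip malformed lines
        std::size_t start = line.find_first_not_of(" \t", colon + 1);
        r.headers[line.substr(0, colon)] =
            (start == std::string::npos) ? std::string() : line.substr(start);
    }
    return r;
}
```

Fed the response shown in the trace, this yields status 200, CSeq "7", and the Session value the server issued, which is usually enough to tell whether a failure happened at session setup or later.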
Slide 26
Lessons Learned (cont'd)
Load Balancing
The MRCP specification allows the MRCP server to coordinate where to set up the RTP connection with the ASR/TTS server, which lets it perform load balancing
Currently ScanSoft’s MRCP server does not provide load balancing, but their engineers are looking at providing this
Until then, your IVR will have to create its own load balancing and failover logic for the ASR/TTS server farm
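One simple shape such IVR-side logic could take is round-robin selection that skips servers marked as down. This is a sketch under assumptions, not TelBert's actual implementation: the class name `ServerPool`, the host names in any example call, and the policy of marking a host down after a failed session setup are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Round-robin selection over a farm of ASR/TTS servers, skipping servers
// that have been marked down. Hypothetical sketch, not TelBert's logic.
class ServerPool {
public:
    explicit ServerPool(std::vector<std::string> hosts)
        : hosts_(std::move(hosts)), up_(hosts_.size(), true) {
        assert(!hosts_.empty());  // a pool needs at least one server
    }

    // Return the next healthy host, or an empty string if none is available.
    std::string next() {
        for (std::size_t tried = 0; tried < hosts_.size(); ++tried) {
            std::size_t i = cursor_;
            cursor_ = (cursor_ + 1) % hosts_.size();
            if (up_[i]) return hosts_[i];
        }
        return std::string();
    }

    // Record the outcome of a session setup attempt against a host.
    void mark(const std::string& host, bool healthy) {
        for (std::size_t i = 0; i < hosts_.size(); ++i)
            if (hosts_[i] == host) up_[i] = healthy;
    }

private:
    std::vector<std::string> hosts_;  // server names, in rotation order
    std::vector<bool> up_;            // health flag per server
    std::size_t cursor_ = 0;          // index of the next server to try
};
```

In use, the IVR would call `next()` before each synthesizer or recognizer setup, call `mark(host, false)` when the setup fails, and retry with the following host. A production version would also need a way to probe downed servers and bring them back into rotation.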
continued
Slide 27
Lessons Learned (cont'd)
Lots of specifications to be learned and not just by the integration team
Specification: Media Resource Control Protocol (MRCP) Specification
  Location: ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-shanmugham-mrcp-05.txt
  Who needs to understand / be aware of this spec: Integration Team, Application Interface Team
Specification: Real Time Streaming Protocol (RTSP) Specification
  Location: ftp://ftp.rfc-editor.org/in-notes/rfc2326.txt
  Who needs to understand / be aware of this spec: Integration Team
Specification: Real-time Transport Protocol (RTP) Specification
  Location: ftp://ftp.rfc-editor.org/in-notes/std/std64.txt
  Who needs to understand / be aware of this spec: Integration Team
Specification: Speech Recognition Grammar Specification
  Location: http://www.w3.org/TR/2004/REC-speech-grammar-20040316/
  Who needs to understand / be aware of this spec: Application Interface Team, Application Developers
Specification: Natural Language Semantics Markup Language for Speech Interface Framework (nl-spec) Specification
  Location: http://www.w3.org/TR/nl-spec/
  Who needs to understand / be aware of this spec: Application Interface Team, Application Developers
Slide 28
Thank You!
Note:
  PDF will be posted today
  Recorded version posted in a few days
Slide 29
Please use the text messaging feature to send your questions
Q & A Session
Slide 30
For more information…
Contact
  Dr. Rob Kassel, Senior Product Manager, ScanSoft
    +1 617 428 4444; [email protected]
  Jack Chase, Director, Product Marketing, NMS
    +1 508 271 1109; [email protected]
  Eric Cunningham, Enterprise Architect, Capital One
    +1 804 855 3597; [email protected]
Upcoming Events
  VON Europe
    May 23 – 26
    Stockholm, Sweden
    Booth # 1040
Upcoming Webinars
  June: Ready for Mainstream: AdvancedTCA Solutions Become Reality
  July: “Transforming Speech Applications With NMS' new VoiceXML Server”