Alphabet Soup: Sorting out Emerging Telephony and Speech Standards
Ken Rehor
Co-founder, VoiceXML Forum; Founder, Harken Systems, LLC
• Voice Web Telephony Architecture
• Benefits of Open Interfaces, Protocols, Languages
• Status and Deployment
Components of a Voice Solution
[Figure: Components of a Voice Solution. Voice Application (application logic, prompts, grammars) on an Application Server with databases and a Transaction Server; Dialog Layer; Middleware and API Layer; Voice Processing and Telephony (ASR, TTS, audio, DTMF, media, telephony interface).]
Break out of the monolithic systems trap
• Modernize existing proprietary applications without starting from scratch
• Develop new apps, and incrementally add features in a modular fashion
• Advantages
  • Faster development
  • Less expensive to develop and maintain
  • Path toward a modern, open-standards architecture
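Voice / Web Application Architecture
[Figure: a phone user (any phone, over TDM or VoIP) reaches a VoiceXML platform; a web user uses a browser. Both the VoiceXML platform and the browser fetch content over HTTP, across the Internet or an intranet, from one app server that hosts application logic, content and data, transaction processing, and the database interface. Voice channel content: <vxml> documents, <grxml> grammars, .wav audio / SSML, scripts. Web channel content: <html> pages, images, media, scripts.]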
© 2008 Ken Rehor. All Rights Reserved.
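The key idea of this architecture is that one application server answers HTTP requests from both a VoiceXML platform and a web browser. Below is a minimal sketch of that split using Python's standard WSGI interface; the `/voice` and `/web` paths, the balance lookup, and the port are hypothetical. `application/voicexml+xml` is the registered media type for VoiceXML documents (RFC 4267).

```python
from wsgiref.simple_server import make_server


def account_balance() -> str:
    """Hypothetical stand-in for the database / transaction-server lookup."""
    return "42 dollars"


def render_vxml(balance: str) -> str:
    # Voice presentation: a VoiceXML 2.1 document the platform speaks via TTS.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        f"  <form><block><prompt>Your balance is {balance}.</prompt></block></form>\n"
        "</vxml>\n"
    )


def render_html(balance: str) -> str:
    # Web presentation of the same data.
    return f"<html><body><p>Your balance is {balance}.</p></body></html>\n"


def app(environ, start_response):
    """WSGI entry point: same application logic, channel chosen by URL path."""
    renderers = {
        "/voice": (render_vxml, "application/voicexml+xml"),
        "/web": (render_html, "text/html"),
    }
    route = renderers.get(environ.get("PATH_INFO", "/"))
    if route is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    render, media_type = route
    body = render(account_balance()).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", f"{media_type}; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port: int = 8000) -> None:
    """Run a local development server (blocks until interrupted)."""
    with make_server("", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and pointing a VoiceXML browser at `http://localhost:8000/voice` (and a web browser at `/web`) would exercise both channels against the same business logic.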
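Voice App Architecture and Standards
[Figure: a caller on the phone network (T1 / E1, ISDN, SS7) or over SIP reaches a VoIP gateway; media flows over RTP, with DTMF carried per RFC 2833. A CCXML browser runs the CCXML call control application and connects to the VoiceXML browser through a dialog control interface and to a conference / media server (media mixer) through a media control interface. The VoiceXML browser fetches the VoiceXML application (VXML, GRXML, audio, scripts) over HTTP / HTTPS and acts as an MRCP client to ASR, TTS, and SIV (speaker identification and verification) servers, passing GRXML grammars and SSML. Standards called out: VoiceXML 2.0, VoiceXML 2.1, and ECMAScript (ECMA-262) for dialogs; MRCP v1 and MRCP v2 for speech resources; GRXML (SRGS) and SSML; audio formats G.711, WAV, .au, mp3, etc.; telephony control interface: SIP, etc.; dialog control interface: SIP, MSCP, etc.; media control interface: SIP Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, etc.; SOAP.]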
Why Standards?
• Grow an industry
• Interoperation
• Lower cost of goods
• Innovation and evolution
• Disrupt proprietary markets
• Ecosystems develop around every open interface
• Everyone benefits through joint work: reduces design effort
• Promote technology to the next level
• Sell more due to larger market
Open Interfaces Enable Innovation
• Migration: from proprietary hardware-based solutions, to proprietary software-based solutions, to open software
• New business models
  • e.g. Voice Service Provider: separate the application from telephony / speech resources
• Separation of concerns
• Evolve components without starting from scratch
• Concentrate on innovation rather than duplication
• Move up the value chain
Voice Web Fundamental Concepts
• Leverage open, known technology
  • Web protocols, servers, networks, development tools, expertise
• Distributed client-server architecture
  • Enables new business models and efficient resource utilization
• Standard, common high-level language
  • Designed for voice dialogs and telephony
• Phone number mapped to URL
  • Phone number associated with the URL of a voice application
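Mapping a phone number to a URL can be pictured as a lookup consulted when a call arrives. A minimal sketch, assuming a hypothetical dial plan, North American 10-digit numbering, and placeholder `example.com` URLs; real platforms keep this mapping in their own provisioning systems.

```python
import re
from typing import Optional

# Hypothetical dial plan: E.164-style number -> URL of the voice application's first page.
DIAL_PLAN = {
    "+18005550100": "https://apps.example.com/banking/start.vxml",
    "+18005550199": "https://apps.example.com/weather/start.vxml",
}


def normalize(dialed: str, country_code: str = "1") -> str:
    """Strip punctuation and prefix the country code for 10-digit national numbers."""
    digits = re.sub(r"\D", "", dialed)
    if len(digits) == 10:  # assumption: North American numbering
        digits = country_code + digits
    return "+" + digits


def application_url(dialed: str) -> Optional[str]:
    """Return the voice application URL for a dialed number, or None if unmapped."""
    return DIAL_PLAN.get(normalize(dialed))
```

For example, `application_url("(800) 555-0100")` yields the banking application's start page, which the VoiceXML browser would then fetch over HTTP.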
Visual vs. Voice markup
Web app UI
• HTML – Structure
  • Layout
  • Input declaration
  • Transitions
  • Images
  • Audio
  • Video
  • Text
  • Scripts
Voice Web app UI
• VoiceXML – Structure
  • Dialog flow
  • Input declaration
  • Transitions
  • Audio
  • Video, Images
  • Text (for TTS)
  • Scripts
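To make the parallel concrete, the sketch below builds a small VoiceXML 2.1 form with Python's `xml.etree.ElementTree`: a `<field>` declares the input, its `<prompt>` carries text for TTS, and `<filled>` with `<submit>` is the transition to the next page. The field name, prompt wording, `choice.grxml` grammar URL, and submit target are hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu_form(next_url: str) -> str:
    """Return a one-field VoiceXML 2.1 document as a string."""
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    # Input declaration: the recognized value is stored in the field variable "choice".
    field = ET.SubElement(form, "field", {"name": "choice"})
    # Text to be rendered by the TTS engine.
    ET.SubElement(field, "prompt").text = "Say sales or support."
    # External SRGS XML grammar (hypothetical URL).
    ET.SubElement(field, "grammar",
                  {"src": "choice.grxml", "type": "application/srgs+xml"})
    # Transition: once the field is filled, submit it to the application server.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": next_url, "namelist": "choice"})
    return ET.tostring(vxml, encoding="unicode")
```

The HTML counterpart would use `<form>`, `<input>`, and an `action` URL; the structure is the same, only the modality changes.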
Protocols
Web applications
• HTTP, HTTPS
• SIP
• RTP
• SOAP
• WSDL
• …
Voice Web applications
• HTTP, HTTPS
• SIP
• RTP
• SOAP
• WSDL
• …
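On the voice side, RTP carries keypad input as well as audio. A minimal sketch of packing one RTP packet (the fixed 12-byte header from RFC 3550) whose payload is an RFC 2833 telephone-event; RFC 4733 later replaced RFC 2833 with the same 4-byte event layout. Payload type 101, the sequence number, timestamp, SSRC, and volume are illustrative assumptions; the real payload type is negotiated in SDP.

```python
import struct

# RFC 2833 telephone-event codes for DTMF: 0-9, then *, #, A-D.
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
               marker: bool = False) -> bytes:
    """Fixed 12-byte RTP header: version 2, no padding, no extension, no CSRCs."""
    first = 2 << 6                                  # V=2, P=0, X=0, CC=0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", first, second, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def dtmf_payload(digit: str, end: bool, volume: int, duration: int) -> bytes:
    """4-byte telephone-event payload: event, E bit + R bit + volume, duration."""
    event = DTMF_EVENTS[digit.upper()]
    # Volume is expressed in -dBm0 (0-63); the reserved R bit stays 0.
    flags = (int(end) << 7) | (volume & 0x3F)
    return struct.pack("!BBH", event, flags, duration & 0xFFFF)


# First packet of digit "5": marker set, 800 timestamp units (100 ms at 8 kHz) so far.
packet = (rtp_header(101, seq=1, timestamp=0, ssrc=0x12345678, marker=True)
          + dtmf_payload("5", end=False, volume=10, duration=800))
```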
The Telecom Trilogy
• User interaction
  • Voice user interface
  • Multimodal user interface
• Switching
  • Connecting endpoints
  • Moving connections
  • Signaling
• Media processing
  • ASR, SIV, TTS, record / play
  • Conferencing, mixing, echo cancellation
  • Endpointing, coding / format conversion
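Coding / format conversion includes transcoding between linear PCM and G.711, the telephony codec named in the architecture figure. A minimal sketch of G.711 μ-law encoding and decoding for signed 16-bit samples, using the bias-and-segment formulation common in open-source G.711 code (bias 132, clip at 32635); it is illustrative, not a conformance-tested codec.

```python
BIAS = 0x84    # 132: added to the magnitude before the segment lookup
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(-sample if sign else sample, CLIP) + BIAS
    # Segment (exponent): index of the highest set bit among bits 7..14.
    # magnitude >= BIAS, so magnitude >> 7 is at least 1.
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign|segment|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit G.711 mu-law code to a signed 16-bit PCM value."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Reconstruct at the middle of the quantization interval, then remove the bias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

For example, silence (sample 0) encodes to 0xFF, and the largest positive sample decodes back to 32124, the top μ-law level; the logarithmic segments keep the step size small for quiet speech.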
Ecosystem at Every Interface
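[Figure: each interface in the voice stack supports its own market: audio, ASR, and TTS engines (fed <grxml> and <ssml>) behind an MRCP client / MRCP server link; VoiceXML browsers; application servers delivering <vxml>, .wav, VoiceXML, GRXML, SSML, and scripts; code generators and GUI tools / SDEs that emit proprietary dialog XML (<xml>); and voice service providers (VSPs) offering telephony, speech, and apps.]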
• Application Developers
• VUI designers
• Voice platforms
• Tools
• Service Providers
• Application Servers
Industry Standards – Global Adoption
• VoiceXML Forum
  • Nearly 100 member organizations worldwide
  • Platform Certification
  • Speaker Biometrics
  • Collaborating with W3C, ANSI, ISO
• W3C Speech Interface Framework
  • VoiceXML 2.0/2.1, SRGS 1.0, SSML 1.0, CCXML 1.0
  • SISR 1.0, PLS 1.0
  • Coming: VoiceXML 3.0, SSML 1.1
• IETF
  • Media Resource Control Protocol (MRCPv2)
  • SIP / VoiceXML media server spec (MEDIACTRL)
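MRCPv2 is how a VoiceXML browser (the MRCP client) hands work to ASR, TTS, and SIV servers. A sketch of framing a synthesizer SPEAK request whose body is an SSML 1.0 document, following the MRCPv2 message format as later published in RFC 6787: the start line is `MRCP/2.0 message-length method request-id`, and message-length counts the whole message, including that start line. The channel identifier, request ID, and prompt text are hypothetical, and the SIP session that sets up the channel is omitted.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ssml_document(text: str, lang: str = "en-US") -> str:
    """Minimal SSML 1.0 document: the text, a short pause, then a closing phrase."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f'{escape(text)}<break time="300ms"/>Goodbye.</speak>'
    )


def mrcp_request(method: str, request_id: int,
                 headers: dict[str, str], body: str = "") -> bytes:
    """Frame an MRCPv2 request; message-length includes the start line itself."""
    body_bytes = body.encode("utf-8")
    all_headers = dict(headers)
    if body_bytes:
        all_headers["Content-Length"] = str(len(body_bytes))
    head = "".join(f"{name}: {value}\r\n" for name, value in all_headers.items())
    rest = (head + "\r\n").encode("utf-8") + body_bytes
    # The length field is part of what it measures, so iterate until the
    # number of digits stops changing (converges within a few passes).
    length = 0
    while True:
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        if len(start) + len(rest) == length:
            return start + rest
        length = len(start) + len(rest)


speak = mrcp_request(
    "SPEAK", 10000,
    {"Channel-Identifier": "32AECB23433802@speechsynth",  # hypothetical channel ID
     "Content-Type": "application/ssml+xml"},
    ssml_document("Your call is important to us."),
)
```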
W3C Speech Interface Framework
• VoiceXML
• SRGS
• SSML
• Semantic Interpretation
• Call Control
• Pronunciation Lexicon
• SCXML
For more information, see:
W3C Voice Browser Working Group http://www.w3.org/Voice/
W3C Speech Interface Framework
• W3C VoiceXML 2.0
  • W3C Recommendation March 2004
  • Widely implemented
    • Approximately 4 dozen platforms
    • Many service providers worldwide
    • Many tools, countless applications
  • VoiceXML Forum Platform Certification Program
    • 24 certified platforms, more coming
• W3C VoiceXML 2.1
  • W3C Recommendation June 2007
  • Most platform vendors support it
  • Certification Program and test suite in progress
• W3C VoiceXML 3.0
  • Spec in early stages of development
W3C Speech Interface Framework
• Call Control: W3C CCXML 1.0
  • W3C Working Draft Jan 2007
  • Implementations increasing
• Pronunciation Lexicon: W3C PLS 1.0
  • Used to describe phonetic information for use in speech recognition and synthesis
  • 2nd Last Call Working Draft Oct 2006
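For illustration, a PLS 1.0 lexicon is a small XML document pairing each written form (`<grapheme>`) with a pronunciation (`<phoneme>`), usable by both the recognizer and the synthesizer. A sketch that generates one with `xml.etree.ElementTree`; the IPA alphabet choice and the sample entry are assumptions.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(entries: dict[str, str], lang: str = "en-US") -> str:
    """Return a PLS 1.0 lexicon mapping each grapheme to one IPA phoneme string."""
    lexicon = ET.Element("lexicon", {
        "version": "1.0", "xmlns": PLS_NS,
        "alphabet": "ipa", "xml:lang": lang,
    })
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = grapheme
        ET.SubElement(lexeme, "phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


lexicon_xml = build_lexicon({"tomato": "təˈmɑːtəʊ"})  # illustrative British pronunciation
```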
W3C Speech Interface Framework
• Input grammars: SRGS 1.0
  • W3C Recommendation March 2004
  • Widely implemented
• Output formatting: SSML 1.0, 1.1
  • SSML 1.0 – W3C Recommendation September 2004
    • Widely implemented, yet real support is limited (most TTS engines ignore SSML instructions)
  • SSML 1.1 – W3C Working Draft June 2007
    • Adds support for Asian, Eastern European, and Middle Eastern languages
• Semantic Interpretation for Speech Recognition: SISR 1.0
  • W3C Recommendation April 2007
  • Implementations increasing
  • Required for new Platform Certification
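SRGS and SISR work together: the SRGS grammar constrains what the recognizer accepts, and SISR tags (tag-format `semantics/1.0`) turn the matched words into a value for the application. A sketch that generates an SRGS 1.0 XML grammar with one SISR tag per phrase; the rule name, phrases, and returned values are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_choice_grammar(choices: dict[str, str], rule_id: str = "choice") -> str:
    """SRGS 1.0 XML grammar: one public rule, one item per phrase, SISR 1.0 tags."""
    grammar = ET.Element("grammar", {
        "version": "1.0", "xmlns": SRGS_NS, "xml:lang": "en-US",
        "mode": "voice", "root": rule_id, "tag-format": "semantics/1.0",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase, value in choices.items():
        item = ET.SubElement(one_of, "item")
        item.text = phrase
        # SISR script assigning the semantic result; json.dumps yields a
        # quoted, escaped string literal that ECMAScript also accepts.
        ET.SubElement(item, "tag").text = f"out = {json.dumps(value)};"
    return ET.tostring(grammar, encoding="unicode")


grxml = build_choice_grammar(
    {"sales": "sales", "customer support": "support", "support": "support"})
```

Two phrases map to the same value, so the dialog logic only ever sees "sales" or "support" regardless of wording.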
What's Next?
• VoiceXML 3.0
  • Video
  • Multimodal integration
  • Speaker Biometrics
  • Cleaner modularity
• SCXML 1.0
  • State Chart XML
  • Separates logic from presentation
  • W3C Working Draft Feb 2007
  • Several implementations available
    • Commercial, educational, open source
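SCXML describes control flow as a state chart, leaving prompts and forms to the presentation layer. The sketch below runs a deliberately tiny subset (flat `<state>` and `<final>` elements, event-triggered `<transition>`s with exact event names), far less than the W3C draft defines: no compound or parallel states, executable content, data model, or event-prefix matching. The call-flow states and event names are hypothetical.

```python
import xml.etree.ElementTree as ET

SCXML_NS = "http://www.w3.org/2005/07/scxml"

# Hypothetical call flow: wait for a call, run a dialog, then hang up.
CALL_FLOW = f"""<scxml xmlns="{SCXML_NS}" version="1.0" initial="idle">
  <state id="idle">
    <transition event="call.incoming" target="dialog"/>
  </state>
  <state id="dialog">
    <transition event="dialog.done" target="hangup"/>
    <transition event="call.disconnected" target="hangup"/>
  </state>
  <final id="hangup"/>
</scxml>"""


class FlatStateChart:
    """Executes flat <state>/<final> children with exact-match event transitions only."""

    def __init__(self, document: str) -> None:
        root = ET.fromstring(document)
        ns = {"sc": SCXML_NS}
        self.transitions: dict[str, dict[str, str]] = {}
        for state in root.findall("sc:state", ns):
            table = self.transitions.setdefault(state.get("id"), {})
            for t in state.findall("sc:transition", ns):
                # First matching transition in document order wins, as in SCXML.
                table.setdefault(t.get("event"), t.get("target"))
        self.finals = {f.get("id") for f in root.findall("sc:final", ns)}
        self.current = root.get("initial")

    def send(self, event: str) -> str:
        """Deliver one event; unmatched events are ignored. Returns the current state."""
        if self.current not in self.finals:
            target = self.transitions.get(self.current, {}).get(event)
            if target is not None:
                self.current = target
        return self.current
```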
Web / Voice ++
• Standards enable easy integration with other technologies
• Re-use web technologies
• Multiple modalities / channels: Voice +
  • SMS
  • Web
  • Chat
  • Mobile
  • Voice Control / Search
"Integration" / "Mashups" / "SOA"
• Modular architecture
• Open interfaces
• Common languages, protocols
• Combine data, services, modalities
• Easy adoption of new technologies and features
  • Video
  • Multimodal
  • Biometrics
  • Telephony
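Mashups, SOA, Multi-Channel/Modal
[Figure: a POTS phone (over PSTN or VoIP) reaches a VXML browser and voice UI app; a mobile web device (over mobile IP) reaches a mobile UI app; a PC (over IP) reaches a web UI app. The three UI apps form the presentation logic layer on top of shared business logic.]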
For more information:
http://www.kenrehor.com
http://www.voicexml.org
http://www.w3.org/voice
An eComm 2008 presentation – see http://eCommMedia.com for more