Alphabet Soup: Sorting out Emerging Telephony and Speech Standards

Ken Rehor, Co-founder, VoiceXML Forum; Founder, Harken Systems, LLC

Ken Rehor's presentation at eComm 2008


Page 1: Ken Rehor's presentation at eComm 2008

Alphabet Soup: Sorting out Emerging Telephony and Speech Standards

Ken Rehor

Co-founder, VoiceXML Forum; Founder, Harken Systems, LLC

Page 2: Ken Rehor's presentation at eComm 2008

• Voice Web Telephony Architecture

• Benefits of Open Interfaces, Protocols, Languages

• Status and Deployment

Page 3: Ken Rehor's presentation at eComm 2008

Components of a Voice Solution

[Diagram: component stack. Labels recovered from the figure:]

• Voice Application

• Application Server

• Application: Logic, Prompts, Grammars

• Transaction Server

• Database (shown twice)

• Voice Processing and Telephony

• Middleware

• API Layer

• Dialog Layer

• Telephony Interface

• Media: ASR, TTS, Audio, DTMF

Page 4: Ken Rehor's presentation at eComm 2008

Break out of the monolithic systems trap

• Modernize existing proprietary applications without starting from scratch

• Develop new apps, and incrementally add features in a modular fashion

• Advantages

  • Faster development

  • Less expensive to develop and maintain

  • Path towards modern, open standards architecture

Page 5: Ken Rehor's presentation at eComm 2008

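Voice / Web Application Architecture

[Diagram: a phone user and a web user reach the same app server over HTTP. Labels recovered from the figure:]

• Phone user: any phone, connected over TDM or VoIP to the VoiceXML platform

• Web user

• Network: Internet or Intranet

• VoiceXML platform, fetching over HTTP: <vxml>, <grxml>, .wav

  • Grammars

  • Audio / SSML

  • Scripts

• Web content, fetched over HTTP: <html>

  • Images

  • Media

  • Scripts

• App server

  • Application logic

  • Content and data

  • Transaction processing

  • Database interface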
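To make the architecture concrete: the app server is an ordinary web server, and the VoiceXML platform fetches its pages over HTTP just as a web browser fetches HTML. The following is a minimal sketch using only the Python standard library. The handler name, the prompt text, and the port are illustrative assumptions, not part of the presentation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape


def render_vxml(prompt_text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks one prompt."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block>\n'
        f'      <prompt>{escape(prompt_text)}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


class VoiceAppHandler(BaseHTTPRequestHandler):
    """Hypothetical app-server handler: answers every GET with VoiceXML."""

    def do_GET(self) -> None:
        body = render_vxml("Welcome to the demo voice application.").encode("utf-8")
        self.send_response(200)
        # Registered media type for VoiceXML documents.
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch server until interrupted (assumed port, not called on import)."""
    HTTPServer(("", port), VoiceAppHandler).serve_forever()
```

Calling `serve()` starts the server; a VoiceXML platform configured with its URL would then fetch and play the prompt. Grammars and audio files would be served from the same host in the same way.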

Page 6: Ken Rehor's presentation at eComm 2008

© 2008 Ken Rehor. All Rights Reserved.

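Voice App Architecture and Standards

[Diagram: full voice application architecture annotated with the standard used at each interface. Labels recovered from the figure, grouped by kind:]

• Endpoints and network: Caller, Phone Network, VoIP Gateway

• Telephony transport and signaling: T1 / E1, ISDN, SS7, SIP, RTP, RFC 2833

• Browsers: CCXML Browser, VoiceXML Browser

• Applications, fetched over HTTP / HTTPS

  • CCXML Call Control Application: CCXML, Scripts

  • VoiceXML Application: VXML, GRXML, Audio, Scripts

• Control interfaces

  • Telephony Control Interface: SIP, etc.

  • Dialog Control Interface: SIP, MSCP, etc.

  • Media Control Interface: SIP Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, etc.

• Media resources: Conference / Media Server, Media Mixer / Server, MRCP Client, ASR Server, TTS Server, SIV Server

• Media and speech standards

  • MRCP v1, MRCP v2

  • GRXML, SSML

  • Audio and DTMF: G.711, WAV, .au, mp3, etc.

• Language standards: VoiceXML 2.0, VoiceXML 2.1, ECMAScript (ECMA-262)

• SOAP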
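Among the standards named in this architecture, RFC 2833 is the one that carries DTMF key presses inside the RTP stream as small "telephone-event" payloads. As a rough illustration of how compact that interface is, the sketch below packs one 4-byte event payload: an event code, an end bit plus a 6-bit volume, and a 16-bit duration in timestamp units. The function name and default values are assumptions made for this example, and the RTP header that would precede the payload is omitted.

```python
import struct

# Event codes for DTMF digits as defined for RTP telephone-events.
DTMF_EVENT_CODES = {
    **{str(d): d for d in range(10)},
    "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15,
}


def encode_dtmf_event(digit: str, duration: int, volume: int = 10, end: bool = False) -> bytes:
    """Pack one telephone-event payload (4 bytes, network byte order).

    duration is in RTP timestamp units; volume is the power level
    expressed as a positive number of dBm0 below zero (0..63).
    """
    event = DTMF_EVENT_CODES[digit.upper()]
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Bit 7 is the end-of-event flag, bit 6 is reserved (0), bits 0-5 hold the volume.
    flags_and_volume = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags_and_volume, duration)
```

For example, `encode_dtmf_event("5", 800, end=True)` yields the final packet payload for the digit 5 held for 800 timestamp units (100 ms at an 8 kHz clock).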

Page 7: Ken Rehor's presentation at eComm 2008

Why Standards?

• Grow an industry

• Interoperation

• Lower cost of goods

• Innovation and evolution

• Disrupt proprietary markets

• Ecosystems develop around every open interface

• Everyone benefits through joint work: reduces design effort

• Promote technology to the next level

• Sell more due to larger market

Page 8: Ken Rehor's presentation at eComm 2008

Open Interfaces Enable Innovation

• Migration path: proprietary hardware-based solutions, then proprietary software-based solutions, then open software

• New Business Models

  • e.g. Voice Service Provider: separate the application from Telephony/Speech resources

• Separation of concerns

• Evolve components without starting from scratch

• Concentrate on innovation rather than duplication

• Move up the value chain

Page 9: Ken Rehor's presentation at eComm 2008

Voice Web Fundamental Concepts

• Leverage open, known technology

  • Web protocols, servers, networks, development tools, expertise

• Distributed Client-Server Architecture

  • Enables new business models and efficient resource utilization

• Standard/Common high-level language

  • Designed for voice dialogs and telephony

• Phone number mapped to URL

  • Phone number associated with URL of voice application
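The "phone number mapped to URL" concept amounts to a lookup table on the voice platform: when a call arrives, the dialed number selects the start URL of the voice application to fetch. A minimal sketch follows. The numbers, URLs, and function names are hypothetical, and real platforms usually manage this table through a provisioning interface.

```python
from typing import Optional

# Hypothetical provisioning table: dialed number -> start URL of the voice app.
NUMBER_TO_APP_URL = {
    "+18005550100": "http://apps.example.com/banking/start.vxml",
    "+18005550101": "http://apps.example.com/weather/start.vxml",
}


def normalize(dialed: str) -> str:
    """Drop spaces, dashes, dots and parentheses so lookups ignore formatting."""
    return "".join(ch for ch in dialed if ch.isdigit() or ch == "+")


def resolve_app_url(dialed: str) -> Optional[str]:
    """Return the application URL for a dialed number, or None if unprovisioned."""
    return NUMBER_TO_APP_URL.get(normalize(dialed))
```

Because the application is identified only by a URL, the application server can be hosted anywhere reachable over HTTP, independent of where the telephony resources sit.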

Page 10: Ken Rehor's presentation at eComm 2008

Visual vs. Voice markup

Web app UI

• HTML – Structure

  • Layout

• Input declaration

• Transitions

• Images

• Audio

• Video

• Text

• Scripts

Voice Web app UI

• VoiceXML – Structure

  • Dialog flow

• Input declaration

• Transitions

• Audio

• Video, Images

• Text (for TTS)

• Scripts
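The parallel between the two markups is easiest to see with a single input. The sketch below renders the same field once as an HTML form and once as a VoiceXML form, where the prompt takes the place of the label and a grammar declares what the caller may say. The function names, the `/submit` target, and the grammar file name are assumptions made for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def html_input(name: str, label: str) -> str:
    """Visual UI: the input is declared with a labelled text box."""
    return (
        '<form action="/submit" method="post">\n'
        f'  <label>{escape(label)} <input type="text" name={quoteattr(name)}/></label>\n'
        '</form>'
    )


def vxml_input(name: str, label: str, grammar_url: str) -> str:
    """Voice UI: the input is declared with a prompt and a speech grammar."""
    return (
        '<form>\n'
        f'  <field name={quoteattr(name)}>\n'
        f'    <prompt>{escape(label)}</prompt>\n'
        f'    <grammar src={quoteattr(grammar_url)} type="application/srgs+xml"/>\n'
        f'    <filled><submit next="/submit" namelist={quoteattr(name)}/></filled>\n'
        '  </field>\n'
        '</form>'
    )
```

Both fragments submit the value under the same name to the same server-side endpoint, so the application logic behind them can be shared.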

Page 11: Ken Rehor's presentation at eComm 2008

Protocols

Web applications

• HTTP, HTTPS

• SIP

• RTP

• SOAP

• WSDL

• …

Voice Web applications

• HTTP, HTTPS

• SIP

• RTP

• SOAP

• WSDL

• …

Page 12: Ken Rehor's presentation at eComm 2008

The Telecom Trilogy

• User Interaction

  • Voice user interface

  • Multimodal user interface

• Switching

  • Connecting endpoints

  • Moving connections

  • Signaling

• Media processing

  • ASR, SIV, TTS, Record / Play

  • Conferencing, Mixing, Echo cancellation

  • Endpointing, Coding / Format conversion

Page 13: Ken Rehor's presentation at eComm 2008

Ecosystem at Every Interface

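[Diagram: components and the open interface each one exposes. Labels recovered from the figure:]

• Audio Engine: .wav

• ASR Engine: <grxml>

• TTS Engine: <ssml>

• MRCP client, MRCP server

• VoiceXML browser: <vxml>

• Application Server: VoiceXML, GRXML, SSML, Scripts, etc.

• Code Generator: <xml>

• GUI Tool / SDE: Proprietary dialog XML

• VSP: Telephony, Speech, apps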

• Application Developers

• VUI designers

• Voice platforms

• Tools

• Service Providers

• Application Servers

Page 14: Ken Rehor's presentation at eComm 2008

Industry Standards – Global Adoption

• VoiceXML Forum

  • Nearly 100 member organizations worldwide

  • Platform Certification

  • Speaker Biometrics

  • Collaborating with W3C, ANSI, ISO

• W3C Speech Interface Framework

  • VoiceXML 2.0/2.1, SRGS 1.0, SSML 1.0, CCXML 1.0

  • SISR 1.0, PLS 1.0

  • Coming: VoiceXML 3.0, SSML 1.1

• IETF

  • Media Resource Control Protocol (MRCPv2)

  • SIP / VoiceXML media server spec (MEDIACTRL)

Page 15: Ken Rehor's presentation at eComm 2008

W3C Speech Interface Framework

• VoiceXML

• SRGS

• SSML

• Semantic Interpretation

• Call Control

• Pronunciation Lexicon

• SCXML

For more information, see:

W3C Voice Browser Working Group http://www.w3.org/Voice/

Page 16: Ken Rehor's presentation at eComm 2008

W3C Speech Interface Framework

• W3C VoiceXML 2.0

  • W3C Recommendation March 2004

  • Widely implemented

    • Approximately 4 dozen platforms

    • Many service providers worldwide

    • Many tools, countless applications

  • VoiceXML Forum Platform Certification Program

    • 24 certified platforms, more coming

• W3C VoiceXML 2.1

  • W3C Recommendation June 2007

  • Most platform vendors support it

  • Certification Program and Test suite in progress

• W3C VoiceXML 3.0

  • Spec in early stages of development

Page 17: Ken Rehor's presentation at eComm 2008

W3C Speech Interface Framework

• Call Control: W3C CCXML 1.0

  • W3C Working Draft Jan 2007

  • Implementations increasing

• Pronunciation Lexicon: W3C PLS 1.0

  • Used to describe phonetic information for use in speech recognition and synthesis

  • 2nd Last Call Working Draft Oct 2006

Page 18: Ken Rehor's presentation at eComm 2008

W3C Speech Interface Framework

• Input grammars: SRGS 1.0

  • W3C Recommendation March 2004

  • Widely implemented

• Output formatting: SSML 1.0, 1.1

  • SSML 1.0: W3C Recommendation September 2004

  • Widely implemented, but real support is limited (most TTS engines ignore the SSML instructions)

  • SSML 1.1: W3C Working Draft June 2007

  • Adds support for Asian, Eastern European, and Middle Eastern languages

• Semantic Interpretation for Speech Recognition: SISR 1.0

  • W3C Recommendation April 2007

  • Implementations increasing

  • Required for new Platform Certification
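To show how these three specifications fit together, the sketch below holds a small SRGS grammar whose `<tag>` elements carry SISR semantic results, plus a short SSML prompt, and inspects both with the Python standard library. The grammar, the prompt wording, and the helper names are illustrative assumptions. The `plain_text` helper shows roughly what remains when an engine ignores the SSML instructions.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# SRGS 1.0 grammar; tag-format "semantics/1.0" marks the <tag> contents as SISR.
YES_NO_GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="answer"
         tag-format="semantics/1.0">
  <rule id="answer" scope="public">
    <one-of>
      <item>yes <tag>out = true;</tag></item>
      <item>no <tag>out = false;</tag></item>
    </one-of>
  </rule>
</grammar>
""".strip()

# SSML 1.0 prompt with emphasis and a pause.
BALANCE_PROMPT = """
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
  Your balance is <emphasis>forty two</emphasis> dollars.
  <break time="300ms"/> Goodbye.
</speak>
""".strip()


def grammar_phrases(grammar_xml: str) -> list:
    """List (spoken phrase, SISR script) pairs found in an SRGS grammar."""
    root = ET.fromstring(grammar_xml)
    pairs = []
    for item in root.iter(SRGS_NS + "item"):
        tag = item.find(SRGS_NS + "tag")
        script = (tag.text or "").strip() if tag is not None else ""
        pairs.append(((item.text or "").strip(), script))
    return pairs


def plain_text(ssml: str) -> str:
    """Return the SSML text content with all markup dropped and whitespace collapsed."""
    root = ET.fromstring(ssml)
    return " ".join("".join(root.itertext()).split())
```

The grammar defines what the recognizer listens for, the SISR scripts define the value handed back to the application, and the SSML controls how the reply is rendered.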

Page 19: Ken Rehor's presentation at eComm 2008

What's Next?

• VoiceXML 3.0

  • Video

  • Multimodal integration

  • Speaker Biometrics

  • Cleaner Modularity

• SCXML 1.0

  • State Chart Markup Language

  • Separate logic from presentation

  • W3C Working Draft Feb 2007

  • Several implementations available

    • Commercial, educational, open source
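The idea behind "separate logic from presentation" can be sketched as a transition table: dialog flow is data mapping (state, event) pairs to next states, while prompts and grammars live elsewhere. The example below is not SCXML and omits its hierarchical and parallel states. The state and event names are hypothetical.

```python
# Hypothetical dialog flow: (current state, event) -> next state.
TRANSITIONS = {
    ("greeting", "caller.ready"): "collect_account",
    ("collect_account", "input.ok"): "confirm",
    ("collect_account", "input.nomatch"): "collect_account",
    ("confirm", "caller.yes"): "done",
    ("confirm", "caller.no"): "collect_account",
}


def step(state: str, event: str) -> str:
    """Return the next state; events with no matching transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events: list, start: str = "greeting") -> str:
    """Feed a sequence of events through the flow and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state
```

Because the flow is independent of how each state is presented, the same table could drive a voice dialog, a web page sequence, or both.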

Page 20: Ken Rehor's presentation at eComm 2008

Web / Voice ++

• Standards enable easy integration with other technologies

• Re-use web technologies

• Multiple modalities / channels: Voice +

  • SMS

  • Web

  • Chat

  • Mobile

  • Voice Control / Search

Page 21: Ken Rehor's presentation at eComm 2008

"Integration" / "Mashups" / "SOA"

• Modular architecture

• Open interfaces

• Common languages, protocols

• Combine data, services, modalities

• Easy adoption of new technologies and features

  • Video

  • Multimodal

  • Biometrics

  • Telephony

Page 22: Ken Rehor's presentation at eComm 2008

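Mashups, SOA, Multi-Channel/Modal

[Diagram: three access channels sharing common application tiers. Labels recovered from the figure:]

• Voice: POTS, PSTN or VoIP, VXML Browser, Voice UI App

• Mobile: Mobile web, Mobile IP, Mobile UI App

• Web: PC, IP, Web UI App

• Shared tiers: Presentation logic, Business logic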

Page 23: Ken Rehor's presentation at eComm 2008

For more information:

http://www.kenrehor.com

http://www.voicexml.org

http://www.w3.org/voice

Page 24: Ken Rehor's presentation at eComm 2008

An eComm 2008 presentation. See http://eCommMedia.com for more.