Ken Rehor's presentation at eComm 2008

Alphabet Soup: Sorting out Emerging Telephony and Speech Standards

Ken Rehor

Co-founder, VoiceXML Forum; Founder, Harken Systems, LLC

• Voice Web Telephony Architecture

• Benefits of Open Interfaces, Protocols, Languages

• Status and Deployment

Components of a Voice Solution

Voice Processing and Telephony

Middleware

API Layer

Telephony Interface

Dialog Layer

ASR, TTS, Audio, DTMF, Media

Voice Application

Application Server

Application
  • Logic
  • Prompts
  • Grammars

Database Database

Transaction Server

Break out of the monolithic systems trap

• Modernize existing proprietary applications without starting from scratch

• Develop new apps, and incrementally add features in a modular fashion

• Advantages
  • Faster development
  • Less expensive to develop and maintain
  • Path towards modern, open standards architecture

Voice / Web Application Architecture

Diagram labels:

• Phone user: any phone, TDM or VoIP, VoiceXML platform

• Web user

• Internet or Intranet: HTTP

• App server: application logic, content and data, transaction processing, database interface

• Voice content: <vxml>, <grxml>, .wav (grammars, audio / SSML, scripts)

• Web content: <html> (images, media, scripts)

© 2008 Ken Rehor. All Rights Reserved.

Voice App Architecture and Standards

Diagram labels:

• Caller; Phone Network (T1 / E1, ISDN, SS7); VoIP Gateway (SIP, RTP, RFC 2833)

• CCXML Call Control Application; VoiceXML Application (CCXML, VXML, GRXML, audio, scripts); HTTP, HTTPS; SOAP

• CCXML Browser

• VoiceXML Browser: VoiceXML 2.0, VoiceXML 2.1, ECMAScript 262

• Telephony Control Interface: SIP, etc.

• Dialog Control Interface: SIP, MSCP, etc.

• Media Control Interface

• SIP Netann, MSCML, MOML / MSML, MSCP, DMSP, MGCP, etc.

• Conference / Media Server; Media Mixer / Server

• MRCP Client (MRCP v1, MRCP v2); ASR Server (GRXML); TTS Server (SSML); SIV Server

• Media: audio, DTMF; G.711, WAV, .au, mp3, etc.

Why Standards?

• Grow an industry

• Interoperation

• Lower cost of goods

• Innovation and evolution

• Disrupt proprietary markets

• Ecosystems develop around every open interface

• Everyone benefits through joint work: reduces design effort

• Promote technology to the next level

• Sell more due to larger market

Open Interfaces Enable Innovation

• Migration: from proprietary, hardware-based solutions, to proprietary software-based solutions, to open software

• New Business Models
  • e.g. Voice Service Provider: Separate application from Telephony/Speech resources

• Separation of concerns

• Evolve components without starting from scratch

• Concentrate on innovation rather than duplication

• Move up the value chain

Voice Web Fundamental Concepts

• Leverage open, known technology
  • Web protocols, servers, networks, development tools, expertise

• Distributed Client-Server Architecture
  • Enables new business models and efficient resource utilization

• Standard/Common high-level language
  • Designed for voice dialogs and telephony

• Phone number mapped to URL
  • Phone number associated with URL of voice application
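The "phone number mapped to URL" concept can be pictured as a simple lookup table. The following is a minimal sketch, not how any particular platform provisions numbers; the table name, phone numbers, and URLs are all hypothetical placeholders.

```python
from typing import Optional

# Hypothetical routing table: dialed number -> start URL of the voice application.
# The numbers and URLs are placeholders, not values from the talk.
NUMBER_TO_APP_URL = {
    "+18005550100": "http://apps.example.com/banking/start.vxml",
    "+18005550101": "http://apps.example.com/weather/start.vxml",
}


def normalize(dialed: str) -> str:
    """Keep only the digits and add a leading '+' so lookups ignore formatting."""
    digits = "".join(ch for ch in dialed if ch.isdigit())
    return "+" + digits


def app_url_for(dialed: str) -> Optional[str]:
    """Return the voice application URL for a dialed number, or None if unmapped."""
    return NUMBER_TO_APP_URL.get(normalize(dialed))
```

In this sketch a platform would call `app_url_for` when a call arrives and then fetch the returned URL over HTTP, the same way a web browser fetches a page.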

Visual vs. Voice markup

Web app UI

• HTML – Structure
  • Layout

• Input declaration

• Transitions

• Images

• Audio

• Video

• Text

• Scripts

Voice Web app UI

• VoiceXML – Structure
  • Dialog flow

• Input declaration

• Transitions

• Audio

• Video, Images

• Text (for TTS)

• Scripts
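To make the voice side of this comparison concrete, here is a small sketch that builds a VoiceXML 2.0 document with one form: a prompt (text for TTS), an input declaration (a field with a grammar), and a transition (a submit once the field is filled). The element names come from VoiceXML 2.0; the function name, prompt wording, and URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def q(tag: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def build_city_dialog(grammar_url: str, next_url: str) -> str:
    """Build a one-field VoiceXML dialog (hypothetical example application)."""
    ET.register_namespace("", VXML_NS)
    vxml = ET.Element(q("vxml"), {"version": "2.0"})

    # Dialog flow: a single form with one field to collect.
    form = ET.SubElement(vxml, q("form"), {"id": "get_city"})
    field = ET.SubElement(form, q("field"), {"name": "city"})

    # Text for TTS.
    prompt = ET.SubElement(field, q("prompt"))
    prompt.text = "Which city would you like the weather for?"

    # Input declaration: what the caller may say.
    ET.SubElement(field, q("grammar"),
                  {"src": grammar_url, "type": "application/srgs+xml"})

    # Transition: send the result to the application server.
    filled = ET.SubElement(field, q("filled"))
    ET.SubElement(filled, q("submit"), {"next": next_url, "namelist": "city"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_city_dialog("city.grxml", "http://apps.example.com/weather"))
```

An application server would return a document like this over HTTP where a web application would return HTML.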

Protocols

Web applications

• HTTP, HTTPS

• SIP

• RTP

• SOAP

• WSDL

• …

Voice Web applications

• HTTP, HTTPS

• SIP

• RTP

• SOAP

• WSDL

• …

The Telecom Trilogy

• User Interaction
  • Voice user interface
  • Multimodal user interface

• Switching
  • Connecting endpoints
  • Moving connections
  • Signaling

• Media processing
  • ASR, SIV, TTS, Record / Play
  • Conferencing, Mixing, Echo cancellation
  • Endpointing, Coding / Format conversion

Ecosystem at Every Interface

Diagram labels:

• GUI Tool / SDE: proprietary dialog XML (<xml>)

• Code Generator

• Application Server: VoiceXML, GRXML, SSML, scripts, etc.

• VoiceXML browser (<vxml>)

• MRCP client; MRCP server

• ASR Engine (<grxml>); TTS Engine (<ssml>); Audio Engine (.wav)

• VSP: telephony, speech, apps

• Application Developers

• VUI designers

• Voice platforms

• Tools

• Service Providers

• Application Servers

Industry Standards – Global Adoption

• VoiceXML Forum
  • Nearly 100 member organizations worldwide
  • Platform Certification
  • Speaker Biometrics
  • Collaborating with W3C, ANSI, ISO

• W3C Speech Interface Framework
  • VoiceXML 2.0/2.1, SRGS 1.0, SSML 1.0, CCXML 1.0
  • SISR 1.0, PLS 1.0
  • Coming: VoiceXML 3.0, SSML 1.1

• IETF
  • Media Resource Control Protocol (MRCPv2)
  • SIP / VoiceXML media server spec (MEDIACTRL)

W3C Speech Interface Framework

• VoiceXML

• SRGS

• SSML

• Semantic Interpretation

• Call Control

• Pronunciation Lexicon

• SCXML

For more information, see:

W3C Voice Browser Working Group http://www.w3.org/Voice/

W3C Speech Interface Framework

• W3C VoiceXML 2.0
  • W3C Recommendation March 2004
  • Widely implemented
    • Approximately 4 dozen platforms
    • Many service providers worldwide
    • Many tools, countless applications
  • VoiceXML Forum Platform Certification Program
    • 24 certified platforms, more coming

• W3C VoiceXML 2.1
  • W3C Recommendation June 2007
  • Most platform vendors support it
  • Certification Program and Test suite in progress

• W3C VoiceXML 3.0
  • Spec in early stages of development

W3C Speech Interface Framework

• Call Control: W3C CCXML 1.0
  • W3C Working Draft Jan 2007
  • Implementations increasing

• Pronunciation Lexicon: W3C PLS 1.0
  • Used to describe phonetic information for use in speech recognition and synthesis
  • 2nd Last Call Working Draft Oct 2006

W3C Speech Interface Framework

• Input grammars: SRGS 1.0
  • W3C Recommendation March 2004
  • Widely implemented

• Output formatting: SSML 1.0, 1.1
  • SSML 1.0: W3C Recommendation September 2004
    • Widely implemented, but with little real support (most TTS engines ignore the SSML instructions)
  • SSML 1.1: W3C Working Draft June 2007
    • Adds support for Asian, Eastern European, and Middle Eastern languages

• Semantic Interpretation for Speech Recognition: SISR 1.0
  • W3C Recommendation April 2007
  • Implementations increasing
  • Required for new Platform Certification
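As an illustration of how SRGS and SISR fit together, the sketch below generates a small SRGS 1.0 XML grammar whose items carry SISR `tag` elements, so a recognizer that supports `semantics/1.0` could return a boolean rather than the raw words. The grammar content (a yes/no rule) and the function name are assumptions for illustration, not taken from the slides.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(tag: str) -> str:
    """Qualify a tag name with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{tag}"


def build_yes_no_grammar() -> str:
    """Build a hypothetical yes/no grammar with SISR semantic tags."""
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(q("grammar"), {
        "version": "1.0",
        "mode": "voice",
        "root": "yesno",
        "tag-format": "semantics/1.0",  # selects SISR 1.0 script tags
        XML_LANG: "en-US",
    })
    rule = ET.SubElement(grammar, q("rule"), {"id": "yesno", "scope": "public"})
    one_of = ET.SubElement(rule, q("one-of"))

    # Each alternative pairs the spoken word (SRGS) with its meaning (SISR).
    for phrase, value in [("yes", "true"), ("no", "false")]:
        item = ET.SubElement(one_of, q("item"))
        item.text = phrase
        tag = ET.SubElement(item, q("tag"))
        tag.text = f"out = {value};"

    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_yes_no_grammar())
```

A VoiceXML field would reference a grammar like this by URL; whether the semantic tags are honored depends on the platform's SISR support.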

What's Next?

• VoiceXML 3.0
  • Video
  • Multimodal integration
  • Speaker Biometrics
  • Cleaner Modularity

• SCXML 1.0
  • State Chart Markup Language
  • Separate logic from presentation
  • W3C Working Draft Feb 2007
  • Several implementations available
    • Commercial, educational, open source
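The state-chart idea behind SCXML, keeping control logic apart from presentation, can be sketched as a plain transition table. This is not SCXML syntax or an SCXML interpreter; the state names and event names below are hypothetical and chosen only to show the pattern.

```python
from typing import Dict, Iterable, Tuple

# Transition table: (current state, event) -> next state.
# State and event names are hypothetical, for illustration only.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "call.incoming"): "greeting",
    ("greeting", "prompt.done"): "collecting",
    ("collecting", "input.nomatch"): "collecting",
    ("collecting", "input.ok"): "confirming",
    ("confirming", "caller.hangup"): "done",
}


def run(events: Iterable[str], start: str = "idle") -> str:
    """Feed events through the table; unknown events leave the state unchanged."""
    state = start
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state
```

The table holds only the flow; what is said or shown in each state would live elsewhere, for example in VoiceXML or HTML documents keyed by state name.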

Web / Voice ++

• Standards enable easy integration with other technologies

• Re-use web technologies

• Multiple modalities / channels: Voice +
  • SMS
  • Web
  • Chat
  • Mobile
  • Voice Control / Search

"Integration" / "Mashups" / "SOA"

• Modular architecture

• Open interfaces

• Common languages, protocols

• Combine data, services, modalities

• Easy adoption of new technologies and features
  • Video
  • Multimodal
  • Biometrics
  • Telephony

Mashups, SOA, Multi-Channel/Modal

Diagram labels:

• Access: POTS, PSTN or VoIP; Mobile web, Mobile IP; PC, IP

• Presentation logic: VXML Browser, Voice UI App; Mobile UI App; Web UI App

• Business logic
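The split between business logic and per-channel presentation logic can be sketched as one shared function with a renderer per channel. This is a minimal illustration under assumed names: the account data, function names, and channel keys are hypothetical, and a real deployment would serve these documents from an application server over HTTP.

```python
from typing import Callable, Dict


# Business logic: shared by every channel. The account data is a
# hypothetical stand-in for a real database or transaction server.
def account_balance(account_id: str) -> float:
    balances = {"1001": 250.75}
    return balances[account_id]


# Presentation logic: one renderer per channel.
def render_html(balance: float) -> str:
    return f"<html><body><p>Balance: ${balance:.2f}</p></body></html>"


def render_vxml(balance: float) -> str:
    return (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        "<form><block><prompt>"
        f"Your balance is {balance:.2f} dollars."
        "</prompt></block></form></vxml>"
    )


RENDERERS: Dict[str, Callable[[float], str]] = {
    "web": render_html,
    "voice": render_vxml,
}


def handle_request(channel: str, account_id: str) -> str:
    """Run the shared logic once, then pick the channel's renderer."""
    return RENDERERS[channel](account_balance(account_id))
```

Adding another channel in this sketch means adding one renderer to `RENDERERS`; the business logic stays untouched.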

For more information:

http://www.kenrehor.com

http://www.voicexml.org

http://www.w3.org/voice

An eComm 2008 presentation. See http://eCommMedia.com for more.