(1)
VoiceXML: Overview, Opportunities & Challenges
Hitesh Kr. Seth
Chief Technology Evangelist
SeraNova, Inc.
hitesh.seth@seranova.com
O’Reilly Conference on Enterprise Java, 2001
(2)
Agenda
  Introduction
  History
  Elements
  Developing Voice Portals
  Applications
  Vendor Landscape
  Challenges
  Resources
(3)
Introduction
(4)
The Web is Ubiquitous
  Key Highlights
    HTTP Protocol
    HTML for Content
      Static, Dynamically Generated
  Usage Model
    Create Content/Scripts
    Publish on the Web Server
    Access it through a web browser
(5)
What about Voice?
  Call center and IVR-based products have been around
  IVR applications are usually “DTMF” oriented
    Interaction through the keypad rather than voice
  Complex infrastructure
    Involves huge investments in proprietary solutions
    Lack of integration with the Internet
    ASP model for deployment wasn’t established
  Emergence of sophisticated Text-to-Speech/Voice Recognition solutions
(6)
VoiceXML
  What is VoiceXML?
    An XML-based markup language that describes voice and touch-tone interactions, used to develop interactive voice applications
(7)
Application Model
(8)
Technical Highlights
  Based on XML 1.0
  Supports
    DTMF (touch-tone keys) and voice input
      Press 1 for Email; Please say your name
    TTS (Text-to-Speech) and pre-recorded audio output
    Recording of user input
    Telephony integration
      e.g. Connect to a live operator
    Form- and field-level grammars
    Direct and (near) natural dialogs
      Direct
        Which city would you like to go to?
        San Jose
      Natural-like
        What can I do for you today?
        I would like to travel from San Jose, CA to Newark, NJ on 15 Nov
(9)
Key Benefits
  Brings the ubiquity of the Web to the ubiquitous access device: an ordinary phone
  Reach billion(s) of landline and mobile phones
  Hands-free communication for automobiles
  Single platform for developing Web & voice applications
  Opens up the Web to reach billions of ordinary phones worldwide
  Automated Customer Service
    Can enhance customer satisfaction (immediate response)
    Lower costs (fewer customer service reps and lower customer waiting costs!)
  Can use it even in a flight!
(10)
Hello VoiceXML
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      Hello World!
    </block>
  </form>
</vxml>
(11)
Demo
(12)
History
(13)
History
  3/2/1999: AT&T, Lucent & Motorola create the VXML Forum (No. of members: 17)
  8/25/1999: VoiceXML 0.9 Preliminary Spec released (No. of members: 61)
  3/7/2000: VoiceXML 1.0 Spec released (No. of members: 79)
  5/22/2000: VoiceXML 1.0 submitted to W3C (No. of members: 150)
  Today (10/5/2000), there are 281 members of the VoiceXML Forum
(14)
Earlier Works
  SpeechML by IBM
  VoxML by Motorola
  PhoneWeb/PML by Lucent/AT&T
(15)
Elements
(16)
Elements
  Root
    <vxml>
  Form/Interaction
    <field>, <filled>, <initial>, <param>, <option>
  Grammar
    <dtmf>, <grammar>
  Events
    <error>, <exit>, <noinput>, <help>, <nomatch>
  Platform Specific
    <meta>, <property>, <object>
  Telephony Integration
    <disconnect>, <record>, <transfer>
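As a rough illustration of how the event elements listed above attach to a field, the sketch below adds <noinput>, <nomatch> and <help> handlers to a single question. The form id, field name, city list and prompt wording are made up for this example; exactly when each event fires (timeouts, what counts as a help request) depends on the platform.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- "Travel" and "city" are hypothetical names chosen for this sketch -->
  <form id="Travel">
    <field name="city">
      <prompt>Which city would you like to go to?</prompt>
      <grammar type="application/x-jsgf">
        san jose | newark | phoenix
      </grammar>
      <!-- Runs when the caller says nothing before the timeout -->
      <noinput>I did not hear anything. <reprompt/></noinput>
      <!-- Runs when the caller's input does not match the grammar -->
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <!-- Runs when the caller asks for help -->
      <help>Please say the name of a city, for example San Jose.</help>
      <filled>
        <prompt>Going to <value expr="city"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Placing the handlers inside the field keeps them local to this one question; the same elements can also be placed at form or document level to act as shared defaults.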
(17)
Elements
  Language
    <if>, <else>, <elseif>, <assign>, <value>, <var>, <script>, <return>, <clear>, <throw>, <catch>, <subdialog>, <block>
  Prompt/Audio
    <break>, <sayas>, <audio>, <block>, <enumerate>, <emp>, <prompt>, <pros>, <div>, <reprompt>
  Navigation
    <choice>, <menu>, <link>, <goto>, <submit>
(18)
Prompts
  TTS (Text-to-Speech)
    <prompt>What can I do for you?</prompt>
    <prompt>Did you say <sayas class="phone">732-362-2187</sayas></prompt>
      Spoken as: Did you say Area Code (732) 362-2187
  Pre-Recorded Prompts
    <prompt><audio src="initial_greetings.wav"/>, Hitesh</prompt>
  Rule of Thumb
    Use TTS sparingly (only for dynamic information)
    <prompt bargein="false"> can be used for ads or any other special announcements
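A minimal sketch of these prompt guidelines in one document: a recorded announcement that the caller cannot interrupt, followed by a recorded greeting with TTS used only for the dynamic part. The file name sponsor.wav and the fallback wording are assumptions for illustration; the text inside <audio> is intended as the alternate content spoken when the recording cannot be played.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Greeting">
    <block>
      <!-- Announcement the caller cannot barge in on -->
      <prompt bargein="false">
        <!-- sponsor.wav is a hypothetical file name -->
        <audio src="sponsor.wav">This call is brought to you by our sponsor.</audio>
      </prompt>
      <!-- Recorded greeting, with TTS reserved for the dynamic information -->
      <prompt>
        <audio src="initial_greetings.wav">Welcome.</audio>
        Hitesh, your direct number is
        <sayas class="phone">732-362-2187</sayas>.
      </prompt>
    </block>
  </form>
</vxml>
```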
(19)
Navigation
<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Welcome to your Personal Portal. <enumerate/></prompt>
    <choice dtmf="1" caching="safe" next="Email.jsp">Email</choice>
    <choice dtmf="2" caching="safe" next="Calendar.jsp">Calendar</choice>
    <choice dtmf="3" caching="safe" next="EmployeeDirectory.jsp">Employee Directory</choice>
  </menu>
</vxml>
(20)
Grammars
  Specify the utterances a user may speak, and the corresponding string value or set of attribute-value pairs they provide
  Can be defined as a form grammar or a field grammar
  The spec doesn’t require an implementation to support a particular format
  Common Grammar Formats
    Java Speech API Grammar Spec (JSGF)
    Nuance GSL
    Speech Recognition Grammar Spec for the W3C Speech Interface Framework (Working Draft)
  Can be specified inline in the VoiceXML document or referenced externally using the <grammar> tag
(21)
Grammars
  Inline
...
<field name="emplId">
  <prompt>Say the name of the person</prompt>
  <grammar type="application/x-jsgf">
    hitesh seth {1} | ...
  </grammar>
  ...
</field>
...
  External
...
<field name="emplId">
  <prompt>Say the name of the person</prompt>
  <grammar type="application/x-jsgf" src="mycompany.gram#employee" caching="safe"/>
  ...
</field>
...
  mycompany.gram
#JSGF V1.0;
grammar mycompany;
public <employee> =
  (hitesh seth) {1} ...
(22)
Interaction
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Main">
    <field name="emplId">
      <prompt>Say the name of the person</prompt>
      <grammar type="application/x-jsgf">
        (hitesh seth) {1} | ...
      </grammar>
      <filled>
        <if cond="emplId=='1'">
          <goto next="#Employee1"/>
        <elseif cond="emplId=='2'"/>
          ...
        </if>
      </filled>
    </field>
  </form>
(23)
Interaction
  <form id="Employee1">
    <block>
      <prompt>
        Hitesh Seth.
        Direct Phone:
        <sayas class="phone">732-362-2187</sayas>.
      </prompt>
    </block>
  </form>
  ...
</vxml>
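The <if>/<goto> chain above needs one branch and one form per employee. One possible alternative, sketched here under assumptions, is to hand the recognized value to the server with <submit> and let a server script return the next VoiceXML page. The sketch assumes the EmployeeDirectory.jsp page from the Navigation slide accepts an emplId request parameter, which the slides do not state.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Main">
    <field name="emplId">
      <prompt>Say the name of the person</prompt>
      <grammar type="application/x-jsgf" src="mycompany.gram#employee"/>
      <filled>
        <!-- Send emplId to the server; the response is the next VoiceXML document.
             That EmployeeDirectory.jsp reads an emplId parameter is an assumption. -->
        <submit next="EmployeeDirectory.jsp" namelist="emplId"/>
      </filled>
    </field>
  </form>
</vxml>
```

The trade-off is an extra round trip to the server per lookup, in exchange for a document that does not grow with the size of the directory.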
(24)
Telephony Integration
  <transfer> element
    Connects the user to another phone
  Applications
    Assisted dialing
      Online Employee Directory!
      I would like to call Hitesh on his cellular phone.
      Connecting to (732) 433-5603 ….
    Switching to a human operator
      Welcome to XYZ Voice Portal. At any time, say Operator to connect to a customer service agent. Please say your name. ….
(25)
Telephony Integration
<?xml version="1.0"?>
<vxml version="1.0">
  <form ...>
    ...
    <field name="cmd">
      <prompt>Hitesh’s direct phone is (732) 362-2187, Cellular ...</prompt>
      <grammar type="application/x-jsgf">
        home | direct | cellular
      </grammar>
      <filled>
        <if cond="cmd=='direct'">
          <assign name="phone_no" expr="'7323622187'"/>
          <goto next="#CallTransfer"/>
        <elseif cond="cmd=='cellular'"/>
          ...
        </if>
      </filled>
    </field>
    ...
  </form>
(26)
Telephony Integration
  <form id="CallTransfer">
    <block>
      <prompt><audio src="transfer.wav"/></prompt>
    </block>
    <transfer dest="{phone_no}"/>
  </form>
</vxml>
(27)
Extensions
  <object> & <property> Tags
  <property>
    Implementation-specific properties
      e.g. TTS engine parameters (gender, tone, etc.)
  <object>
    Implementation-specific components and value-added services
      e.g. Integration with components built for the underlying ASR engine (e.g. Nuance SpeechObjects)
      e.g. Component for getting an address
      Caller-Id Information Service
      Cellular Phone Location Service
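A small sketch of <property> in use. The timeout and bargein properties are generic settings; the property name com.example.tts.gender is invented here purely to show the shape of a vendor-specific setting, and real names and values must come from the documentation of the platform being used.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Generic settings: wait 5 seconds for input, allow the caller to barge in -->
  <property name="timeout" value="5s"/>
  <property name="bargein" value="true"/>
  <!-- Hypothetical vendor-specific TTS setting; the name is not from any real platform -->
  <property name="com.example.tts.gender" value="female"/>
  <form id="Welcome">
    <block>Welcome to XYZ Voice Portal.</block>
  </form>
</vxml>
```

Properties set at document level apply to every dialog in the document unless a form or field sets its own value.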
(28)
Demo
(29)
Developing Voice Portals
(30)
Developing
  What do you need?
    Development Tool
      To develop/test the application
      IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, …
    Web Server
      To execute the scripts/serve VoiceXML content
      Apache, Microsoft, Netscape, …
      JSP, Servlets
      XML Parser, XSLT Processor
    VoiceXML Interpreter/Implementation Platform
    Ordinary touch-tone phone
    PC with a good sound card and microphone
      For creating/testing applications using simulators/SDKs
(31)
Serving Up VoiceXML: Static/Dynamic!
  Static v/s Dynamic Content
  Dynamic
    Server scripting technologies such as JSP and Servlets to generate VoiceXML
    Dynamic presentation using XML/XSLT
      XML represents the content
      XSLT represents the transformation of the content into presentation
      Use Apache Cocoon!
(32)
XML/XSLT
  XML
    Represents data
    Static XML, or dynamically generated using server scripts
  XSLT
    Represents formatting
    Write it yourself, or create it through a tool
(33)
Processing XML/XSLT
  JSP
<%@page import="org.apache.xalan.xslt.*"%>
<%
  // Content (XML) and the style sheet that turns it into VoiceXML
  String xmlFile = "AddressBook.xml";
  String xslFile = "AddressBook.xsl";
  XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
  // Transform and write the result straight to the JSP output
  processor.process(
    new XSLTInputSource(xmlFile),
    new XSLTInputSource(xslFile),
    new XSLTResultTarget(out));
%>
  Use sophisticated content management systems
  Create different style sheets for different interfaces: VoiceXML, HTML, WML, etc.
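For illustration, a style sheet along the lines of AddressBook.xsl might look like the sketch below. The slides do not show the layout of AddressBook.xml, so the element names addressbook, entry, name and phone are assumptions, as is the choice to read every entry aloud in one block.

```xml
<?xml version="1.0"?>
<!-- Assumed input layout (not shown in the slides):
     <addressbook>
       <entry><name>Hitesh Seth</name><phone>732-362-2187</phone></entry>
     </addressbook> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Turn the whole address book into a single VoiceXML form -->
  <xsl:template match="/addressbook">
    <vxml version="1.0">
      <form id="AddressBook">
        <block>
          <!-- One spoken prompt per entry -->
          <xsl:for-each select="entry">
            <prompt>
              <xsl:value-of select="name"/>. Phone:
              <sayas class="phone"><xsl:value-of select="phone"/></sayas>.
            </prompt>
          </xsl:for-each>
        </block>
      </form>
    </vxml>
  </xsl:template>
</xsl:stylesheet>
```

A second style sheet with the same match patterns but HTML or WML output elements would serve the other interfaces from the same XML content.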
(34)
Deployment
  Infrastructure Required
    In addition to the Web Application Server serving VoiceXML pages, you need
      Telephony Interface Boards
      ASR Engine
      TTS Engine
      VoiceXML Interpreter
      Bandwidth/Incoming Lines
  Deployment Options
    Pre-packaged VoiceXML Server (all-in-one)
    Pick and choose VoiceXML solution components
      ASR, TTS, VoiceXML Interpreter, Hardware Ports, Bandwidth
    Hosted Voice ASP Solutions
(35)
Applications
(36)
Applications
  Utilizing Web Content/Information
    Stock Quotes, Weather Information, News
  Customer Service
    Order Status, Address Change, Automated Call Center, etc.
  Commerce
    Banking, Stock Trading, Voice-Enabled Commerce
  Corporate Portals
    Employee Directory, Employee Self Service - Human Resources, Email, Calendar, Unified Messaging
  Alerts [Push Model]
    Server-initiated transactions (Call me when the stock price of any company in my portfolio goes up by $10)
(37)
Corporate Portal Scenario
  1 (800) – XXXXXXX
  Welcome to Your Corporate Portal. Please say your name.
  Hitesh Seth
  Please enter your access code
  ****
  Good Morning, Hitesh. What can I do for you?
  Check my mail
  You have 34 new messages.
  Is there any new message from my boss?
  Yes, there are two messages from …
(38)
Corporate Portal (contd.)
  First message. Subject: Help Needed in XYZ Project. Hitesh, could you please call …?
  Reply
  I am in San Jose till 15th of November. I could come to Phoenix on 16th November. [#]
    [used <record>]
  Mail Sent
  When am I meeting with John today?
  You have a meeting with John at 2:00 PM.
  Connect me to his office, please.
  Connecting to John’s direct number, (732) ...
    [used <transfer>]
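The spoken reply in this scenario could be captured along the following lines. This is only a sketch: the form id, the variable name reply, the time limits and the SendReply.jsp page are all hypothetical, and the slides do not show how the portal actually stores or sends the recording.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Reply">
    <!-- Record until the caller presses a key, pauses for 3 seconds,
         or 60 seconds have elapsed (limits chosen for illustration) -->
    <record name="reply" beep="true" maxtime="60s"
            finalsilence="3000ms" dtmfterm="true" type="audio/wav">
      <prompt>Record your reply after the beep. Press pound when you are done.</prompt>
    </record>
    <block>
      <!-- Upload the recording; SendReply.jsp is a hypothetical server page -->
      <submit next="SendReply.jsp" method="post"
              enctype="multipart/form-data" namelist="reply"/>
    </block>
  </form>
</vxml>
```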
(39)
Vendor Landscape
(40)
Vendor Landscape
  All-in-one VoiceXML Gateways/Servers
    Combine ASR, TTS, VoiceXML Interpreter, Hardware Ports
    Lucent Speech Server, Motorola Voice Developer Gateway, VoiceGenie VoiceXML Gateway, …
  ASR (Advanced Speech Recognition) Engines
    AT&T, IBM, Nuance, Philips, SpeechWorks, …
  Development Tools
    IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, …
  Recording & Developing Prompts
    Microsoft Sound Recorder, Sonic Foundry Sound Forge, Syntrillium Software Cool Edit, ...
(41)
Vendor Landscape
  Text-to-Speech Engines
    AT&T, Fonix TTS, L&H RealSpeak, Lucent TTS Engine, Nuance Vocalizer, SpeechWorks Speechify, …
  Telephony Interface Boards
    Dialogic, Lucent, ...
  Voice ASP Solutions
    BeVocal, Interactive Telesis, Tellme, VoiceGenie Technologies, Voxeo.net, ...
(42)
Challenges
(43)
Challenges
  Need sophisticated infrastructure
  Voice recognition quality
    Need to build sophisticated grammars for near-natural-language speech recognition
    Your application is only as good as its grammar
  TTS quality & customization
  Server-initiated VoiceXML interactions! (Push Model)
  VoiceXML application development tools are still maturing
(44)
Authentication
  Possible Approaches
    User-Ids/Passwords
      Too cryptic for ASR engines to recognize
      Usually need to be spelled out, which is hard
    Names/Access Codes
      Names may not be unique; may be good for intranets
    Telephone No./Access Codes
      Telephone numbers are unique: (0017323622187) for an international portal, (7323622187) for a US portal (or redirected to a US-only area)
      Easy to key in and/or say aloud
      If available, use Caller-Id, similar to a “persistent cookie”
    Voice-Based Authentication
      Voice Print/Pattern
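A minimal sketch of the telephone number plus access code approach. It assumes the platform exposes Caller-Id through the session.telephone.ani variable, and that a server page named Login.jsp (hypothetical) checks the submitted values. When Caller-Id is available, the phone field starts out filled and the caller is only asked for the access code.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Login">
    <!-- Pre-fill from Caller-Id when the platform provides it;
         if the value is undefined, the caller is prompted instead -->
    <field name="phone" type="digits" expr="session.telephone.ani">
      <prompt>Please say or key in your ten digit telephone number.</prompt>
    </field>
    <field name="accessCode" type="digits">
      <prompt>Please key in your access code.</prompt>
    </field>
    <block>
      <!-- Login.jsp is a hypothetical server page that validates the pair -->
      <submit next="Login.jsp" method="post" namelist="phone accessCode"/>
    </block>
  </form>
</vxml>
```

Caller-Id alone identifies the phone rather than the person, which is why the sketch still asks for the access code.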
(45)
Performance
  Grammars
    Inline v/s External
    Caching!
  VoiceXML Documents
    Caching!
    Multiple interactions per document
  Audio
    TTS v/s Recorded Prompts
    Quality v/s Size
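One way these caching choices can show up in a document is sketched below. The caching values used are "fast" (use a cached copy if one has not expired) and "safe" (always check for the most recent copy); the file name say_name.wav is made up, and whether a particular setting helps depends on the platform and on how often the resources change.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Document-wide default: prefer cached copies of fetched resources -->
  <property name="caching" value="fast"/>
  <form id="Directory">
    <field name="emplId">
      <prompt>
        <!-- Recorded prompt, served from cache under the default above -->
        <audio src="say_name.wav">Say the name of the person</audio>
      </prompt>
      <!-- External grammar shared across documents; "safe" overrides the
           default so that changes to the employee list are picked up -->
      <grammar type="application/x-jsgf"
               src="mycompany.gram#employee" caching="safe"/>
    </field>
  </form>
</vxml>
```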
(46)
Getting Started
  Take Small Steps
    Use DTMF
      Enter your 10 digit account number
      Press 1 for Email, 2 for calendar, 3 for employee directory
    Use Directed Dialogs
      Say the name of the person
    Move towards natural language conversations
      What can I do for you?
  Use TTS sparingly, for quality of voice interaction
  If your application incorporates ads, make sure to keep them short and crisp
  Start small, grow big (try regional betas/limited trials and move towards a larger audience)
(47)
Opportunities
  According to the Kelsey Group
    By 2005, advertising and transactions from voice portals will produce $5 billion in revenues, and $6 billion for associated hardware, software and Net service provider companies.
    (Adapted from "Voice portal companies overshooting demand", http://news.cnet.com/news/0-1004-200-1844967.html, May 9, 2000)
(48)
Resources
(49)
Resources
  Organizations
    VoiceXML Forum: http://www.voicexml.org
    W3C Voice Browser Activity: http://www.w3c.org/Voice
  Specs
    VoiceXML Specification: http://www.voicexml.org/spec.html
    Java Speech API Grammar Spec (JSGF): http://java.sun.com/products/java-media/speech/forDevelopers/JSGF.pdf
(50)
Resources
  Vendors
    AT&T: http://www.att.com/aspg/
    BeVocal: http://www.BeVocal.com
    Dialogic: http://www.dialogic.com
    IBM: http://www.ibm.com/software/speech
    Lucent: http://www.lucent.com/speech
    Motorola: http://www.motorola.com
    Nuance: http://www.nuance.com
    Tellme: http://www.tellme.com
    SpeechWorks: http://www.speechworks.com
    VoiceGenie Technologies: http://www.voicegenie.com
    Voxeo: http://www.voxeo.com
(51)
Questions?
(52)
Thanks for your time.