New challenge: telephone Text To Speech & audio Speech recognition VoiceXML Homework: sign up on studio.tellme.com


New challenge: telephone

Text To Speech & audio

Speech recognition

VoiceXML

Homework: sign up on studio.tellme.com

Telephone

• Caller to system: speech recognition
– using grammars (limited vocabulary, general audience, no training)
– optional use of touch tones (numbers)

• System to caller: recorded audio (wav files) plus TTS (text to speech)

• Limited bandwidth, in comparison to other applications, but very familiar, ubiquitous medium

• 800 long distance, some airline information systems, others?

Problems in context

• Speech recognition: very difficult if
– no restrictions on speakers
– grammar for all of English with aim of 'natural language understanding'

• Text to speech: much easier problem (but English is more difficult than more fully phonetic languages like Spanish, I've been told).

(More next class)

studio.tellme.com

• Company that provides ‘engine’ for applications
• Provides developing environment
– We are doing the tellme version of VoiceXML, but it appears to be standard.
• Register as a developer:
– Provide your own id; assigned a PIN
– Scratchpad for quick testing
  • Put VoiceXML in ScratchPad place (no audio files)
  • 1-800-555-VXML (8965)
– SAY id and then PIN.
– Application URL for projects with multiple files
• To look at someone else's project, you change your Application URL
– called pointing your account to a new source.

VoiceXML

• XML document (VXML header)
• VoiceXML has tags for flow-of-control and calculations.
– Also can use <script> for JavaScript
• Grammars come in different varieties. We will use the tellme way.
– Grammars are enclosed in CDATA sections to prevent XML interpretation.
– Many grammars constructed for you.
  • <field name="answer" type="boolean"> … will listen for yes or no.
  • <field name="price" type="currency"> … will listen for currency.
– <menu> <choice> <choice> for list

VoiceXML basics, continued

• <form> element can contain
– <block> elements, which can contain <audio>, <goto>, other
– <field> which can contain
  • <prompt>
  • <grammar> (if not one of built-in grammars)
  • <filled>
• <var> tags can be at different levels (for example, document, block, or higher levels)
• <if> <elseif> <else> tags
• <script> elements for JavaScript (which can also appear in expressions)

VoiceXML basics: typical case

• a form element

– <field>

• <prompt>, made up of <audio>, with reference to recorded wav file and backup text

• <grammar>, if NOT using built-in grammars designated by type attribute of field. This is a CDATA section.

• <filled> with (follow-on) code using field

• <catch> for nomatch, noinput cases

Caution

A form contains various elements, including a field.

If a field has a grammar and the grammar is satisfied, control goes to a filled tag.

obligatory…

<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <audio src="prompt1.wav">Hello, world</audio>
    </block>
  </form>
</vxml>

• prompt1.wav: recorded using tellme studio
• Text inside <audio>: backup using TTS, just in case src file missing

Preparation: objects

• JavaScript (and other languages) use classes and objects

• Objects (aka object instances) are declared (created, instantiated) as members of a class

• Objects have
– properties ('the data')
– methods (functions that you can use 'on' the objects)
– static methods
  • Math.random

Example: tm_date

• var dt = new tm_date; creates a date/time object.
• Use methods to extract/manipulate information held 'in' dt.
    var day = dt.get_day();
• Use static methods supplied to do common tasks:
    var dn = tm_date.to_day_of_week_name(day);
  or directly:
    var dn = tm_date.to_day_of_week_name(dt.get_day());
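The difference between a method called 'on' an object and a static method called on the class can be sketched in plain JavaScript. SimpleDate below is a hypothetical stand-in written for this illustration; it is not the tellme tm_date library, and its method names are assumptions.

```javascript
// Hypothetical stand-in class for illustration only; NOT the tellme tm_date library.
class SimpleDate {
  constructor(jsDate) {
    this.jsDate = jsDate; // property: 'the data' held in the object
  }

  // Instance method: used 'on' one particular object.
  getDay() {
    return this.jsDate.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  }

  // Static method: called on the class itself, like Math.random.
  static toDayOfWeekName(day) {
    const names = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
                   'Thursday', 'Friday', 'Saturday'];
    return names[day];
  }
}

// Create an object for a fixed date so the result is repeatable.
const dt = new SimpleDate(new Date(Date.UTC(2024, 0, 1))); // 1 January 2024
const dn = SimpleDate.toDayOfWeekName(dt.getDay());
console.log(dn); // Monday
```

Note that getDay needs an object (dt), while toDayOfWeekName only needs a number and so belongs to the class.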

outline

• Header stuff

• script with external reference

• script (code) encased in CDATA notation

• Form/Block, with text to speech using value produced by script

• Closing stuff

<?xml version="1.0"?>
<vxml version="2.0">
  <!-- Will make use of date functions -->
  <script src="http://resources.tellme.com/lib/code/tm_date.js"/>

  <script>
    <![CDATA[
      var dt = new tm_date();
      var monis = tm_date.to_month_name(dt.get_month());
      var dateis = dt.get_date();
      var dayis = tm_date.to_day_of_week_name(dt.get_day());
      var yearis = tm_date.to_year_name(dt.get_full_year());
      // brute force correction from GMT
      var houris = dt.get_hours() - 4;
      var minutesis = dt.get_minutes();
      var whole = 'The date is ' + monis + ' ' + dateis + '. It is ' + dayis + '. The time is ' + houris + ' ' + minutesis;
    ]]>
  </script>

<form>

<block>Hello.

<value expr="whole"/>

Good bye.

</block>

</form>

</vxml>

Note: can use block for audio
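The brute-force GMT correction in the script gives a negative hour whenever the GMT hour is less than 4. A minimal sketch of a wrap-around version follows; shiftHour is a hypothetical helper, not part of tm_date, and it corrects only the hour (near midnight the date and day name would also need adjusting).

```javascript
// Hypothetical helper: shift an hour (0-23) by a whole-hour offset,
// wrapping around midnight so the result stays in the range 0-23.
function shiftHour(gmtHour, offsetHours) {
  // The extra "+ 24) % 24" handles negative intermediate values.
  return ((gmtHour + offsetHours) % 24 + 24) % 24;
}

console.log(shiftHour(15, -4)); // 11
console.log(shiftHour(2, -4));  // 22 (previous evening, not -2)
```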

Example: my family

• Directed responses to 3 family members:
– Daniel
  • question/response on activities
– Aviva
  • question/response on number of cranes
– Esther
  • response
• Calculations (arithmetic) done using variables
• if tags
– The cond attribute is a condition test.
• limited error handling: exit on no-match event
– alternative is to repeat prompt, generally using count attribute
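The alternative to exiting on the first no-match, repeating the prompt a limited number of times, can be sketched as a plain JavaScript simulation. This models only the counting idea; askWithRetries and its parameters are hypothetical names, not VoiceXML or tellme API.

```javascript
// Hypothetical simulation of "reprompt until a count is reached, then exit".
// answers: what the caller says on each attempt (null means silence).
// isValid: stands in for "the grammar matched".
function askWithRetries(answers, isValid, maxAttempts) {
  const log = [];
  for (let count = 1; count <= maxAttempts; count++) {
    const answer = answers[count - 1];
    if (answer != null && isValid(answer)) {
      return { filled: answer, log }; // corresponds to reaching <filled>
    }
    // No input or no match: repeat the prompt unless this was the last try.
    log.push(count < maxAttempts ? 'reprompt' : 'exit');
  }
  return { filled: null, log };
}

const isAviva = (a) => a === 'aviva';
console.log(askWithRetries([null, 'bob', 'aviva'], isAviva, 3));
console.log(askWithRetries(['x', 'y'], isAviva, 2));
```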

<vxml version="2.0">
  <form>
    <field name="childid">
      <prompt>
        <audio src="whosthis.wav">Hello. Who is calling?</audio>
      </prompt>
      <grammar type="application/x-gsl" mode="voice">
        <![CDATA[
          [
            [dan daniel (daniel meyer) (dan meyer)] {<childid "daniel">}
            [aviva (aviva meyer)] {<childid "aviva">}
            [esther (esther minkin)] {<childid "esther">}
          ]
        ]]>
      </grammar>
      <catch event="noinput nomatch">
        <audio src="sorry.wav">Sorry. I didn't get that.</audio>
        <exit/>
      </catch>
      <filled>
        <if cond="'daniel'==childid">
          <goto next="#danfollowup"/>
        <elseif cond="'aviva'==childid"/>
          <goto next="#avivafollowup"/>
        <elseif cond="'esther'==childid"/>
          <goto next="#estherfollowup"/>
        <else/>
          <!-- never happens -->
          <reprompt/>
        </if>
      </filled>
    </field>
  </form>

Note inner, single quote marks in the cond attributes. Note double ='s.
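What the grammar and the <filled> logic do together can be sketched in plain JavaScript: several accepted phrases map to one slot value, and the slot value selects the next form. This is a simplified text-matching model under stated assumptions, not how the speech recognizer works; recognize and nextForm are hypothetical names.

```javascript
// Each entry mirrors one line of the grammar: accepted phrases -> slot value.
const childGrammar = [
  { phrases: ['dan', 'daniel', 'daniel meyer', 'dan meyer'], value: 'daniel' },
  { phrases: ['aviva', 'aviva meyer'], value: 'aviva' },
  { phrases: ['esther', 'esther minkin'], value: 'esther' },
];

// Returns the slot value for an utterance, or null (the 'nomatch' case).
function recognize(utterance, grammar) {
  const heard = utterance.trim().toLowerCase();
  for (const rule of grammar) {
    if (rule.phrases.includes(heard)) return rule.value;
  }
  return null;
}

// Mirrors the <if>/<elseif> chain inside <filled>.
function nextForm(childid) {
  if (childid === 'daniel') return '#danfollowup';
  else if (childid === 'aviva') return '#avivafollowup';
  else if (childid === 'esther') return '#estherfollowup';
  return null; // never happens when childid came from the grammar
}

console.log(nextForm(recognize('Dan Meyer', childGrammar))); // #danfollowup
console.log(recognize('bob', childGrammar));                 // null
```

Because the grammar can only produce the three slot values, the final branch is unreachable in practice, which is why the else case is marked as never happening.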

<form id="danfollowup">
  <field name="today">
    <prompt>
      <audio src="congratsdan.wav">Congratulations on the new job. Did you work on your thesis, or do aikido or jo today?</audio>
    </prompt>
    <grammar type="application/x-gsl" mode="voice">
      <![CDATA[
        [
          [aikido (i key dough)] {<today "aikido">}
          [thesis (work)] {<today "thesis">}
          [jo (joe)] {<today "jo">}
          [both (all) (everything) ((i key dough) jo)] {<today "both">}
          [none nothing (sort of)] {<today "nothing">}
        ]
      ]]>
    </grammar>
    <catch event="noinput nomatch">
      <audio>I didn't quite understand. Call or send e-mail.</audio>
      <exit/>
    </catch>
    <filled>
      <if cond="today=='aikido'">
        <audio>Some aikido is fine.</audio>
      <elseif cond="today=='thesis'"/>
        <audio>Good, but do other things also.</audio>
      <elseif cond="today=='jo'"/>
        <audio>Don't get hit in the head.</audio>
      <elseif cond="today=='both'"/>
        <audio>Doing some of everything is best.</audio>
      <elseif cond="today=='nothing'"/>
        <audio>You deserve a break, but remember you want to be done by September.</audio>
      <else/>
        <audio>See you soon.</audio>
      </if>
    </filled>
  </field>
  <block>
    <audio>Good bye</audio>
  </block>
</form>

<form id="avivafollowup">

<var name="rest" expr="1000"/>

<field name="bcount" type="number">

<prompt>

<audio src="howmanycranes.wav">Hello, Aviva. How many cranes have you made? </audio>

</prompt>

<grammar type="application/x-gsl" mode="voice" >

<![CDATA[

NATURAL_NUMBER_THRU_9999

]]>

</grammar>

<catch event="noinput nomatch">
  <audio src="sorry.wav">Sorry. I didn't get that.</audio>
  <exit/>
</catch>

<filled>
  <assign name="rest" expr="1000-bcount"/>
  <audio><value expr="rest"/></audio>
  <audio src="togo.wav">to go.</audio>
  <if cond="rest&lt;200">
    <audio src="homestretch.wav">You're in the home stretch</audio>
  <elseif cond="rest&lt;500"/>
    <audio src="morethanhalf.wav">More than half way</audio>
  <elseif cond="rest&lt;800"/>
    <audio src="goodstart.wav">Off to a good start</audio>
  <else/>
    <audio>Get a move on</audio>
  </if>
  <audio src="goodbye.wav">Good bye.</audio>
</filled>

</field>

</form>

Note: can't use < inside the cond attribute; write &lt; instead.
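The arithmetic and the threshold tests in this form can be checked outside VoiceXML. The sketch below restates the same logic in plain JavaScript; craneMessages is a hypothetical name, and the strings are the backup TTS texts from the form.

```javascript
// Restates the <filled> logic: compute how many cranes remain out of 1000,
// then pick a comment from the same thresholds used in the cond attributes.
function craneMessages(bcount) {
  const rest = 1000 - bcount;
  let comment;
  if (rest < 200) {
    comment = "You're in the home stretch";
  } else if (rest < 500) {
    comment = 'More than half way';
  } else if (rest < 800) {
    comment = 'Off to a good start';
  } else {
    comment = 'Get a move on';
  }
  return [rest + ' to go.', comment, 'Good bye.'];
}

console.log(craneMessages(850)); // 150 to go., home stretch
console.log(craneMessages(100)); // 900 to go., get a move on
```

The order of the tests matters: each elseif is reached only when the earlier, smaller thresholds have failed.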

<form id="estherfollowup">

<block>

<audio >Hello, Mommy. This is all I can do now. </audio>

</block>

</form>

</vxml>

Application logic

• VoiceXML elements (for example, <if> and <var>).
– Note: more powerful than XSLT: <assign> tag
• JavaScript code in attributes (for example, cond, expr)
• JavaScript code in <script> </script>
– Encase in CDATA to avoid problems with certain characters
• external JavaScript code, cited using <script src="file address"/>
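The 'certain characters' are the ones XML treats as markup. JavaScript placed in an attribute such as cond has to have them written as entities, while code inside a CDATA section can be left alone. A minimal sketch of the attribute case follows; escapeForXmlAttribute is a hypothetical helper, not part of VoiceXML or tellme.

```javascript
// Hypothetical helper: rewrite a JavaScript expression so it can be placed
// inside a double-quoted XML attribute such as cond="...".
function escapeForXmlAttribute(expr) {
  return expr
    .replace(/&/g, '&amp;')   // must come first, so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

console.log(escapeForXmlAttribute('rest<200'));       // rest&lt;200
console.log(escapeForXmlAttribute('a>0 && b<10'));    // a&gt;0 &amp;&amp; b&lt;10
```

Single quote marks are left as they are, which is why string constants inside cond are written with single quotes.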

Class work

• EVERYONE (who hasn't already): sign up on studio.tellme.com tonight

• Design simple application (you may work in groups):
– Ask one question
– Detect and respond to each of 2 or 3 answers
– Use examples here for models
– All text to speech

• Pick (at least) one and implement.
• (Do this for a short time and then go on to next lecture. Resume after 9pm when minutes are free.)

Homework

• (Majors requirement overdue: there will be a deduction but better late than never.)

• Go to studio.tellme.com & sign up as developer.
– try examples (using scratch pad)
– record some voice samples
– do tellme tutorials

• ALSO try and report on
– 800 long distance or some other commercial application