
© 2007 IBM Corporation

SpeechTEK, August 21, 2007

Jan Sedivy, IBM, Voice Technologies and Systems, Czech Republic, Prague

Architecture for Web Multimodal Application


Introduction - need

Design a simple multimodal architecture

The architecture supports all kinds of multimodal applications, from simple form filling to interactive movies with animation.

Low resource requirements: runs on a PDA and over the Internet

Use open standards when possible

No compromises in multimodality - let the user freely change between voice (VUI) and GUI

Simple and fast development



Key Components - approach

IBM Embedded ViaVoice

Embedded VoiceXML Browser (EVB) - research prototype

Standard HTML browser – Internet Explorer or Firefox

The Adobe Flash Player

An XML protocol that lets an external application control the browser


Embedded ViaVoice overview

Embedded ViaVoice® delivers IBM speech technology to mobile devices and automobile components.

Robust speech recognition with a low error rate, and text-to-speech

SLM and action classification supporting free-form commands: no need for a user's manual

Embedded grammars or large lists of over 100,000 words

N-best, confidence score, out-of-vocabulary detection

Speaker and noisy-environment adaptation

Push-to-activate button, automatic gain control, automatic end-of-utterance detection, transient noise detection

Broad range of languages

Eclipse-based, easy-to-use developer toolkit

C/C++ code that is highly portable and scalable, with a small footprint and low CPU (MIPS) requirements

IBM provides porting, integration, testing and consulting services, along with customized development workshops
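How an application might consume an N-best list with confidence scores and out-of-vocabulary detection can be sketched in Python. This is an illustration only: the `Hypothesis` type, the `interpret` function and both thresholds are assumptions made for this example, not part of the Embedded ViaVoice API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hypothesis:
    text: str          # recognized phrase
    confidence: float  # assumed scale: 0.0 (worst) to 1.0 (best)


# Hypothetical thresholds; real values would be tuned per application.
ACCEPT = 0.60  # below this, treat the utterance as out of vocabulary
MARGIN = 0.10  # top two hypotheses closer than this: ask the user


def interpret(nbest: List[Hypothesis]) -> Tuple[str, Optional[List[str]]]:
    """Map an N-best list to an action: reject, disambiguate or accept."""
    if not nbest:
        return ("reject", None)
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    if best.confidence < ACCEPT:
        # Nothing scored well enough: likely out of vocabulary.
        return ("reject", None)
    if len(ranked) > 1 and best.confidence - ranked[1].confidence < MARGIN:
        # Two candidates are too close to call: let the user pick.
        return ("disambiguate", [best.text, ranked[1].text])
    return ("accept", [best.text])
```

A caller would branch on the returned action, for example re-prompting on `reject` and offering the two candidates on screen on `disambiguate`.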



IBM Embedded VoiceXML Browser overview

Small, fast, and portable Embedded VoiceXML Browser (EVB)

VoiceXML 2.0 compliant

Written in plain C++ (no templates, etc.)

Compact and portable code

Targeted to small portable devices: PDAs, handhelds, set-top boxes, etc.

Runs on top of IBM's Embedded Speech Engine and TTS

Ported to Win32, WinCE (iPAQ), and Linux

Runs as a viewer: VoiceXML snippets are pushed to the EVB


Flash Player - overview

The Adobe Flash Player is a widely distributed multimedia and application player, originally created by Macromedia and now distributed by Adobe Systems. Flash Player runs SWF files, which can be created with the Adobe Flash authoring tool, with Adobe Flex, or with a number of other Adobe and third-party tools.

Flash Player supports an embedded scripting language called ActionScript (AS), which is based on ECMAScript. ActionScript has matured from a scripting syntax without variables into a language that supports object-oriented code.


HTML Browsers - overview

HTML browser

MS IE 6, IE 7

Firefox

Browsers support add-ons


PDA architecture


GUI – Adobe Flash Player

VUI – Embedded VoiceXML Browser (viewer mode)

Application control – ActionScript

ActionScript synchronizes the GUI and VUI and generates:

VoiceXML snippets

Dynamic grammars, grammars, prompts (links)

All other dialog parameters

Result processing (n-best, disambiguation, similarity, OOV, ...)
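What a generated VoiceXML snippet with a dynamic grammar might look like can be sketched as follows. In the presented architecture this generation is done in ActionScript; Python is used here only as a stand-in. The function name `build_snippet`, the form and rule ids, and the field layout are assumptions made for this example, not the format the EVB actually receives.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_snippet(field_name: str, prompt: str, choices: List[str]) -> str:
    """Build a one-field VoiceXML 2.0 form with an inline XML grammar."""
    vxml = ET.Element("vxml", {"version": "2.0",
                               "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": field_name + "_form"})
    field = ET.SubElement(form, "field", {"name": field_name})
    ET.SubElement(field, "prompt").text = prompt
    # Dynamic grammar: one <item> per phrase the GUI currently offers.
    grammar = ET.SubElement(field, "grammar", {
        "type": "application/srgs+xml", "version": "1.0",
        "mode": "voice", "root": "choice", "xml:lang": "en-US"})
    rule = ET.SubElement(grammar, "rule", {"id": "choice"})
    one_of = ET.SubElement(rule, "one-of")
    for choice in choices:
        ET.SubElement(one_of, "item").text = choice  # text is XML-escaped
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_snippet("book", "Which book would you like to read?",
                        ["the red balloon", "my first day"]))
```

Building the tree with an XML library rather than string concatenation keeps the snippet well-formed even when a phrase contains characters such as `&`.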


Internet Extensions

EVB Life-Cycle Manager add-on: starts, initializes, runs, and shuts down the browser

prevents multiple VoiceXML browsers from running at the same time

provides a version policy mechanism with new-version notification

The Security Server permits opening a socket to a different domain.

[Diagram. Labels: Communicate with EVB; Life-Cycle Manager; Security Server]
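One way such a Security Server can work is the Flash Player socket policy mechanism, in which the player sends a policy request and the server answers with a cross-domain policy file. The slides do not say that this is how the Security Server is implemented, so the Python sketch below is an assumption. The domain `www.example.org`, port 9000 and both function names are hypothetical.

```python
from typing import Optional

# Flash Player sends this null-terminated request before opening the socket.
POLICY_REQUEST = b"<policy-file-request/>\x00"


def policy_file(domain: str, port: int) -> bytes:
    """Return a null-terminated policy allowing `domain` to reach `port`."""
    xml = ('<?xml version="1.0"?>'
           '<cross-domain-policy>'
           f'<allow-access-from domain="{domain}" to-ports="{port}"/>'
           '</cross-domain-policy>')
    return xml.encode("ascii") + b"\x00"


def answer(request: bytes, domain: str, port: int) -> Optional[bytes]:
    """Reply to a policy request; ignore anything else."""
    if request == POLICY_REQUEST:
        return policy_file(domain, port)
    return None


if __name__ == "__main__":
    # Hypothetical values: the page is served from www.example.org and
    # the local speech component listens on port 9000.
    print(answer(POLICY_REQUEST, "www.example.org", 9000))
```

Keeping the reply logic in a pure function makes it easy to test; a real server would wrap `answer` in a small TCP listener.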


Internet Architecture

[Diagram. Labels: Client; Browser; Add-ons; Life-cycle manager; Security server; EVB; Internet]


Sample application - Literacy Tutor

IBM, Corporate Citizenship & Corporate Affairs

Project goals

Use speech recognition technology - over the web - to help children and adults improve their literacy skills

Value to customer

Gain literacy skills through practice and positive reinforcement

Improve pronunciation in a private setting

Interaction with tutor character introduces ‘fun’ and increases computer skills

Web = Anywhere/anytime access:

Can resume where left off

Can share progress with family

Build and share books on the web

www.readingcompanion.org


Home page


Functionality

Practice Reading – main application

Flash application that uses EVB+EVV to decode speech

Flash animates a tutor character that interacts with the reader

Reporting – performance reports for teachers indicating strengths as well as problem areas for students

Book Library – add/remove books from classroom, rate books, book browser

Classroom Management – add/delete students, adjust reading level, add/delete classrooms as well as teachers and schools

Book Authoring – separate tool to author new books


Bookshelf


Children’s book/character


Adult book/character


Student Performance


Reading Companion - summary

More than 200 schools and not-for-profit organizations currently participate in the grant program, involving more than 11,000 users (children and adults) in 9 countries: Canada, United States, Spain, United Kingdom, Ireland, South Africa, Mexico, Venezuela, India.

Community relations managers are reviewing proposals from prospective organizations; we hope to expand the program to 100 more sites this year.

Market value: US$10,000 per site (regardless of number of users)


Thank You!