Upload
srksameer
View
217
Download
0
Embed Size (px)
Citation preview
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
1/56
RealTimeSpeechTranslatorXavier GarciaCabrera
25/06/2008
Author XavierGarciaCabrera
Projectdirector JanRudinsky
Date 25/06/2008
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
2/56
Summary1 Overview...............................................................................................................................1
2 Introduction:ACCProject......................................................................................................3
2.1 Platformdescription......................................................................................................3
2.1.1 Software................................................................................................................3
2.1.2 Hardware...............................................................................................................5
2.1.3 Architecture...........................................................................................................5
2.2 ACCTeamProjects........................................................................................................7
2.2.1 RealTimeVoiceTranslator....................................................................................7
2.2.2 VXMLIDE...............................................................................................................8
2.2.3 ACCsecurity...........................................................................................................8
2.2.4 Others....................................................................................................................9
3 RealTimeVoiceTranslator..................................................................................................11
3.1 Objectives....................................................................................................................11
3.2 Requirements..............................................................................................................11
3.3 MarketAnalysis...........................................................................................................12
3.3.1 SkypePersonalInterpreterService.....................................................................12
3.3.2 GoogletalkbotsforrealtimetranslationsinIM(InstantMessaging)...............13
3.3.3 NECAutomaticTranslatorformobilephones....................................................13
3.4 ChoosingaTranslationEngine....................................................................................14
3.4.1 WebTranslationservices....................................................................................14
3.4.2 TranslationWebServicesAccuracybenchmark.................................................15
3.4.3 Conclusion...........................................................................................................26
3.5 SoftwareTools.............................................................................................................26
3.5.1 IBMWebSphereApplicationServer....................................................................26
3.5.2 JavaEE.................................................................................................................26
3.5.3 ApacheTomcatHTTPServer...............................................................................27
3.5.4 EclipseIDE...........................................................................................................27
3.5.5 IBMRationalWebDeveloper&VoiceToolkit....................................................28
3.6 HowIconfigurethesoftwaretools.............................................................................29
3.6.1 Tomcat6InstallationandconfigurationonFedoraLinux..................................29
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
3/56
3.6.2 IBMRWDVoiceLanguageandVoiceEnginesConfiguration..............................32
3.7 CreatingandrunningadynamicVoiceXMLapplication.............................................33
3.7.1 CreateanewJavaProjectinEclipseIDE.............................................................34
3.7.2 AddingJEElibrariesinEclipseIDE.......................................................................34
3.7.3 CreatethemainprocedureintheServletclass..................................................35
3.7.4 Copythebinaryfilestotheserver......................................................................35
3.7.5 Tomcatapplicationconfiguration.......................................................................36
3.7.6 ConfiguringVEtousetheVXMLApplicationgeneratedbytheservlet..............37
3.8 Applicationschema.....................................................................................................38
3.9 TextToSpeechandSpeechRecognition.....................................................................39
3.10 Achievedgoals.............................................................................................................39
3.10.1 SetupWebSphereServertorunVoiceXMLApplications....................................40
3.10.2 UseIBMRationalWebDevelopertoimplementVoiceXMLApplications..........40
3.10.3 ConfigureandusedifferentlanguageswithinourVoiceEngine........................40
3.10.4 GeneratedynamicVXMLcode............................................................................41
3.10.5 Makequeriestowebsitesandprocesstheresponse.........................................41
3.10.6 MakequeriestoGoogleTranslator.....................................................................42
3.10.7 GetdynamicdatafromaVoiceXMLApplication................................................43
4 Conclusions.........................................................................................................................45
5 APPENDIXA:Definitions.....................................................................................................47
6 APPENDIXB:Acronyms.......................................................................................................51
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
4/56
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
5/56
10BOverview
1 OverviewThisdocumentiswrittentoreportonmyworkontheRealTimeVoiceTranslatorProject,the
projectIcarriedoutasmyfinalthesisprojectduringtheacademicyear20072008.Duringthis
period I have been working in the Research and Development Center (RDC) for Mobile
Applications,adepartmentoftheCzechTechnicalUniversity(CTU)inPrague.IntheRDCIwas
a member of the Automatic Call Center Project (ACC Project) team, and within it, I was
assignedtocarryouttheRealTimeVoiceTranslatorProject.
The Automatic Call Center Project (ACC Project), now renamed to Voice2Web Project, is a
projectcarriedoutbytheResearchandDevelopmentCenter.TheRDCisadepartmentinside
theElectroTechnicalFacultyoftheCTUthatcarriesoutResearchandDevelopmentprojects
regarding the Information Technologies (IT). Some of its partners are IBM, Vodafone and
Ericson,whotheRDCisdoingprojectsfor.
TheACCProjectbeganon2007and itsaim istodevelopVoiceApplications,within the IBM
and RDC agreement, using IBM Voice Technologies and whatever open standards or open
sourcesoftware. IBM isanACCProjectpartnerandprovidesfinancingfor it. Italsoprovides
hardwareandsoftwarelicensestotheACCProjectandgivesussupport.Themembersofthe
ACCProjectaredevelopingseveralVoiceApplicationsatthesametime,allthemfollowingthe
ACCProjectpurposes.
Although this document is focused on the Real Time Voice Translator Project, it will also
explain in the introduction some aspects of the ACC Project. This is because the Real Time
VoiceTranslatorProjecthasalotofpointsincommonwithitanditisworth,tounderstandit
well,understandsomepointsoftheACCProjectaswell.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
6/56
2 RealTimeSpeechTranslator
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
7/56
31BIntroduction:ACCProject
2 Introduction:ACCProject2.1PlatformdescriptionTorunVoiceApplicationsandachievethegoalsofourteam,itisneededtohaveaplatformto
build on it the all the projects we are going to carry on. This platform is composed of
hardware,softwareandnetworkarchitecturethatworktogethertoachieveitsproposal.
Thefollowingsectionsdescribethedifferentpartsofourplatform,andexplainhowtheyare
configuredtoworktogether.
2.1.1 Software
To get all the features we need from our platform we use some software installed on our
servers that provides the services we require. We have been adding the different software
applicationsduringthedevelopingoftheprojectaswehaveneededthem,notallatthesame
timeatthebeginningof it.Therefore,thenumberofthesoftwareapplicationsusedmaybe
increased in the future if we require some extra functionality the software we have now
installedisnotabletoprovide.
Eachtimewehavefoundthenecessityofaddingsomefunctionality,wehavefoundoutthe
applicationsthatareabletoprovideusourrequirementsandwehavechosenwhichfitsbetter
to our present and future necessities. One of the most important criteria we have used to
choosethesoftwareapplicationstouse isthat itshouldbeopensource.Theonlyexception
wehavedone, iswith the IBMWebSpheresoftware,due IBM issupporting ourprojectand
providedusthisforfree.
IntheTable21youcanseethecurrentsoftware installed inourserversandthefeatures it
provides
to
our
platform
and
the
applications
in
which
they
are
used.
Software Server FeaturesandUses
FedoraLinux Adela,Was,Bolek
ThisistheOSusedonallourservers.Asall
theLinuxdistributions,itprovidesallthe
basicfunctionalitiesforaserverlike:user
accounts,sshserver,ftpserver,network
protocols
VoiceEnabler(VE) Adela It
is
a
part
of
the
IBM
WebSphere
Server
framework.Itprovidesthebasiccapabilities
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
8/56
4 RealTimeSpeechTranslator
toenableVoiceApplications.Itunderstands
theSIPandCCXMLprotocolsandmanages
andprocessestheVXMLfile.Itreceivesthe
SIPcallsforwardedbyAsteriskserver,assigns
themaVoiceXMLApplication,andexecutes
andinterpretstheVXMLfilesofthat
application.
VoiceServer(VS) Was
ThisisapartoftheIBMWebSphereServer
framework.ItisinstalledintheWasserver
andimplementsthebothVoiceEngines:ASR
(AutomaticSpeechRecognition)andTTS
(TextToSpeech).Itreceivestherequestfrom
VoiceEnablerthroughMRCPProtocoland
returnstheresultofprocessingthembythe
VoiceEngines.Theenginesrequiremore
amountofCPUpowerthanother
applicationsandthisisthereasonwhyitis
installedinaseparatemachine.
IBMHTTPServer Adela
ItissuppliedinsidetheIBMWebSphere
Serverframework.ItisanHTTPserver,and
providesdebasicfeaturesofthiskindof
servers.Itisusedtohostwebpagesandthe
VXMLfilesofstaticVoiceXMLapplications.
ApacheTomcat Adela
ItisanHTTPServerenabledtorunJava
Servlets.ItisusedtoruntheServletswe
needtogenerateDynamicVoiceXML
Applications.Italsohostsomewebpages
andsomeconfigurationfilesusedbyServlets.
MySQL Adela
MySQLisaverypopularandopensource
database.Itprovidesthesamefunctionalities
ofthemostpowerfulproprietarydatabases.
Weuseitasadatabaseinthedevelopment
ofVXMLIDEProject.
JavaFramework Adela,Was
ItprovidesdeJVM(JavaVirtualMachine)and
extraresourcesneededtoexecuteJava
applications.Itisrequiredtobeinstalledin
allthemachinesthatrunIBMWebSphere
framework.WealsorequireitinAdelatorun
theJavaServlets.
Asterisk Bolek
Itisthemostworldwideusedopensource
SIPserver.Weuseittomanagetheincoming
SIPcallsandredirectthemtoourVoice
Applications.Italsomakepossibletoreceive
SIPcallsbehindaNAT/Firewall.
Table21.SoftwareusedinACCProjectPlatform.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
9/56
51BIntroduction:ACCProject
2.1.2 Hardware
Wehavethreephysicalmachines.Thesemachinesaretheserversthatrunthesoftwarethat
providesthefunctionalitiesweneed.Twooftheseservers,thosethatrunIBMsoftware,were
providedbyIBM.Theother,whereistheAsteriskserverinstalled,belongstoRDCdepartment.
Our entire infrastructure is behind the Firewall/ NAT of the CVUT University, but the server
that runs the Firewall/NAT server belongs to the university and is administrated by the
Universitysadministrator,soitwillbeconsideredoutofourplatform.
Each of the machines is identified by a name (Adela, Was and Bolek) and run different
software applications. Following our principle of using always open source software in the
wholeproject,weuseFedoraLinuxoperationsystemonalltheseservers.InTable22youcan
seethesoftwareinstalledoneachmachine.
Name InstalledSoftware
Adela
VoiceEnabler(IBMWebSphere)
IBMHTTPServer(IBMWebSphere)
ApacheTomcatHTTPServer
MySQLDatabase
Was VoiceEngine(IBMWebSphere)
Bolek AsteriskSIPServer
Table22.Servernamesandinstalledsoftware.
Inthefollowingtableyoucanseethehardwaretechnicalfeaturesofeachofourservers:
Adela Bolek Was
Model DELL PCABACUSARCH
8700N
DELL PCABACUSARCH
8700N
SUPERMICRO
SuperServer
CPU IntelPentium4
3.6GHz
IntelPentium4
3.6GHz
IntelXeon5130 @
2.00GHz
RAM 2GB 2GB 2GB
HD 230GB 230GB 230GB
Graphicscard GeForce550 GeForce550
Table23.Servershardwarefeatures.
2.1.3 Architecture
Ourplatformisconstitutedforseveralserverswhichareinterconnectedwithinanetwork.We
arenotusingadedicatednetworkonlyforourproject.Weareusingtheuniversitynetwork,
sharedwithallotheruniversityservers.OurserversareBolek,WasandAdela.Theyconnect
andreceiveconnectionsfrominternetthroughtheuniversityFirewall/NATserver.Thatisthe
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
10/56
6 RealTimeSpeechTranslator
reason why we need to use asterisk to be able to manage SIP connections behind a NAT
server.InFigure21youcanseetheACCProjectnetworkarchitecture.
Figure21.ACCProjectnetworkarchitecture.
Whenauserwantstouseourservices,hehastoestablishaconnectionwithourSIPserver
(Asterisk server installed on Bolek) which is behind the university firewall. Once this
connection is established, our SIP server redirects the call to the Voice Enabler (VE) where
therearetheVoiceXMLapplications.VoiceEnablertakesthecontrolofthecallandexecutes
the Voice Application that user is requesting. To be able to interact with the user, Voice
Applications need to use a Voice Server (VS). When it is needed, VE establish an MRCP
connectionwiththeVS(Bolekserver)tomakethequeriesandreceivetheanswersfromthe
VoiceEngine.InFigure22youcanseethecalldiagramandtheprotocolsusedtoestablishthe
connectionsbetweenservers.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
11/56
71BIntroduction:ACCProject
Figure22.Calldiagram.
2.2ACCTeamProjectsACCProjectisabigProjectwhichpurposeisdevelopingallkindofVoiceApplicationsandnot
only those focused on call center applications. Therefore, ACC Project is made up of other
smallerprojectscarriedoutforitsmembersthatmatchthemainprojectaims.
Atthebeginning,ACCProjectteammadeabrainstorminganditsmembersproposedseveral
Voice Projects to realize. After that, the proposed projects were sorted by priority and the
teambegantoworkonthosewhichweremoreimportantaccordingonitscriteria.Afterthat,
some other projects were coming up during the development of the Project and they were
addedtothelist.
Inthissection,Iwillexplainsomeoftheseprojects.Somearecurrentlyondevelopmentand
othersareplannedtobedoneinthefuturebytheACCTeam.
2.2.1 Real Time Voice Translator
This project was set the first in the priority list and it was the first Voice Application to be
started. We decided it so because many of the steps done to realize this project will be
requiredinanyotherVoiceProjectwepurposetocarryoutinthefuture.
ThemainpurposeofthisprojectistodevelopaVoiceApplicationwhichwasabletodoareal
timetranslationofaphoneconversationbetweentwocallerswhospeakdifferentlanguages.
This is theprojectthatwasassigned tome,XaviGarcia,andwhich Ihavebeencarryingout
duringallthesemonths.Thisprojectwillbeexplaineddeeplyinsection4.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
12/56
8 RealTimeSpeechTranslator
2.2.2 VXML IDE
Despite it is not a Voice Application properly speaking, it is within the purposes of the ACC
Project. ItwasinthesecondpositionintheACCTeamprioritylist,soitmeansthisprojectis
veryimportantforus.ThefinalaimofthisprojectisdevelopingaWebbasedIDE(Integrated
Development Environment) for VoiceXML Applications. Its purpose is making available to a
wide range of programmers VoiceXML technology and supply them an easy way to develop
their own Voice Applications. The most important requirement is that it has to be a Web
Application;therefore itwillbeavailableforeverybodywithoutanykindoflocalinstallation,
whattheuserwillonlyneed is awebbrowser.Thisproject iscurrentlyunderdevelopment
andiscarriedoutbyTomasMikula.
Figure23.VXMLIDEinterface.
2.2.3 ACC security
ThisisanothernonVoiceApplication,butitisextremelyimportantforourProject.Theaimof
this project is provide security to all our project applications and protect our infrastructure
against attacks. This task is being accomplished by Michal Vanek and he is focusing on
providingthefirewallrulesanddoingthepropersoftwareconfiguration(Asteriskserver)to
avoidDoSattacks.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
13/56
91BIntroduction:ACCProject
2.2.4 Others
ACCTeamhasplannedtocarryoutsomeotherprojectsinthefuture.Theyhavenotstarted
yetbecauseeitherwehavenotaccomplishedsometheirprerequisitesorwehavenotenough
team members to do all the projects at the same time. Some of these future projects are:
PhoneWikipediaProject,andthirdpartyapplications.
PhoneWikipediaProjectpretends tomakeavailableall thecontentsofWikipedia througha
mobilephonecall.Thatis,everybodywhohaveamobilephonewouldbeabletoread(inthis
casewouldbeabletolisten)anyWikipediaarticleavailableininternet.
ThereareotherProjectsthatcameupduringthedevelopmentofothertasks.Thisisthecase
ofWeatherApplication,whichwasdevelopedtotesttheVoiceXMLDATAtag.Thisapplication
providesthecurrentweatherofanyEuropeancapital.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
14/56
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
15/56
112BRealTimeVoiceTranslator
3 RealTimeVoiceTranslatorTheRealTimeVoiceTranslatorisoneoftheProjectscarriedoutfortheACCProjectTeam.This
documentwillfocusonitmoredeeplythanotherACCprojectsbecauseit istheprojectthat
was assigned to be carried out by me. In the following sections I will describe the different
partsofitandhowithasbeenfulfilled.
3.1ObjectivesThepurposeofthisproject istodevelopaVoiceApplicationthatwasabletodoarealtime
translationfromaspokensentenceinone languagetoaspokensentence inother language.
The basic idea is to enable that two people, who are talking through a voice call in two
differentcountries,wereable tounderstandeachotherspeakingdifferent languagesdue to
anautomaticrealtimetranslationdonebyamachine.
3.2RequirementsIn this project, like in every project; before carrying out it, it is needed to establish some
requirements thatshouldbeaccomplished.Oncealltheserequirementshadbeenachieved,
wealsowillbeabletosaythattheaimsoftheRealTimeVoiceTranslatorhavebeenachieved.
Theserequirementsarethefollowings:
The application must be autonomous. It should work without the interaction of any
operator.
It isneededtogeneratedynamicVXMLfilesdynamically. It isnotpossibletostorea
list of possible results. All the answers will be generated after the user said his
sentence.
Theapplicationshouldbeabletounderstandeverythingtheusersaid.Usershouldbe
abletotalkaboutwhateverhewantsandtheapplicationwouldbeabletotranslate
whathehasspoken.
Itshouldbeabletotranslatesimplesentencesto/fromatleasttwolanguages.
It is needed to use a Translator Engine which was able to do the translation at real
time.
The delay between the moment the caller ends his sentence and the moment the
translatedsentence isdeliveredtothecalleeshouldbeasshortaspossibletomake
possibleasmoothconversation.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
16/56
12 RealTimeSpeechTranslator
3.3MarketAnalysisBeforebeginningthisproject,wedidaninternetresearchtofindoutotherpossibleservices,
either commercial or free services, which were currently offering the same features we
attempttoofferwiththeRealTimeVoiceApplication.
This Market Analysis shows that, although there are some attempts to give a real time
translation, they do not approach the problem in the same way we want to do in our
application.Some of them try tosolve theproblem usinga mobilemachine where the user
talks and it replies with the translated sentence. Others offer translation services for some
written Instant Messaging (IM)services.And Skypehas introducedanewservice to provide
InstantTranslationinphonecallsdonethroughahumaninterpreter.
InthefollowingsectionsIwillexplainthefeaturesofeachofthemostimportantservicesthat
trytoapproachthetranslationproblemincommunications.Theseservicesare:SkypePersonal
InterpreterService,GoogleTalkBots,NECAutomaticTranslatorformobiles.
3.3.1 Skype Personal Interpreter Service
Skype is now offering realtime voice translation services for Skype calls. Language Line
Personal Interpreter offers SkypeOut customers nearinstantaneous access to professional
voice translation. This is an access to live interpreters, not machine translation. The cost is
$2.99perminute,whichisrelativelyhigh,butwiththistypeofserviceyouarepayingforthe
conveniencefirstandforemost.Theuseofthisservice isonthefly,withnoscheduling.You
cangetaninterpreteronaveragein45secondsafteraninitialrequest.
Features:
Price $2.99/min
Scope PhonecallsusingSkype
Keypoints Liveinterpreters,notmachinetranslation
150supportedlanguages Noscheduling
Moreinfo www.skype.com
Table31.SkypePersonalInterpreterServicefeatures
http://www.skype.com/http://www.skype.com/8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
17/56
132BRealTimeVoiceTranslator
3.3.2 Google talk bots for realtime translations in IM (Instant
Messaging)
Recently, on December2007,Googleannounced the availabilityofrealtime translations for
their IM (Instant Messaging) service Google Talk. The translations are made by bots that
translatethetextmessagesintheGoogleTalkconversations.Foreachlanguagetherearetwo
bots,oneforeachtranslationdirection.Forexample,forSpanishtoEnglishtranslationthere
are the boots [email protected] and [email protected]. This service
supports23translationdirections(bots).
Features:
Price Free
Scope InstantMessagingconversationsusingGoogleTalk
Keypoints Texttranslationonly
23translationdirectionssupported
Moreinfohttp://googletalk.blogspot.com/2007/12/merry
christmasgodjuland.html
Table32.GoogleTalkBotsfeatures.
3.3.3 NEC Automatic Translator for mobile phones
Japanese electronics company NEC has developed the first automatic translation software
tailoredformobilephones. ItonlytranslatesfromspeechJapanesetoEnglish,butthis isthe
firsttimethetranslationtechnologyisusedinmobilephoneswithoutanyexternalhelp.The
softwarehasadictionaryof50,000wordsanditisstillunderdevelopment.
Features:
Price unknown
Scope Mobilephoneusers
Keypoints JapanesetoEnglishtranslationonly(notEnglishtoJapanese)
Underdevelopment
Moreinfo http://www.akihabaranews.com/en/news_details.php?id=15185
Table33.NECAutomaticTranslatorfeatures.
http://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://www.akihabaranews.com/en/news_details.php?id=15185http://www.akihabaranews.com/en/news_details.php?id=15185http://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.htmlhttp://googletalk.blogspot.com/2007/12/merry-christmas-god-jul-and.html8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
18/56
14 RealTimeSpeechTranslator
3.4ChoosingaTranslationEngineInordertocarryouttheRealTimeVoiceTranslatorProject,itismandatorytohavearealtime
translator which can provide us the translated sentences. As it is a very hard work to
implement a translator engine, we decided to use one of the free translators available inInternet.
Hence,firstofallwedidaresearchtofindoutallthefreetranslatorenginesavailable.After
this research we had to decide which one we choose for our purposes. Like it is known for
everybody,allthetranslationenginescurrentlyavailablehavenotagoodaccuracy.Therefore
wedecidetochoosethatonewhichhadthebesttranslationaccuracy.Toknowwhichoneof
all the preselected translation services has the best accuracy, I did a test that give a
punctuationtoeachoneandfinallyIcomparetheresults.
3.4.1 Web Translation services
After a deep research, we find out the most important free translation services currently
available in internet. We have preselected all them to do our test and choose the best
translationenginetouseinourapplication.InTable44youcanseealistofalltheseservices
andtheirURLs.
Service Website
Altavista Babel Fish http://babelfish.altavista.com/
Yahoo!Babelfish http://babelfish.yahoo.com/
Google Translator http://translate.google.com/translate_t?hl=en
InterTran http://www.tranexp.com:2000/InterTran
FreeTranslation http://www.freetranslation.com/
WorldLingohttp://www.worldlingo.com/en/products_services/worldlingo_translat
or.html
Dictionary.Com http://translator.dictionary.com/text.html
PROMT http://www.epromt.com/
TraduceGratis.com
(Google engine)
http://www.traducegratis.com/
Imtranslator
(PROMTengine)
http://translation.paralink.com/
Table34.Webtranslationserviceslist.
http://babelfish.altavista.com/http://babelfish.altavista.com/http://babelfish.yahoo.com/http://babelfish.yahoo.com/http://translate.google.com/translate_t?hl=enhttp://www.tranexp.com:2000/InterTranhttp://www.freetranslation.com/http://www.freetranslation.com/http://www.worldlingo.com/en/products_services/worldlingo_translator.htmlhttp://www.worldlingo.com/en/products_services/worldlingo_translator.htmlhttp://translator.dictionary.com/text.htmlhttp://www.e-promt.com/http://www.e-promt.com/http://www.e-promt.com/http://www.e-promt.com/http://www.traducegratis.com/http://translation.paralink.com/http://translation.paralink.com/http://translation.paralink.com/http://www.traducegratis.com/http://www.e-promt.com/http://translator.dictionary.com/text.htmlhttp://www.worldlingo.com/en/products_services/worldlingo_translator.htmlhttp://www.worldlingo.com/en/products_services/worldlingo_translator.htmlhttp://www.freetranslation.com/http://www.tranexp.com:2000/InterTranhttp://translate.google.com/translate_t?hl=enhttp://babelfish.yahoo.com/http://babelfish.altavista.com/8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
19/56
152BRealTimeVoiceTranslator
3.4.2 Translation Web Services Accuracy benchmark
We decided to do an accuracy benchmark to each one of the preselected web translation
servicestobeabletocompareeachotherinabetterway.Thisbenchmarkconsistintranslate
severalbasicsentencesfromSpanishtoEnglish.Ihaveselectedsomecommonsentencesthat
areusedinabasicconversationwhensomeoneistryingtocontractsomeservice.Ispecifically
have selected the basic dialog for booking a taxi and the basic dialog for ordering a pizza. I
havetonoticethattheaccuracymarkisanabsolutelysubjectivemarkanditisonlybasedon
mypersonalopinion.
3.4.2.1 SpanishsentencesHerearelistedtheSpanishsentencesofthedialogswhichourbenchmarkismadeof:
1. BuenosdasleatiendeJuanGonzlez,enqupuedoayudarle?
2. Desearapediruntaxiparalas22horasenladireccin:calleSanRamnnmero2.
3. Deacuerdo,sureservahasidorealizada.
4. Desearapedirunapizzatropicaltamaofamiliar.
5. Deseaaadiralgningredientems?
6. S.Extradequeso,jamnypeperoni.
7. Muybien,mepuededecirsunmerodetelfonoysudireccin?
8. Minmerodetelfonoesel777.777.777ymidireccines:calleFrankKafkanmero
132.
9. Perfecto,en30minutosrecibirsupedido.Elpreciofinalesde180coronas.
In the following tables there are the result of the benchmark we passed on each translator
engineandthemarktheyobtainedon the translationofeachsentence.Attheendyoucan
findasummarytablewiththeglobalmarkofalltheengines.
AltavistaBabelfish(Babelfishengine)
Translationresult Accuracy
1 GoodmorningittakescareofJuantohimGonzlez,inwhatIcanhelphim? 45
2 It would wish to request a taxi for the 22 hours in the direction: street San Ramon
number2
90
3 Inagreement,itsreservehasbeenmade. 80
4 Pizzawouldwishtorequestonetropicalfamiliarsize. 50
5 Itwishestoaddsomeingredientmore? 95
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
20/56
16 RealTimeSpeechTranslator
6 Yes.Cheeseextra,jamnandpeperoni. 75
7 Verywell,itcansaytoitstelephonenumberanditsdirectiontome? 70
8 My telephone number is the 777.777.777 and my direction is: street Frank Kafka
number132.
85
9 Perfect,in30minutesitwillreceiveitsorder.Thefinalpriceisof180crowns. 85
Table35.AltavistaBabelfishSpanishEnglishbenchmarkresults.
Yahoo!Babelfish(Babelfishengine)
Translationresult Accuracy
1 GoodmorningittakescareofJuantohimGonzlez,inwhatIcanhelphim? 45
2 It would wish to request a taxi for the 22 hours in the direction: street San Ramon
number2.
90
3 Inagreement,itsreservehasbeenmade. 80
4 Pizzawouldwishtorequestonetropicalfamiliarsize. 50
5 Itwishestoaddsomeingredientmore? 95
6 Yes.Cheeseextra,jamnandpeperoni. 75
7 Verywell,itcansaytoitstelephonenumberanditsdirectiontome? 70
8 My telephone number is the 777.777.777 and my direction is: street Frank Kafka
number132.
85
9 Perfect,in30minutesitwillreceiveitsorder.Thefinalpriceisof180crowns. 85
Table36.Yahoo!BabelfishSpanishEnglishbenchmarkresults.
GoogleTranslator
Translationresult Accuracy
1 GoodmorningheattendsJuanGonzalez,howcanIhelp? 80
2 Iwouldliketoaskataxifor22hoursatthefollowingaddress:CalleSanRamonnumber
2.
100
3 Okay,yourreservationhasbeenmade. 100
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
21/56
172BRealTimeVoiceTranslator
4 Iwouldliketoaskapizzatropicalfamilysize. 90
5 Wanttoaddonemoreingredient? 90
6 Yes.Extracheese,hamandpeperoni. 100
7 Well,whatIcansayyourphonenumberandyouraddress? 65
8 My telephone number is 777,777,777 and my address is: Frank Kafka street number
132.
100
9 Perfect,in30minutesyoushouldreceiveyourorder.Thefinalpricewas180kronor. 95
Table37.GoogleTranslatorSpanishEnglishbenchmarkresults.
InterTran
Translationresult Accuracy
1 GoodmorninghimatiendeJohnGonzlez,whereinitcanlendahelpinghand? 20
2 Hunger for order one taxicab in order to the 22 hours on the steerage : street San
Ramnnumeral2.
35
3 Inagreement,herreservationhasbeenachieved. 65
4 Hungerfororderonepizzatropicalsizehomelike. 40
5 Deseaappendanyingredientfurther? 35
6 yesExtraofcheese,hamandpeperoni. 85
7 Welldonemecanyoutellmehernumeraloftelephoneandhersteerage? 30
8 Enumeraloftelephone isthe777.777.777andEsteerage is:streetOutspokenKafka
numeral132.
20
9 Perfect,at30minutesshe'llreceiveherrequest.Thepriceeventualisof180halos. 60
Table38.InterTranSpanishEnglishbenchmarkresults.
FreeTranslation
Translationresult Accuracy
1 GoodmorningJuanattendshimGonzlez,inwhatcanIhelphim? 45
2 Itwoulddesiretoaskataxiforthe22hoursinthedirection:streetSanRamnnumber 80
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
22/56
18 RealTimeSpeechTranslator
2.
3 Inagreement,itsreservehasbeencarriedout. 80
4 Itwoulddesiretoaskapizzatropicalfamilysize. 80
5 Itdesirestoaddsomeingredientmore? 85
6 Yes. Extraofcheese,hamandpeperoni. 100
7 Verywell,cantellmeitsphonenumberanditsdirection? 85
8 Myphonenumber isthe777.777.777andmydirection is:streetFrankKafkanumber
132.
85
9 Perfect,in30minuteswillreceiveitsorder. Thefinalpriceisof180crowns. 85
Table39.FreeTranslationSpanishEnglishbenchmarkresults.
WorldLingo
Translationresult Accuracy
1 GoodmorningittakescareofJuantohimGonzlez, inwhatIcanhelphim? 60
2 It would wish to request a taxi for the 22 hours in the direction: street San Ramon
number2.
80
3 Inagreement,itsreservehasbeenmade. 90
4 Pizzawouldwishtorequestonetropicalfamiliarsize. 70
5 Deseatoaddsomeingredientmore? 65
6 Yes.Cheeseextra,jamnandpeperoni. 60
7 Verywell, cansaytoitstelephonenumberanditsdirectiontome? 80
8 My telephone number is the 777.777.777 and my direction is: street Frank Kafka
number132.
90
9 Perfect,in30minutesitwillreceiveitsorder.Thefinalpriceisof180crowns. 90
Table310.WorldLingoSpanishEnglishbenchmarkresults.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
23/56
192BRealTimeVoiceTranslator
Dictionary.Com
Translationresult Accuracy
1 GoodmorningittakescareofJuantohimGonzlez,inwhatIcanhelphim? 60
2 It would wish to request a taxi for the 22 hours in the direction: street San Ramon
number2.
80
3 Inagreement,itsreservehasbeenrealized. 75
4 Solargerelativewouldwishtorequestatropicalpizza. 50
5 Itwishestoaddsomeingredientmore? 80
6 Yes.Cheeseextra,jamnandpeperoni. 50
7 Verywell,itcansaytoitstelephonenumberanditsdirectiontome? 75
8 My telephone number is the 777.777.777 and my direction is: street Frank Kafka
number132.
80
9 Perfect,in30minutesitwillreceiveitsorder.Thefinalpriceisof180crowns. 85
Table311.Dictionary.comSpanishEnglishbenchmarkresults.
PROMT
Translationresult Accuracy
1 GoodmorningJuanGonzlezattendstohim:inwhatcanIhelphim? 75
2 Hewouldwanttoaskforataxifor22hoursinthedirection:thereisquietSanRamn
number2.
50
3 Inagreement,hisreservationhasbeenrealized. 85
4 Familiarsizewouldwanttoaskforatropicalpizza. 25
5 Doeshewanttoaddanyingredientmore? 75
6 Yes.Extraofcheese,hamandpepperoni. 100
7 Verywell:canhesaytomehistelephonenumberandhisdirection? 80
8 My telephone number is 777.777.777 and my direction is: there is quiet Frank Kafka
number132.
70
9 Perfect,in30minutesitwillreceivehisorder.Thefinalpriceis180crowns. 85
Table312.PROMTSpanishEnglishbenchmarkresults.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
24/56
20 RealTimeSpeechTranslator
Summary
Altavista Yahoo! Google InterTranFreeTrans
lation
WorldLing
o
Dictionary
.ComPROMT
1 45 45 80 20 45 60 60 70
2 90 90 100 35 80 80 80 50
3 80 80 100 65 80 90 75 85
4 50 50 90 40 80 70 50 25
5 95 95 90 35 85 65 80 70
6 75 75 100 85 100 60 50 100
7 70 70 65 30 85 80 75 80
8 85 85 100 20 85 90 80 70
9 85 85 95 60 85 90 85 85
Avg 75 75 91 43 81 76 71 70
Table313.Benchmarkssummary.
In order to differentiate better the engines scores and have a clearer idea about them, it is
requiredtohavesomestatisticscharts.Thefollowinggraphscomparestheaverage,maximum
andminimumaccuracyofeachengine.Itishighlightedwithredcolorthebestscoredengine.
73
73
90
39
79
75
69
67
0 20 40 60 80 1
AltavistaBabelfish
Yahoo!Babelfish
InterTran
FreeTranslation
WorldLingo
Dictionary.com
PROMT
AverageAccuracy
00
Figure31.AverageaccuracygraphinSpanishEnglishBenchmark.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
25/56
212BRealTimeVoiceTranslator
95
95
100
85
100
90
85
100
75 80 85 90 95 100 105
AltavistaBabelfish
Yahoo!Babelfish
InterTran
FreeTranslation
WorldLingo
Dictionary.com
PROMT
MaxAccuracy
Figure32.MaximumaccuracygraphinSpanishEnglishBenchmark.
45
45
65
20
45
6050
25
0 10 20 30 40 50 60 70
AltavistaBabelfish
Yahoo!Babelfish
InterTran
FreeTranslation
WorldLingoDictionary.com
PROMT
MinAccuracy
Figure33.MinimumaccuracygraphinSpanishEnglishBenchmark.
In Figure 34 you can see the range of marks obtained for each engine and where it is the
averageaccuracymarkwithinit.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
26/56
22 RealTimeSpeechTranslator
0
20
40
60
80
100
120
Accuracyrange
Average
Figure34.AccuracyrangechartinSpanishEnglishBenchmark.
3.4.2.2 EnglishsentencesHerearelistedtheEnglishsentencesofthedialogswhichourbenchmarkismadeof:
1. Goodmorning,itsJuanJohnSmith,whatcanIhelpyou?
2. Iwouldliketoaskforataxiat22hoursattheaddress:FifthAvenuenumber2.
3. Ok,yourreservationhasbeenmade.
4. IwouldliketoorderaTropicalpizzafamilysize.
5. Doyouwanttoaddsomeotheringredient?
6. Yes.Extracheese,hamandpepperoni.
7. Allright,canyoutellmeyourtelephonenumberandyouraddress?
8. My telephonenumber is777.777.777andmyaddress is:FrankKafkastreetnumber
132.
9. Allright,youwillreceiveyourorderin30minutes.Thefinalpriceis180crowns.
LookingattheresultsoftheSpanishEnglishbenchmark,wecannoticethatthereareseveral
engines which have a very low accuracy. Therefore, we decided to do the EnglishSpanish
benchmarkonlyonthethreeengineswhichhavethehighestmarksintheprevioustest.
In the following tables there are the benchmark results of each translator engine and the
markstheyobtainedonthetranslationofeachsentence.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
27/56
232BRealTimeVoiceTranslator
GoogleTranslator
Translationresult Accuracy
1 Buenosdas,esJuandeJohnSmith,qupuedohacerparaayudarle? 80
2 Megustarapediruntaxialas22horasenladireccin:QuintaAvenidanmero2. 100
3 Ok,sureservaseharealizado. 100
4 MegustarapedirunpizzaTropicaltamaodelafamilia. 95
5 Deseasaadiralgnotroingrediente? 100
6 S.Extraqueso,jamnypepperoni. 100
7 Estbien,puedeustedmedicesunmerodetelfonoysudireccin? 85
8 Minmerodetelfonoesel777.777.777ymidireccines:calleFrankKafkanmero
132.
100
9 Muybien,ustedrecibirsupedidoen30minutos.Elpreciofinalesde180coronas. 100
Table314.GoogleTranslatorEnglishSpanishbenchmarkresults.
Altavista&Yahoo!(BabelFishEngine)
Translationresult Accuracy
1 Buenamaana,lesJuanJohnSmith,culpuedeyoayudarle? 40
2 Quisierapediruntaxien22horasenladireccin:QuintaAvenidanmero2. 50
3 Sehahecholaautorizacin,sureservacin. 40
4 Quisierapediruntamaodelafamiliatropicaldelapizza. 30
5
Usted
quiere
agregar
un
poco
de
otro
ingrediente?
75
6 S.Queso,jamnysalchichonesadicionales. 80
7 Todoaladerecha,puedeusteddecirmesunmerodetelfonoysudireccin? 70
8 Minmerode telfonoes777.777.777ymidireccines: Callenmero132 deFrank
Kafka.
100
9 Todoaladerecha,ustedrecibirsuordenen30minutos.Elpreciofinales180coronas. 80
Table315.BabelFishengineEnglishSpanishbenchmarkresults.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
28/56
24 RealTimeSpeechTranslator
FreeTranslation
Translationresult Accuracy
1 Buenosdas,esJuanJohnSmith,lequpuedoayudaryo? 70
2 Querrapediruntaxien22horasenladireccin:QuintonmerodelaAvenida2. 60
3 Bueno,sureservacinhasidohecha. 80
4 QuerraordenarunapizzaTropicaleltamaofamiliar. 85
5 Quiereustedagregaralgnotroingrediente? 100
6 S.Elquesoextra,eljamnylasalchicha. 100
7 Bueno,puededecirmeustedsunmerodetelfonoysudireccin? 95
8 Minmerodetelfonoes777.777.777ymidireccines:Elnmerofrancode lacalle
deKafka132.
60
9 9.Bueno,ustedrecibirsuordenen30minutos.Elpreciofinales180coronas. 80
Table316.FreeTranslationEnglishSpanishbenchmarkresults.
Inthefollowinggraphs,youcanseethestatisticscomparisonbetween thedifferentscored
engines.Ineachgraph,itishighlightedwithredcolorthebestscoredengine.Intheaccuracy
range chart, you can see the range of marks obtained for each engine and where it is the
averageaccuracymarkwithinit.
95
59
80
0 20 40 60 80 100 120
Babelfish
FreeTranslation
AverageAccuracy
Figure35.AverageaccuracygraphinEnglishSpanishBenchmark.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
29/56
252BRealTimeVoiceTranslator
100
100
100
0 20 40 60 80 100 120
Babelfish
FreeTranslation
MaxAccuracy
Figure36.MaximumaccuracygraphinEnglishSpanishBenchmark.
80
30
60
0 20 40 60 80 100
Babelfish
FreeTranslation
MinAccuracy
Figure37.MinimumaccuracygraphinEnglishSpanishBenchmark.
0
20
40
60
80
100
120
Google Babelfish FreeTranslation
Accuracyrange
Average
Figure38.AccuracyrangechartinEnglishSpanishBenchmark.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
30/56
26 RealTimeSpeechTranslator
3.4.3 Conclusion
Looking at the both benchmark results, we can see that Google Translate stands out as the
bestwebtranslatorservice.Thetranslationsitprovidesarequiteunderstandableinallcases,
whilsttheresultsprovidedforotherenginesarenotinmanycases.Forthisreason,wechose
GoogleTranslateEngineforusinginourproject.
3.5SoftwareToolsFor thedevelopmentof theRealTime VoiceTranslator ithasbeennecessarytouseseveral
softwaretools.Thistoolscoverawiderangeofsoftwareapplications,fromHTTPserverstoa
programming IDEs.Eachtoolhasbeenusedtoaccomplishataskwithinthelargenumberof
parts our project is made of. In the following sections I am going to explain each of these
softwareapplications,theirfunctioninsidetheprojectarchitectureandhowwehaveusedand
reconfigu dthem.
3.5.1 IBM WebSphere Application Server
IBM WebSphere Application Server (WAS) is the flagship product within IBM's WebSphere
brand. WAS is built using open standards such as Java EE, XML, and Web Services. It works
with a number of Web servers including Apache HTTP Server, Netscape Enterprise Server,
MicrosoftInternetInformationServices(IIS),IBMHTTPServerfori5/OS,IBMHTTPServerfor
z/OS, and IBM HTTP Server for AIX/Linux/Microsoft Windows/Solaris. WAS V6.1 is the
foundationoftheIBMWebSpheresoftwareplatform.Itdeliversthesecure,scalable,resilient
applicationinfrastructurethatisneededforaServiceOrientedArchitecture(SOA).
WeinstalledWebSphereApplicationServerV5.1onaLinuxsystemforusingtheVoiceEnabler
andVoiceEnginesserversitprovidesinourproject.Asitisexplainedinsection2.1Software,
wehavedeployedWebSphereApplicationServerintwoservers:adelaandwas.
3.5.2 Java EE
JavaEEistheversionoftheJavaPlatformusedtodevelopEnterpriseApplications,focusingon
a distributed server environment. The platform was known as Java 2 Platform, Enterprise
Edition or J2EE until the name was changed to Java EE in version 5. The current version is
calledJavaEE5whilstthepreviousversioniscalledJ2EE1.4.
JavaEEbuildsonthesolidfoundationofJavaPlatform,StandardEdition(JavaSE)and isthe
industry standard for implementing enterpriseclass serviceoriented architecture (SOA) and
nextgeneration web applications. The SDKs contain Sun GlassFish Enterprise Server,
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
31/56
272BRealTimeVoiceTranslator
previously named Sun Java System Application Server, and provide support for Java EE 5
specifications.
The Java EE Platform differs from the Standard Edition (SE) of Java in that it adds libraries
which provide functionality to deploy faulttolerant, distributed, multitier Java software,
basedlargelyonmodularcomponentsrunningonanapplicationserver.ItincludesseveralAPI
specifications, such as JDBC, RMI, email, JMS, web services, XML, etc, and defines how to
coordinatethem.JavaEEalsofeaturessomespecificationsuniquetoJavaEEforcomponents.
TheseincludeEnterpriseJavaBeans,servlets,portlets(followingtheJavaPortletspecification),
JavaServerPagesandseveralwebservicetechnologies.
WechoseJavaEEplatformtodevelopourapplication.OurdynamicVoiceXMLapplicationsuse
thepowerofJavaServletstointeractwithotherwebservicesavailableininternetandmanage
data from different sources to create new voice services. We are currently using Java EE 5
version.
3.5.3 Apache Tomcat HTTP Server
Apache Tomcat is a Servlet container developed at the Apache Software Foundation (ASF).
Tomcat implements the Java Servlet and the JavaServer Pages (JSP) specifications from Sun
Microsystems,andprovidesa"pureJava"HTTPwebserverenvironmentforJavacodetorun.
TomcatshouldnotbeconfusedwiththeApachewebserver,whichisaCimplementationofa
HTTPwebserver;thesetwoHTTPwebserversarenotbundledtogether.
ApacheTomcatincludestoolsforconfigurationandmanagement,butcanalsobeconfigured
byeditingconfigurationfilesthatarenormallyXMLformatted.
As I mentioned in previous sections, we use Java Servlets for implementing most of the
applicationsinourproject.Forthisreason,weneedtouseaServletcontainerandwechose
theApacheTomcatone.WearecurrentlyrunningApacheTomcat6forLinuxonadelaserver.
ThisversionisdesignedtouseJavaEE5.
3.5.4 Eclipse IDE
Eclipse is an integrated development environment (IDE) written primarily in Java. The initial
codebase originated from VisualAge, a family of computer integrated development
environments from IBM, which included support for a few popular (and not so popular)
computer programming languages. In its default form it is meant for Java developers,
consistingoftheJavaDevelopmentTools(JDT).Userscanextend itscapabilitiesby installing
plugins written for the Eclipse software framework, such as development toolkits for other
programming languages,and canwrite and contribute theirown plugin modules. Language
packsprovidetranslationsintooveradozennaturallanguages.Itisopensourcesoftwareand
itisavailableforseveraloperationsystems.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
32/56
28 RealTimeSpeechTranslator
The Eclipse Project is supported by the most important industry leaders like Borland, IBM,
MERANT, QNX Software Systems, Rational Software, Red Hat, SuSE, TogetherSoft and
Webgain.
IhavebeenusingEclipse3.3IDEfordevelopingallthejavaservletsforourProject.Insection
4.5.6 Creating and running a dynamic VoiceXML application I will explain how I use it to
developVoiceXMLapplications.
Figure39.EclipseIDEscreenshot.
3.5.5 IBM Rational Web Developer & Voice Toolkit
IBM Rational Application Developer for WebSphere Software extends Eclipse with visual
construction development. It helps Java developers rapidly design, develop, assemble, test,
profileanddeployhighqualityJava/J2EE,Portal,Web,WebservicesandSOAapplications.
We use IBM Rational Web Developer (RWD) in ACC Project to develop static Voice
Applications. It is really easy and intuitive to create a whole application using the visual
constructenvironment that RWD provides. For this purpose it isneeded to install theVoice
ToolkitwhichintegrateswithRWDenvironmentandprovidestheCommunicationFlowBuilder
tool.UsingtheCommunicationFlowBuilder it isveryeasytocreatetheflowdiagramofthe
voiceapplicationonlydragginganddroppingthevisualelementsandconnectingthem.Once
we have created the Communication Flow, the RWD creates the VXML application file
automatically.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
33/56
292BRealTimeVoiceTranslator
Figure310.IBMRationalWebDeveloperandCommunicationFlowBuilderscreenshot.
3.6 wIconfigurethesoftwaretools3.6.1 Tomcat 6 Installation and configuration on Fedora Linux
Ho
ThissectionwilldescribestepbystepwhatitisneededtodotoinstallandconfigureApache
Tomcat6onourFedoraLinuxserver.
3.6.1.1 DownloadbinariesandprerequisitesJDK(JavaDevelopmentKit)
Tomcat 6.0 was designed to run on J2SE 5.0. You can download Java EE SDK from
http://java.sun.com.
Tomcat
Binary downloads of the Tomcat server are available from
http://tomcat.apache.org/download60.cgi.
SetanenvironmentvariableCATALINA_HOMEthatcontainsthepathnametothedirectoryin
whichTomcat6hasbeeninstalled.
Ant
Binary downloads of the Ant build tool are available from
http://ant.apache.org/bindownload.cgi. Download and install Ant from the distribution
http://tomcat.apache.org/download-60.cgihttp://tomcat.apache.org/download-60.cgihttp://tomcat.apache.org/download-60.cgihttp://ant.apache.org/bindownload.cgihttp://ant.apache.org/bindownload.cgihttp://tomcat.apache.org/download-60.cgi8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
34/56
30 RealTimeSpeechTranslator
directorymentionedabove.Then,addthebindirectoryoftheAntdistributiontoyourPATH
environment variable, following the standard practices for your operating system platform.
Once you have done this, you will be able to execute theant shell command directly. You
should use a command like it is shown below and modify your .bashrc file in your home
directory:
PATH=$PATH:/opt/SDK/lib/ant/bin
3.6.1.2 EnvironmentvariablesrequiredCATALINA_HOME
ItmaypointatyourCatalina"build"directory.
CATALINA_BASE (Optional)
ItisthebasedirectoryforresolvingdynamicportionsofaCatalinainstallation. Ifnotpresent,
resolvestothesamedirectorythatCATALINA_HOMEpointsto.
JAVA_HOME
ItmustpointatyourJavaDevelopmentKitinstallation.Requiredtorunthewiththe"debug"
or"javac"argument.TosettheJAVA_HOMEenvironmentvariable it isnecessarytoexecute
thefollowingcommand:
export JAVA_HOME=/opt/SDK/jdk
Youcanaddthislineinthe.bashrcfileofyourhomedirectorytodoitautomaticallyeachtime
thesystemruns.
JRE_HOME
ItmustpointatyourJavaDevelopmentKitinstallation. DefaultstoJAVA_HOMEifempty.
3.6.1.3 ConfigurationfilesServer.xml
ThisistheTomcatservermainconfigurationfile.Bydefaultitisatconf/server.xml.Wewillnot
modifyitandwillleaveitwiththedefaultvalues.
Build.xml
Itisnecessarytomodifythedefaultbuild.xmlfileprovidedbytomcatatthefollowingline:
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
35/56
312BRealTimeVoiceTranslator
stepsrequired.Thisfileisstoredinthetopleveldirectoryofyoursourcecodehierarchy,and
shouldbecheckedintoyoursourcecodecontrolsystem.
Web.xml
Thisisthewebapplicationconfigurationfileandcontrolsalltheparametersrelatedwitheach
web application. This file will be discussed in section4.5.6Creatingand runningadynamic
VoiceXMLapplication.Thereisoneweb.xmlfileforeachapplicationinthedirectorypath:
$CATALINA_HOME/webapps/{my_web_application}/WEB-INF
3.6.1.4 StandarddirectorylayoutAwebapplicationisdefinedasahierarchyofdirectoriesandfilesinastandardlayout.Sucha
hierarchycanbeaccessed in its"unpacked"form,whereeachdirectoryandfileexists inthe
filesystemseparately,orina"packed"formknownasaWebARchive,orWARfile.Theformer
format ismoreusefulduringdevelopment,whilethe latter isusedwhenyoudistributeyour
applicationtobeinstalled.
To run a web application on Tomcat you will need to arrange your application files in the
followinghierarchyinyourapplication's"documentroot"directory:
*.html,*.jsp,etc. TheHTMLandJSPpages,alongwithotherfilesthatmustbevisible
to the client browser (such as JavaScript, stylesheet files, and images) for your
application. In larger applications you may choose to divide these files into a
subdirectoryhierarchy,butforsmallerapps, it isgenerallymuchsimplertomaintain
onlyasingledirectoryforthesefiles.
/WEBINF/web.xml TheWebApplicationDeploymentDescriptorforyourapplication.
This isanXML file describing theservlets andothercomponents thatmakeupyour
application, along with any initializationparameters and containermanaged security
constraintsthatyouwanttheservertoenforceforyou.Thisfileisdiscussedinmore
detailinthefollowingsections.
/WEBINF/classes/ This directory contains any Java class files (and associated
resources)requiredforyourapplication,includingbothservletandnonservletclasses,
thatarenotcombinedintoJARfiles.IfyourclassesareorganizedintoJavapackages,
youmustreflectthisinthedirectoryhierarchyunder/WEBINF/classes/.Forexample,
aJavaclassnamedcom.mycompany.mypackage.MyServletwouldneedtobestoredin
afilenamed/WEBINF/classes/com/mycompany/mypackage/MyServlet.class.
/WEBINF/lib/ This directory contains JAR files that contain Java class files (and
associatedresources)requiredforyourapplication,suchasthirdpartyclass libraries
orJDBCdrivers.
Likemostservletcontainers,Tomcat6alsosupportsmechanismstoinstalllibraryJARfiles(or
unpacked classes) once, and make them visible to all installed web applications (without
havingtobeincludedinsidethewebapplicationitself.
$CATALINA_HOME/common/lib JAR files placed here are visible both to web
applicationsandinternalTomcatcode.ThisisagoodplacetoputJDBCdriversthatare
requiredforbothyourapplicationandinternalTomcatuse(suchasforaJDBCRealm).
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
36/56
32 RealTimeSpeechTranslator
$CATALINA_BASE/shared/lib JARfilesplacedherearevisibletoallwebapplications,
but not to internal Tomcat code. This is the right place for shared libraries that are
specifictoyourapplication.
3.6.2 IBM RWD Voice Language and Voice Engines Configuration
OncetheCommunicationflowisdoneandbeforethegenerationoftheVXMLfile,itisneeded
toconfiguretheVoiceEnginesandtheSpeechLanguageourapplication isgoing touse.For
doingthis,youneedtodothefollowingsteps:
MenuWindow>Preferences
Attheleftpanelofthewindowthatappearsselect:
VoiceTools>SpeechEngines >WebSphereVoiceServer
VoiceLanguage:USEnglish/LatinAmericaSpanish
Compressionformat:mulaw
RecognizerService
Mediaurl:rtsp://was.feld.cvut.cz/media/recognizer
Encoding:UTF8
SynthesizerService
Mediaurl:rtsp://was.feld.cvut.cz/media/synthesizer
Encoding:UTF8
TTSVoice:Default
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
37/56
332BRealTimeVoiceTranslator
Figure311.IBMRWDVoiceLanguageandVoiceEnginesconfiguration.
3.7Creating and running a dynamic VoiceXMLapplication
In thissection Iamgoing toexplainhow touseall thesoftware tools explained in previous
sectionstocreateaVoiceXMLapplicationandrun itonourplatform.Thissection iswritten
like a step by step manual in order it were easier to follow all the procedure and to show
clearlyhowtouseeachtool.
Brieflytheprocedureis:writetheservletcodeinEclipseIDE,copythegeneratedservletbinary
filesintotheserver(insidetheTomcatdirectory)andupdateTomcatandVEtorecognizethe
newapplication.Seethefollowingsubsectionswherethereareexplainedmoredetailedeach
step.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
38/56
34 RealTimeSpeechTranslator
3.7.1 Create a new Java Project in Eclipse IDE
AlthoughEclipsegivesyouthechancetocreateaDynamicWebApplicationProjectwhenyou
trytocreatenewproject,wewillnotusethisoption.Ifyouchoosethis,Eclipseassistyouinall
thetasksinvolvedinthecreationofthiskindofapplications,butalltheprocedureswithinthis
kind of applications are very complex and take much time to understand how to use them.
Therefore,likeforourpurposeswedonotneedmostofthisassistance,wearegoingtocreate
asimplyJavaProject,whichwillprovideuswhatweneedanditismucheasiertounderstand.
Tocreatethenewproject,yousimplyhavetogotothemenuFile>New>JavaProjectand
followthewizard.Afterthat,youcanbeginaddingnewclassesandwritingtheservletscode.
3.7.2 Adding JEE libraries in Eclipse IDE
Inthepackageexplorer,rightclickon theprojectnamewhichyouaregoing todevelop the
servletin.SelectPropertiesonthepopupmenuthatappears.Ontheleftsideoftheproperties
windowsselectJavaBuildPathandthenontherightsideofthepropertieswindowschoose
theLibrariestab.ClickthebuttoncalledAddExternalJARsandthenselectthejavaee.jar
file(orj2ee.jarinsomeJDKversions)locatedin:
$JAVA_EE_SDK/lib/javaee.jar
Figure312.Addinglibrariesstep1.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
39/56
352BRealTimeVoiceTranslator
Figure313.Addinglibrariesstep2.
3.7.3 Create the main procedure in the Servlet class
ItisneededtocreateanemptypublicvoidmainprocedureintheHTTPServletclass,dueifitis
not present, eclipse will not compile the class using the Java perspective. We can use the
JavaEE perspective to create an HTTPServlet (it would be better) instead, but it is more
complextounderstandhowthisperspectiveworks.IfweusetheJavaPerspectivewithclasses
withMainprocedure,wecanworkwithEclipselikewewerewritinganormalJavaapplication
and,later,copythefilesmanuallyintheTomcatwebserverdirectorystructure.
3.7.4 Copy the binary files to the server
Once,youhavecompiledthejavacode(chooseRunJavaApplication ifyouareasked,not
Run on Server) it is need to copy the generated class file to the server (adela) inside the
ApacheTomcat webappsdirectory. Thedestinationpathfortheclassfileshouldbe:
$TOMCAT_DIR/webapps/{my_web_application}/WEB-INF/classes
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
40/56
36 RealTimeSpeechTranslator
IfyouhaveusedsomeexternalAPInot includedneither intheJavaStandardSDKnor inthe
JavaEESDK,youshouldincludetheJARfilewiththeAPIpackagesinthefollowingpath:
$TOMCAT_DIR/webapps/{my_web_application}/WEB-INF/lib
You should restart the server every time you copy a new file or if an existing file has been
modified.
3.7.5 Tomcat application configuration
YouneedtoconfigureyourTomcatApplicationtoallowthenewservletbereachableforthe
users.Youneedtomodifytheweb.xmlfilewhichisat
$TOMCAT_DIR/webapps/{my_web_application}/WEB-INF
Addingthefollowinglines(itisusedtheTransaltorServletconfigurationasexample):
TranslatorServlet
TranslatorServlet
TranslatorServlet
/translator.vxml
Figure314.Linestobeaddedintheapplicationsweb.xmlfile.
Whereeachtagmeans:
Tag Description
In this section we define internal configuration (used inside Tomcat
server)foreachservlet.
In this section we define the relative URL where each servlet will be
reachable.
Specifies the name it is known a servlet inside the web.xml file. It is
definedwiththissametaginsidethesection.
Inthesectionitdefinestheclassfilewhichisassociatedwitha
servletname.
Definestherelativeurlforaspecifiedservletwheretheservletwillbe
reachable.Itisusedinsidethesection. Thevalueof
this tag will be /{servlet_relative_url}, which means that the servlet
willbereachableat:
http://myserver.com/{myWebApplication}/{servlet_relative_url}
Table317.Web.xmlfiletags.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
41/56
372BRealTimeVoiceTranslator
IftheservlethavetoprovideaVXMLoutputtobeinterpretedbytheVoiceEnabler(VE),asis
thecaseofallVoiceXMLapplications;thehastoendwith.vxml,becausethe
VEonlyisabletoreadthisfileformat.Althoughtheoutputoftheservletwouldbethesame
VXMLcode,ifthedontendwith.vxmltheVEwillnotprocessit.
Youshouldrestarttheservereverytimetheweb.xml ismodifiedbeforethechangestake
effect.
Once it has been done, the new servlet will be reachable at the following URL (in this
example):
http://myserver.com/{myWebApplication}/translator.vxml
TheVoiceXMLapplicationsIhavecreatedsofar,needtheTomcatserverwererestartedfrom
the $CATALINA_BASE directory. It is because the servlets need to know the absolute path
wheretherearelocatedtheirbinaryfiles.Maybethisissuecouldbesolvedinthefuture.
Therefore,runtherestartexecutablefilefromthepath$CATALINA_BASE,bydoing:
cd /opt/apache-tomcat-6.0.16
Andonceyouareinthispathyouhavetoexecutethefollowingcommand:
./bin/restart.sh
3.7.6 Configuring VE to use the VXML Application generated by theservlet
At thispoint, itonlyremains tomodify theVEconfigurationfiles tomakeavailable thenew
VoiceXMLapplicationinaspecifiedphonenumber.Whenitwasdoneandtheusercallstothe
assignedphonenumber,VEwilllaunchourVoiceApplicationtoattendhim.
To assign a phone number to the Voice Application generated by the servlet, you should
modify theDefaultInboundCCXML.jsp VE configuration file. It could be found in the
followingpathinadelaserver:
/opt/WebSphere/AppServer/installedApps/adela/
WVX5.1-adela.ear/wvxweb.war/DefaultInboundCCXML.jsp
Insidethisfileyoushouldfindthetextblockshowedin
Figure45.Youhavetoaddanewlineattheendofthisblock,followingtheexamplesofthe
otherlines.InthenewlineyoushouldsetanavailablephonenumberandlinkitwiththeURL
wheretheVXMLfilewhichcontainstheVoiceApplicationcanbefound.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
42/56
38 RealTimeSpeechTranslator
var number = extractNumber(evt.connection.local);
if(!isNaN(number)) switch(number){
case 700: vxml_app = "DefaultVXML.jsp"; break;
case 701: vxml_app = "http://localhost/vxml/identity.vxml"; break;
case 702: vxml_app = "http://localhost/vxml/mexicanRestaurant_v2.vxml"; break;
case 703: vxml_app = "http://localhost/vxml/cvutCallCenter.vxml"; break;
case 704: vxml_app = "http://localhost:8080/myservlets/dynamicVXML_hour.vxml";
break;
case 705: vxml_app = "http://localhost:8080/myservlets/translator.vxml"; break;
case 706: vxml_app = "http://localhost/vxml/mexicanRestaurant_v2_EN.vxml"; break;
case 777: vxml_app = "http://localhost/vxml/icecream_alt.vxml"; break;
default: vxml_app = "http://localhost:8080/vxmlide/file?phone=" + number; break;
}
]]>
Figure315.TextblocktobemodifiedinDefaultInboundCCXML.jspfile.
3.8ApplicationschemaIfyouwanttounderstandthewayIhavefollowedtoaccomplishtheaimofRealTimeSpeech
TranslatorProject,itwouldbereallyinterestingtohaveaclearideaofthedifferentpartsthis
big application is made up. Knowing this you will be able to understand de different step
explainedinsection4.8Achievedgoalsbecauseeachofthemhasthepurposetoaccomplish
one of these tasks. In Figure 46, you can see the whole schema of Real Time Speech
Applicationanditsmainparts.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
43/56
392BRealTimeVoiceTranslator
Figure316.RealTimeSpeechTranslatorApplicationschema.
3.9TextToSpeechandSpeechRecognitionThis point is where we found one of the biggest problems to cope with during the
developmentofthisproject.Whenwebeginthisproject,wethoughttheSpeechRecognition
Enginetransformedinternallythespeechtotextandthenitcomparestheobtainedtextwith
thetextwrittenintheVXMLcode.Thisassumptionwaswrongandifwethoughtthis,wasdue
our lack of knowledge with this technology. Actually, the Speech recognition is done by
convertingthewrittentextintheVXMLfileintoaudioandthencomparingthiswiththeaudio
comingfromthecaller.Therefore,itisneededtohaveagrammarfilewhichcontainsallthe
wordsandsentencesyouwantyourapplicationwereabletorecognize.Thisisabigobstacle
we have not solved yet, because for our purposes we need to have a full recognition of
whateverthecallersaysanditisalmostimpossibletogeneratesuchabiggrammar.
3.10AchievedgoalsIn the following subsections I am going toexplain the aims achieved in the RealTime Voice
Applicationsofar.Asitisaverycomplexproject,Ihavedivideditinseveralsimplerparts.Each
of these parts has the proposal of solving one of the requisites or implements one of the
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
44/56
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
45/56
412BRealTimeVoiceTranslator
3.10.4 Generate dynamic VXML code
Untilnowwehavedevelopedseveraldifferentvoiceapplications,butalltheyarestaticvoice
applications.Itmeansthatallthecallflowhavetobepredictedintheimplementationtime.It
alsomeansthatalltheinformationtheycansupplyandtheoptionstheyoffertothecallerhas
to be known at the development time. Once the application is running, there are not any
chance to do anything it was not being considered when the programmer wrote the
application. This is because VoiceXML applications are based in static files written in VXML
language.
ThisissueisabigobstacletocreatemanyVoiceApplicationswhichwouldbeabletoprovide
veryinterestingservicesthatarenotaccessibleorknownatdevelopmenttime.Inthecaseof
Real Time Voice Application, the developer will know neither which sentence will the caller
wanttotranslatenorthetranslatedsentencetoreturn.
Forthisreason,ournextstepwasgettingtheknowledgetocreatedynamicvoiceapplications.
Toachieveit,Idoadeepresearchwithinmanydifferenttechnologiestofindoutwhichones
weneededforourpurposesandwhichtoolsweneedtouse.Afterthisresearch,Icreatedthe
DynamicTimeApplication,ourfirstDynamicVoiceApplication.Thisisasimpleapplicationthat
saysthecurrenttimetothecaller.Fordoingit,theapplicationgetsthecurrenttimefromthe
machineandgeneratesontheflytheVXMLfilewhichwillbesenttotheVEtobeinterpreted
andfinallybeplayedtothecaller.
Figure317.DynamicTimeApplicationschema.
3.10.5 Make queries to websites and process the response
Once we have the capability to create dynamic voice applications, we have to use this
knowledge for going forward to our aims in the project. To do this, next step is having the
capability to obtain data form a website. This would be very interesting for a big range of
applications that were based on web services. Some examples would be Real Time Voice
Translator, which needs to get the translated sentences from web translation services;
WikipediaVoiceProject,whichenablestogetthe informationfromWikipediaandplay iton
yourmobilephone;theWeatherForecastApplicationwhichgetstheweatherforecastforma
websiteandplaysitbacktotheuser;
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
46/56
42 RealTimeSpeechTranslator
ToachievethisaimIcreatedtheGenericPostservlet.Thisisaservletthatimplementsallthe
stepsnecessarytocarryoutaVoiceApplicationbasedonwebservices.Thisservletisableto
makeaquerytoanywebsitewithindependenceofwhateverHTTPqueriesituses:HTTPPOST
or HTTP GET. GenericPost servlet makes a query to the website, gets the data from it, and
forwardsittotheuserswebbrowser.InourVoiceApplicationswewillusethissameschema
butwewillforwardthedataobtainedfromthewebsitetotheVEinsteadtoawebbrowser.
Figure318.SchemaforaVoiceApplicationwhichneedsgettingdatafromawebsite.
3.10.6 Make queries to Google Translator
At this point, we were very close to one of the main aims of our Project: given a sentence,
translate it and plays it back to the user. To accomplish this goal, we had to modify the
GenericPost servlet to make queries to Google Translate service and integrate it in a Voice
Applicationwhichplaysthetranslatedsentencetotheuser.
Itappearedtobeasimplystep,butaftermodifyingtheGenericPostservletforourpurposes
Google Translatedenied us theaccess to itsservice. After some research, I findout that all
GoogleServicesdeniesaccesstotheirservicestoallautomaticapplications.Theyonlyaccepthuman users to avoid DoS attacks. To avoid this handicap, I modified the behavior of
GenericPost servlet to emulate the behavior of a human user which uses a Mozilla web
browser. I change the HTTP header our application uses to connect with the server for
cheatingit.Inthisway,nowwewereabletoaccesstoallGooglewebservices.
Despite, at this point, we already had a manner to get the translated sentences using the
GenericPostservlet,IcontinuemyresearchandIfindoutabetterwaytodothis.Ifindouta
betaAPIforJavathatenablestouseGoogleTranslateServices,withoutdoingHTTPqueries.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
47/56
432BRealTimeVoiceTranslator
3.10.6.1GoogleTranslatorAPIGoogleTranslatorAPIforJavaisanAPIforJavaprogramminglanguagedevelopedforGoogle
to enable Java applications to use their services. Although it is currently a beta version, the
testsIdoneshowedthatitworkswellforourpurposes.UsingthisAPIisabetterwaytoaccess
GoogleTranslatorservice,because itsaveus toprocess thereturnedHTML code toget the
translatedsentenceanddroptherestofHTMLcodethatgeneratesthevisualinterfaceofthe
page.Anotheradvantageisthatwedonotdependontheinterfacechangesdoneintheweb
inthefuture,whichwouldmodifytheHTMLcodeandwouldforceustochangethealgorithms
weusetoprocessit.
3.10.7 Get dynamic data from a VoiceXML Application
Althoughwehavethewaytogetdatafromanywebserviceandplayitbacktotheuser,we
alsoneedtoknowthewayfor integrating it inawholeVoiceApplication. Itmeanstoknow
howtolaunchourservlets,whichgetthisdataandreturnittotheuser,fromtheVXMLcode
of a Voice Application. For this purpose we decided to use the DATA tag of VXML language
whichenablesaVoiceApplicationtogetdatafromawebservicethroughaHTTPquery.
Therefore,weneededtomakeourservletsbereachableforthistag.Torealizethis,Imodified
allservletsdoneup tonow towereabletoreceive theconfigurationparameterstheyneed
fromaHTTPquery(HTTPGETorHTTPPOST).Beforedoingthis,theygettheparametersfrom
a configuration file. With this modification, Translator Application, for example, is able to
receive the sentence to be translated and the original and destination languages through a
HTTPquery.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
48/56
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
49/56
453BConclusions
4 ConclusionsWhenwebeganthisprojectseveralmonthsago, itwasabigchallengeforallourteam.For
me,itwasthefirsttimeIusesomeofthesetechnologies,likeVoiceXML.Weallhavehadtodo
a hard work to learn how to manage them and reach our aims. It has been a very nice
experience for me, and I have had the possibility to do a true team work with other very
qualified people. This has permitted us to benefit each other and, in my case, improve my
personalskillsquickly.
RegardingtheRealTimeVoiceTranslator,weachieved importantgoals.Although Ihavenot
reach the final aim of creating an application which was able to translate in real time a
conversationbetweentwocallers,Ihavedonemostofthenecessarystepstoachieveit.Itisa
verycomplexProjectandrequiresalotofintermediatesteps,whichIwasnotabletoimagine
atthebeginningduemylackofknowledgeonthetechnologiesinvolvedin.Despitethis,Ithink
that linking theachievedmidstepsandaddingfewmore features Icouldreachthegoalwe
proposed at the beginning. I also think that, with the knowledge acquired during this time,
achieve the new aims would be more easy and quickly due to the most difficult parts have
alreadybeendone.
ForafutureevolutionofmyworkontheRealTimeVoiceTranslatorProject,Iproposetodo
thefollowingstepstosolvetheproblemsIhavefoundortoaddnewfeatures:
CreateaMultilanguageapplication,i.e.usedifferentlanguagesinasameVXMLfile.It
willbenecessaryforexampletogivethecallerthechancetochoosehislanguageand,
afterhischoice,usetheselectedlanguagetospeakwithhim.
Findouthowto launchaVoiceApplicationfromanotherVoiceApplication.Itwillbe
necessarytousethisinmanysituations.Forexample,wecouldcreateanapplication
whichgivesthechancetothecallertochoosewhichapplicationhewantstouse,and
afterit,redirectthecalltothechosenapplication.
Findoutthewaytodoacyclicapplication,i.e.anapplicationthatneverendsuntilthe
userwants.ItwillbenecessarytoimplementthisfeatureinmanyVoiceApplications.
Forexample, itwouldbenecessary intheRealTimeSpeechTranslatortobeableto
translate all the sentences the user says indefinitely until he wants to finish the
conversation.
TrydifferentwaystogetdataintoaVoiceXMLApplication.Nowwehaveonlytested
theDATAtag,butitwouldbenecessarytotryotheroptionstoknowthebenefitsof
eachoneandknowwhenitisbettertouseoneorother.
Find out an efficient algorithm to process the HTML code and get the useful data
withinitforourpurposes.
Find theways to improve thewaiting timewhenweperformaquery toawebsite.
Maybe it would be possible to use a multithreading solution to make simultaneous
queriestoseveralwebsitesatthesametimeoruseabuffertostoretheresponses.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
50/56
46 RealTimeSpeechTranslator
Solve the grammar problem to achieve full conversation recognition without the
restriction of a set of words fixed in advance as it is the case of the current
applications.
Duringthedevelopmentofthisprojectwehavefacedalotofobstaclesthatwerecomingup
andweturnedoutsuccessful inmostofthem. Ithink thehardestpartof thewayhasbeen
doneandnowwehavethechance tocontinue improvingourVoiceApplicationsandcreate
newandevenmorecomplexonesapplyingtheknowledgewehaveobtaineduptonow.
I think that VoiceXML technology has an amazing future and there are a wide range of
applications where it could be used. ACC team has demonstrated its good skills and it is
carryingoutveryambitiousprojects;henceIthinkithasagoodchancetobecomeareference
TeaminVoiceApplicationsdevelopment.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
51/56
474BAPPENDIXA:Definitions
5 APPENDIXA:DefinitionsPrivateBranch eXchange (PBX)
Private Branch eXchange (PBX) is a telephone exchange that serves a particular business or
office, as opposed to one that a common carrier or telephone company operates for many
businessesorforthegeneralpublic.PBXsarealsoreferredtoas:
PABX PrivateAutomaticBrancheXchange.
EPABX ElectronicPrivateAutomaticBrancheXchange.
TrunkLine
When dealing with a PBX, trunk lines are the phone lines coming into the PBX from the
telephoneprovide
SessionInitiationProtocol(SIP)
The Session Initiation Protocol (SIP) is an applicationlayer control (signaling) protocol for
creating,modifying,andterminatingsessionswithoneormoreparticipants.Itcanbeusedto
create twoparty, multiparty, or multicast sessions that include Internet telephone calls,
multimedia distribution, and multimedia conferences (RFC 3261). SIP is designed to be
independentoftheunderlyingtransportlayer;itcanrunonTCP,UDP,orSCTP.
Media Resource ControlProtocol(MRCP)
MRCP is the Media Resource Control Protocol proposed to the IETF. It is a communication
protocol which allows speech servers to provide various speech services (such as speech
recognitionandspeechsynthesis)toitsclients.Typically,thismeanstheserversoftwarewill
be running on one computer and clients can send MRCP messages to the server over a
network,usuallyontopofanotherprotocol,suchasRTSPorTCP.Therearetwoversionsof
theMRCPprotocol.Version2usesSIPasthecontrolprotocol,whereasversion1usesRTSP.
Time StreamingProtocol(RTSP)
TheRealTimeStreamingProtocol(RTSP),developedbythe IETFandcreated in1998asRFC
2326, is a protocol for use in streaming media systems which allows a client to remotely
controlastreamingmediaserver,issuingVCRlikecommandssuchas"play"and"pause",and
allowingtimebasedaccesstofilesonaserver.
VoiceXML(VXML)
VoiceXML(VXML)istheW3C'sstandardXMLformatforspecifyinginteractivevoicedialogues
betweenahumanandacomputer.Itallowsvoiceapplicationstobedevelopedanddeployed
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
52/56
48 RealTimeSpeechTranslator
inananalogouswaytoHTMLforvisualapplications.JustasHTMLdocumentsareinterpreted
byavisualwebbrowser,VoiceXMLdocumentsareinterpretedbyavoicebrowser.Acommon
architecture istodeploybanksofvoicebrowsersattachedto thepublicswitchedtelephone
network(PSTN)sothatuserscanuseatelephonetointeractwithvoiceapplications.
CallControleXtensibleMarkup Language (CCXML)Call Control eXtensibleMarkup Language (CCXML) is anXML standard designed to provide
telephonysupporttoVoiceXML.ItscurrentstatusisaW3CWorkingDraft,adopted19January
2007.WhereasVoiceXML isdesigned toprovideaVoiceUser Interface toavoicebrowser,
CCXML isdesigned to inform thevoicebrowserhow tohandle the telephonycontrolof the
voicechannel.The twoXMLapplicationsarewholly separateandarenot requiredbyeach
othertobeimplemented.
OnecriticalthingtounderstandisthatCCXMLisnotamedia/dialoglanguagelikeVoiceXML.It
only provides support tomove calls around and connect them to dialog resources. CCXML
doesnotprovideanydialog resourceson itsown. (Note:Adialog resource isanything that
interactswith a caller via voice, such as a VoiceXML platform or even a second caller at
anotherlocation.)
VoicebrowserAvoicebrowserisawebbrowserthatpresentsaninteractivevoiceuserinterfacetotheuser.
Inaddition,ittypicallyprovidesaninterfacetothePSTNoraPBX.Justasavisualwebbrowser
workswith HTML pages, a voice browser operates on pages that specify voice dialogues.
Typically these pages are written in VoiceXML, the W3C's standard voice dialog markup
language,butotherproprietaryvoicedialoguelanguagesremaininuse.
Avoicebrowserpresentsinformationaurally,usingprerecordedaudiofileplaybackorusing
texttospeech software to render textual information as audio. A voice browser obtains
informationusingspeechrecognitionandkeypadentry(e.g.,DTMFdetection).
Asspeechrecognitionandwebtechnologieshavematuredoverthepastdecade,thousandsof
voice applications have been deployed commercially and voice browsers are supplanting
traditionalproprietary IVRsystems.Scoresofcompaniesprovidevoicebrowsers.These take
theformofsoftware,packagedhardware/softwaresolutionsorhostedsolutions.
VoiceEnablerHardwareandsoftwarethatenablesPCtoPCvoicecommunicationsorPCtotelephonelong
distancecommunications.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
53/56
494BAPPENDIXA:Definitions
VoiceServer
Its the combinationof hardware and software that enables voicecommunicationsbetween
users. It can manage simultaneously many calls flows to make possible the communication
betweenoneuserandhisspeakers.
CallXML
VoxeoproprietaryCallControlLanguage.ItsaimisthesameasCCXML.
InteractiveVoiceResponse(IVR)
Intelephony,interactivevoiceresponse,orIVR,isaphonetechnologythatallowsacomputer
todetectvoiceandtouchtonesusinganormalphonecall.TheIVRsystemcanrespondwith
prerecordedordynamicallygeneratedaudiotofurtherdirectcallersonhowtoproceed.IVR
systemscanbeusedtocontrolalmostanyfunctionwheretheinterfacecanbebrokendown
into a series of simple menu choices. Once constructed IVR systems generally scale well to
handlelargecallvolumes.
Dual tonemultifrequency(DTMF)
Dualtonemultifrequency(DTMF)signalingisusedfortelephonesignalingoverthelineinthe
voicefrequency band to the call switching center. The version of DTMF used for telephone
tone dialing is known by the trademarked term TouchTone, and is standardised by ITUT
RecommendationQ.23.
Speechrecognition
Speechrecognition(inmanycontextsalsoknownasautomaticspeechrecognition,computer
speechrecognitionorerroneouslyasvoicerecognition)istheprocessofconvertingaspeech
signal to a sequence of words in the form of digital data, by means of an algorithm
implementedasacomputerprogram.
Hidden Markovmodel(HMM) basedspeechrecognition
ModerngeneralpurposespeechrecognitionsystemsaregenerallybasedonHMMs.Theseare
statisticalmodelswhichoutputasequenceofsymbolsorquantities.Onepossiblereasonwhy
HMMsareusedinspeechrecognitionisthataspeechsignalcouldbeviewedasapiecewise
stationarysignalorashorttimestationarysignal.Thatis,onecouldassumeinashorttimein
therangeof10milliseconds,speechcouldbeapproximatedasastationaryprocess.Speech
couldthusbethoughtasaMarkovmodelformanystochasticprocesses(knownasstates).
Dynamictimewarping (DTW) basedspeechrecognition
Dynamictimewarpingisanapproachthatwashistoricallyusedforspeechrecognitionbuthas
now largely been displaced by the more successful HMMbased approach. Dynamic time
http://en.wikipedia.org/wiki/Hidden_Markov_Modelhttp://en.wikipedia.org/wiki/Stationary_processhttp://en.wikipedia.org/wiki/Stationary_processhttp://en.wikipedia.org/wiki/Stationary_processhttp://en.wikipedia.org/wiki/Markov_modelhttp://en.wikipedia.org/wiki/Markov_modelhttp://en.wikipedia.org/wiki/Markov_modelhttp://en.wikipedia.org/wiki/Markov_modelhttp://en.wikipedia.org/wiki/Stationary_processhttp://en.wikipedia.org/wiki/Hidden_Markov_Model8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
54/56
50 RealTimeSpeechTranslator
warping is an algorithm for measuring similarity between two sequences which may vary in
timeorspeed.Forinstance,similaritiesinwalkingpatternswouldbedetected,evenifinone
videothepersonwaswalkingslowlyandifinanothertheywerewalkingmorequickly,oreven
iftherewereaccelerationsanddecelerationsduringthecourseofoneobservation.DTWhas
been applied to video, audio, and graphics indeed, any data which can be turned into a
linearrepresentationcanbeanalyzedwithDTW.
StateChartXML(SCXML)
SCXMLstands forStateChartXML :State MachineNotation forControl Abstraction. It is an
XMLlanguagewhichprovidesagenericstatemachinebasedexecutionenvironmentbasedon
Harelstatecharts.
SCXML is able to describe complex statemachines. For example, it is possible to describe
notionssuchassubstates,parallelstates,synchronization,orconcurrency,inSCXML.
ExtensibleStylesheetLanguage Transformations(XSLT)
ExtensibleStylesheetLanguageTransformations(XSLT)isanXMLbasedlanguageusedforthe
transformation of XML documents into other XML or "humanreadable" documents. The
originaldocumentisnotchanged;rather,anewdocumentiscreatedbasedonthecontentof
anexistingone.[2]Thenewdocumentmaybeserialized(output)bytheprocessorinstandard
XMLsyntax or inanother format, such as HTML or plain text.[3] XSLT is most oftenused to
convert data between different XML schemas or to convert XML data into HTML or XHTML
documentsforwebpages,creatingadynamicwebpage,orintoanintermediateXMLformatthatcanbeconvertedtoPDFdocuments.
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
55/56
515BAPPENDIXB:Acronyms
6 APPENDIXB:Acronyms
VXML
VoiceXML(VXML)
CCXML
CallControleXtensibleMarkupLanguage
IVR
InteractiveVoiceResponse
DNIS
Dialednumberinformationservice
DTMF
Dualtonemultifrequency
HMM
HiddenMarkovmodel
DTW
Dynamictimewarping
CCXML
CallControleXtensibleMarkupLanguage
VoiceXML
VoiceeXtensibleMarkupLanguage
SRGS
SpeechRecognitionGrammarSpecification
SISR
SemanticInterpretationforSpeechRecognition
SSML
SpeechSynthesisMarkupLanguage
8/6/2019 20081001 Real Time Speech Translator Report Xavier Garcia Cabrera (1)
56/56
52 RealTimeSpeechTranslator
PLS
PronunciationLexiconSpecification
ECMAScript
Scripting
language
supported
by
most
v