Proceedings of the SIGDIAL 2014 Conference, pages 260–262, Philadelphia, U.S.A., 18-20 June 2014. ©2014 Association for Computational Linguistics

The Parlance Mobile Application for Interactive Search in English and Mandarin

Helen Hastie, Marie-Aude Aufaure∗, Panos Alexopoulos, Hugues Bouchard, Catherine Breslin, Heriberto Cuayáhuitl, Nina Dethlefs,

Milica Gašić, James Henderson, Oliver Lemon, Xingkun Liu, Peter Mika, Nesrine Ben Mustapha, Tim Potter, Verena Rieser, Blaise Thomson, Pirros Tsiakoulis, Yves Vanrompay,

Boris Villazon-Terrazas, Majid Yazdani, Steve Young and Yanchao Yu

email: [email protected]. See http://parlance-project.eu for full list of affiliations

Abstract

We demonstrate a mobile application in English and Mandarin to test and evaluate components of the Parlance dialogue system for interactive search under real-world conditions.

1 Introduction

With the advent of evaluations “in the wild”, emphasis is being put on converting research prototypes into mobile applications that can be used for evaluation and data collection by real users who download the application from the marketplace. This is the motivation behind the work demonstrated here, where we present a modular framework whereby research components from the Parlance project (Hastie et al., 2013) can be plugged in, tested and evaluated in a mobile environment.

The goal of Parlance is to perform interactive search through speech in multiple languages. The domain for the demonstration system is interactive search for restaurants in Cambridge, UK for Mandarin and San Francisco, USA for English. The scenario is that Mandarin-speaking tourists would be able to download the application and use it to learn about restaurants in English-speaking towns and cities.

2 System Architecture

Here, we adopt a client-server approach as illustrated in Figure 1 for Mandarin and Figure 2 for English. The front end of the demonstration system is an Android application that calls the Google Automatic Speech Recognition (ASR) API and sends the recognized user utterance to a server running the Interaction Manager (IM), Spoken Language Understanding (SLU) and Natural Language Generation (NLG) components.

Figure 1: Overview of the Parlance Mandarin mobile application system architecture

Figure 2: Overview of the Parlance English mobile application system architecture, extended to use the Yahoo API to populate the application with additional restaurant information
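The paper does not specify the wire protocol between the Android front end and the Java Socket Server; the following minimal sketch assumes a simple line-based exchange over a TCP socket, with the host name and port as placeholders rather than the actual Parlance interface.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Minimal client sketch: send a recognised utterance to the dialogue server
 *  and read back the system's reply. Host, port and the line-based protocol
 *  are illustrative assumptions. */
public class ParlanceClient {
    private static final String HOST = "parlance.example.org";  // placeholder
    private static final int PORT = 9000;                       // placeholder

    public static String sendUtterance(String recognisedText) throws Exception {
        try (Socket socket = new Socket(HOST, PORT);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(),
                                            StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(),
                                           StandardCharsets.UTF_8))) {
            out.println(recognisedText);  // hypothesis from the Google ASR API
            return in.readLine();         // string to be spoken by the Google TTS API
        }
    }
}
```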

When the user clicks the Start button, a dialogue session starts. The phone application first connects to the Parlance server (via the Java Socket Server) to get the initial system greeting, which it speaks via the Google Text-To-Speech (TTS) API. After the system utterance finishes, the recognizer starts to listen for user input to send to the SLU component. The SLU converts the text into a semantic interpretation consisting of a set of triples of communicative function, attribute and (optionally) value¹. Probabilities can be associated with candidate interpretations to reflect uncertainty in either the ASR or the SLU. The SLU then passes the semantic interpretation to the IM within the same server.
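The paper does not give the internal form of these interpretations; the sketch below assumes a simple container for the triples and their associated probability, with all names illustrative rather than the actual Parlance data structures.

```java
import java.util.List;
import java.util.Optional;

/** One SLU hypothesis: a set of (communicative function, attribute, value)
 *  triples plus a probability reflecting ASR/SLU uncertainty. */
public class SemanticInterpretation {
    public record Triple(String communicativeFunction,  // e.g. "inform", "request"
                         String attribute,              // e.g. "food", "area"
                         Optional<String> value) {}     // value may be absent

    private final List<Triple> triples;
    private final double probability;  // confidence of this candidate interpretation

    public SemanticInterpretation(List<Triple> triples, double probability) {
        this.triples = triples;
        this.probability = probability;
    }

    public List<Triple> getTriples() { return triples; }
    public double getProbability() { return probability; }
}
```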

Chinese sentences are composed of strings of characters without any spaces to mark word boundaries, unlike most other languages, for example:

In order to correctly parse and understand Chinese sentences, Chinese word segmentation must be performed. To do this segmentation, we use the Stanford Chinese word segmenter², which relies on a linear-chain conditional random field (CRF) model and treats word segmentation as a binary decision task. The Java Socket Server then sends the segmented Chinese sentence to the SLU on the server.
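A minimal sketch of this segmentation step, following the usage pattern of the Stanford segmenter's published demo code, is shown below; the model and dictionary paths are placeholders, and the configuration actually used in Parlance is not specified in the paper.

```java
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;
import java.util.List;
import java.util.Properties;

/** Segment a Chinese sentence into words with the Stanford CRF segmenter.
 *  The data paths below are placeholders for the downloaded segmenter models. */
public class ChineseSegmentation {
    public static List<String> segment(String sentence) {
        Properties props = new Properties();
        props.setProperty("sighanCorporaDict", "data");                 // placeholder path
        props.setProperty("serDictionary", "data/dict-chris6.ser.gz");  // placeholder path
        props.setProperty("inputEncoding", "UTF-8");
        props.setProperty("sighanPostProcessing", "true");

        CRFClassifier<CoreLabel> segmenter = new CRFClassifier<>(props);
        segmenter.loadClassifierNoExceptions("data/ctb.gz", props);     // CTB-trained model
        return segmenter.segmentString(sentence);  // the sentence split into words
    }
}
```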

The IM then selects a dialogue act, accesses the database and, in the case of English, passes back the list of restaurant identification numbers (ids) associated with the relevant restaurants. For the English demonstration system, these restaurants are displayed on the smartphone as seen in Figures 4 and 5. Finally, the NLG component decides how best to realise the restaurant descriptions and sends the string back to the phone application for the TTS to realise. Example output is illustrated in Figure 3 for Mandarin and Figure 4 for English.
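The format in which the restaurant ids and the NLG string are passed back to the phone application is not described in the paper; the container below is a hypothetical illustration of that reply.

```java
import java.util.List;

/** Hypothetical reply from the server to the Android application: the NLG
 *  string for the TTS plus the ids of the restaurants to display. */
public class SystemTurn {
    private final String utterance;             // realised by the Google TTS API
    private final List<Integer> restaurantIds;  // populate the map and list views

    public SystemTurn(String utterance, List<Integer> restaurantIds) {
        this.utterance = utterance;
        this.restaurantIds = restaurantIds;
    }

    public String getUtterance() { return utterance; }
    public List<Integer> getRestaurantIds() { return restaurantIds; }
}
```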

As discussed above, the Parlance mobile application can be used as a test-bed for comparing alternative techniques for various components. Here we discuss two such components: IM and NLG.

¹ This has been implemented for English; Mandarin uses the rule-based Phoenix parser.

² http://nlp.stanford.edu/projects/chinese-nlp.shtml

Figure 3: Screenshot and translation of the Mandarin system

Figure 4: Screenshot of dialogue and the list of recommended restaurants shown on a map and in a list for English

2.1 Interaction Management

The Parlance Interaction Manager is based on the partially observable Markov decision process (POMDP) framework, in which the system's decisions can be optimised via reinforcement learning. The model adopted for Parlance is the Bayesian Update of Dialogue State (BUDS) manager (Thomson and Young, 2010). This POMDP-based IM factors the dialogue state into conditionally dependent elements. Dependencies between these elements can be derived directly from the dialogue ontology. The elements are arranged into a dynamic Bayesian network, which allows their marginal probabilities to be updated during the dialogue, comprising the belief state. The belief state is then mapped into a smaller-scale summary space and the decisions are optimised using the natural actor critic algorithm.


Figure 5: Screenshot of the recommended restaurant for the English application

In the Parlance application, hand-crafted policies can be compared to learned ones.
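The actual BUDS update is defined in Thomson and Young (2010); the fragment below is only a schematic illustration of how the marginal distribution over one slot could be updated from an uncertain SLU observation, not the Parlance implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Schematic belief update for a single slot (e.g. "food"): the posterior over
 *  values is proportional to the prior times the likelihood of the observation.
 *  Assumes at least two candidate values; purely illustrative. */
public class SlotBelief {
    private final Map<String, Double> belief = new HashMap<>();

    public SlotBelief(List<String> values) {
        for (String v : values) belief.put(v, 1.0 / values.size());  // uniform prior
    }

    /** The SLU observed {@code observedValue} with confidence {@code p};
     *  the remaining probability mass is spread over the other values. */
    public void update(String observedValue, double p) {
        double norm = 0.0;
        for (Map.Entry<String, Double> e : belief.entrySet()) {
            double likelihood = e.getKey().equals(observedValue)
                    ? p : (1.0 - p) / (belief.size() - 1);
            e.setValue(e.getValue() * likelihood);
            norm += e.getValue();
        }
        for (Map.Entry<String, Double> e : belief.entrySet()) {
            e.setValue(e.getValue() / norm);  // renormalise the marginals
        }
    }

    public Map<String, Double> getMarginals() { return belief; }
}
```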

2.2 Natural Language Generation

As mentioned above, the server returns the string to be synthesised by the Google TTS API. This mobile framework allows for testing of alternative approaches to NLG. In particular, we are interested in comparing a surface realiser that uses CRFs against a template-based baseline. The CRF realiser takes semantically annotated phrase structure trees as input, which it uses to keep track of rich linguistic contexts. Our approach has been compared with a number of competitive state-of-the-art surface realisers (Dethlefs et al., 2013), and can be trained from example sentences with annotations of semantic slots.
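The CRF realiser is described in Dethlefs et al. (2013); the template-based baseline is sketched below under the assumption that it fills slot values into hand-written patterns, with the templates themselves purely illustrative.

```java
import java.util.Map;

/** Illustrative template-based baseline: pick a hand-written template for the
 *  dialogue act and substitute the slot values into it. */
public class TemplateRealiser {
    private static final Map<String, String> TEMPLATES = Map.of(
            "recommend", "{name} is a {pricerange} {food} restaurant in the {area}.",
            "request_area", "Which part of town are you interested in?");

    public String realise(String dialogueAct, Map<String, String> slots) {
        String text = TEMPLATES.getOrDefault(dialogueAct, "");
        for (Map.Entry<String, String> slot : slots.entrySet()) {
            text = text.replace("{" + slot.getKey() + "}", slot.getValue());
        }
        return text;
    }
}
```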

2.3 Local Search and Knowledge Base

For the English system, the domain database is populated by the Yahoo search API (Bouchard and Mika, 2013) with restaurants in San Francisco. These restaurant search results are returned based on their longitude and latitude within San Francisco, covering 5 main areas, 3 price categories and 52 cuisine types, and containing around 1,600 individual restaurants.

The Chinese database has been partially translated from an English database of restaurants in Cambridge, UK, and search is based on 3 price categories, 5 areas and 35 cuisine types, with a total of 157 restaurants. Due to the language-agnostic nature of the Parlance system, only the name and address fields needed to be translated.
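Neither the database schema nor the query interface is given in the paper; the sketch below assumes a simple in-memory lookup filtered by area, price range and cuisine type, with all field names hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative in-memory lookup over the restaurant knowledge base,
 *  filtering by area, price range and cuisine type. */
public class RestaurantKB {
    public record Restaurant(int id, String name, String address, String area,
                             String priceRange, String cuisine,
                             double latitude, double longitude) {}

    private final List<Restaurant> restaurants;

    public RestaurantKB(List<Restaurant> restaurants) {
        this.restaurants = restaurants;
    }

    /** Return the ids of restaurants matching the requested constraints. */
    public List<Integer> find(String area, String priceRange, String cuisine) {
        return restaurants.stream()
                .filter(r -> r.area().equalsIgnoreCase(area))
                .filter(r -> r.priceRange().equalsIgnoreCase(priceRange))
                .filter(r -> r.cuisine().equalsIgnoreCase(cuisine))
                .map(Restaurant::id)
                .collect(Collectors.toList());
    }
}
```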

3 Future Work

Investigating application-side audio compression and audio streaming over a mobile internet connection would enable further assessment of the ASR and TTS components used in the original Parlance system (Hastie et al., 2013). This would allow entire research systems to be plugged directly into the mobile interface without the use of third-party ASR and TTS.

Future work also involves developing a feedback mechanism for evaluation purposes that does not put undue effort on the user or put them off using the application. In addition, this framework can be extended to leverage hyperlocal and social information about the user when displaying items of interest.

Acknowledgements

The research leading to this work was funded by the EC FP7 programme FP7/2011-14 under grant agreement no. 287615 (PARLANCE).

References

H. Bouchard and P. Mika. 2013. Interactive hyperlocal search API. Technical report, Yahoo Iberia, August.

N. Dethlefs, H. Hastie, H. Cuayáhuitl, and O. Lemon. 2013. Conditional Random Fields for Responsive Surface Realisation Using Global Features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), Sofia, Bulgaria.

H. Hastie, M.A. Aufaure, P. Alexopoulos, H. Cuayáhuitl, N. Dethlefs, M. Gasic, J. Henderson, O. Lemon, X. Liu, P. Mika, N. Ben Mustapha, V. Rieser, B. Thomson, P. Tsiakoulis, Y. Vanrompay, B. Villazon-Terrazas, and S. Young. 2013. Demonstration of the PARLANCE system: a data-driven, incremental, spoken dialogue system for interactive search. In Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), Metz, France, August.

B. Thomson and S. Young. 2010. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems. Computer Speech and Language, 24(4):562–588.
