Bilingual Response Generation Using Semi-Automatically

Embed Size (px)

Citation preview

  • 7/29/2019 Bilingual Response Generation Using Semi-Automatically

    1/4

    Bilingual Response Generation using Semi-Automatically-Induced Templates for a Mixed-Initiative Dialog SystemWingLin Ep andHelen M. Meng

    Human-Computer Communications L aboratoryDepartment of Systems Engineering and Engineering ManagementThe Chinese Universityof Hong Kong., I : ( wl yi p, hmmeng]Bse. cuhk. edu. hkABSTRACTWe have previously developed a framework for naturallanguage response generation for mxed-initiative dialogs in theCUHK Restaurants domain [I]. This paper investigates the useof sem-automatic technique for response templates generation.We adopt a sem-automatic approach for grammar induction [2]to capture the language structures o f responses from un-

    annotated corpora. We wish to use this approach to induce aset of grammars from our response data. The inducedgrammars are coupled with a parser to produce responsetemplatesin a sem-automatic way. Our response data consistsof 2349 waiter responses. It is used as the training corpus forgrammar induction. Unsupervised grammar induction is firstperformed, followed by using the learned grammars as priorknowledge for seeding the clustering process. Results showthat the sem-automatically-induced response templates covermore than 50% of the hand-designed templates in templatescoverage and provide more realization options: Performanceevaluation indicates that the task completion rate has at least90%. and most of the Grices maxims as well as the overalluser satisfaction scored at3. 5pointsorabove.1. INTRODUCTION

    training corpus for grammar induction. We performunsupervised grammar induction first and use the learnedgrammars as prior knowledge for seeding the clusteringprocess. A set of sem-automatically-induced responsetemplates can then derived by parsing our response data withthe induced grammars.

    2. THE RESPONSE DATAOur response data contains 2349 waiter response utteranceswhich are mainly extracted from the CUHK Restaurants corpus[I]. The corpus contains 260 dialogs (with 1785 customerrequest utterances and 2176 waiter response utterances) thatcapture interactions between a customer and a waiter in arestaurant. We use those waiter response utterances from thecorpus and further expandow response data by collecting 173waiter response utterances from bwks [3, 4, 5, 61. Some

    Doyou havea reservation?Howc4n I help you? I wu l d recommend smoked salmon scallop.Table I : Response examples extracted from the response data.

    This paper extends our previous effort of using hand-designedtext generation templates for response verbalization [I]. Wehave previously worked on response generation in the contextof the CUHK Restaurants domain, where our prototype systemsimulates the interaction between a customer and a waiter. Inour earlier framework, we used a set of hand-designed textgenerate templates to verbalize the response message withappropriate selection of semantic, syntactic and lexicalstructures, each of which is associated with a response dialogstate. The templates specify sententid Structures that canincorporate semantic categories parsed from the user requests togenerate a coherent system response. In order to reduce themanual work involved in hand-designing the text templates, weadopt a sem-automatic approach for grammar induction [Z] tocapture the language shuctures of responses. This approachcan reduce manual effort of handcrafting response grammar.This can achieve enhanced portability across domains and

    3. RESPONSE G R AMMAR INDUCTIONThe clustering was previously implemented forannotated corpora, Details have been described in [21,are induced by agglomerative clustering whichgroups words or phrases with similarleft and right linguistic(Div), which

    squiring semantic

    groups words spatially and temporally, spatialand syntactic StIUCtUreS from un.

    by minimizing thetheKullback-Liebler distance (SeeEquationTemporal clusteringcaptureskey phrases whichfrequently by maximzing the mutual information (MI),whichindicate the degreeof CmcurrenCe of two consecutive entities(See Equation2). Spatial clusters(SCs) and temporal clusters(TCs), which are semantic categories and phrasal shuctures

    respectively areproducediterativelylanguages. The clustering algorithm was previouslyimolemented for acauirine semantic structure and swtactic (1)isf xL (e, , 2 =Diu (p:., ptfl )+Diu (p;@ , ; ). - ~,str;ctures from un-annotated corpora. We tend to use this whereapproach to induce a set of grammar fromour response data. D i v b y , p f ) =~ p y ( i ) l o g i + C p f ( i ) l o g -( i ) p4(i)The induced grammar should be useful for producing response ,= PP i ) P? 6)templates in a sem-automatic way. The grammar induction is astatistical approach that uses agglomerative clusteringto groupwords spatially and temporally. We use our response data as the p(i) is the probabilityof the entity i adjacent(adi) to entitye,.Vis the vocabulary size for adjacent context.0-7803-8678-7/04/$20.00 02004E E E 193 ISCSLP 2004

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:38 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Bilingual Response Generation Using Semi-Automatically

    2/4

    There are two freeparameters required in the clusteringprocess: the mnimum count threshold (4nd the number ofmerges in clustering(N). I n each iteration, only entities withfrequencies of occurrence above countM are considered, theNentity pairs with lowest values for Dist are merged to formspatial clusters and theN entity pairs with highest values ofMare merged to form temporal clusters. We empirically set M-3andN=5. If a largerM is used, those contributive entities (e.g.,mushroom,prawns which can be the grammar terminals ofFOOD)will be filtered. Weexperimented with N for differentvaluesN=l, N=3,N=5andN= O and compare the grammarsizeand time consumption for different values ofN. By using N=5,the grammar induction can produce equally good grammarusing fewer iterations and less computation time whencompared with that of N= and N=3. We will not choose N=Oelse the clustering process becomes tw aggressive and theinduced grammar becomes over-generalized.Clustering is allowed to proceed to 140 iterations. A sshown in Figure 1, the growth of clusters number stoppedbeyond iteration 130. From the output grammar, we selected30 categories that we regard as basic semantic categories andphrases for the CUHK Restaurants domain. Examples areFOOD, REST-NAME, NUM, UNIT, etc. We manuallycomplete the terminals for these categories and use them asseed categories to catalyze the rerun of the agglomerativeclustering process. From Figure 2, we found that the numberof terminals saturates within 100 iterations, so the clusteringprocess is terminated. The output grammar from bothclustering processes is post-processed by hand-refinementRefinement involves (i) pruning irrelevant rules;. (ii)consolidating similar rules; (i ii) completing their set ofterminals and (iv) giving meaningful labels to the grammarrules, e.g., FOOD, REST-NAME, etc. Post-processing tookabout three hours to produce a grammar with 109 non-terminals and457 terminals. Some examples of grammar rulesare depicted in Table2.

    induction process.

    .......................... .*_.....-.-..... .......................................

    0IterationsF igure 1: Growth of response grammar units in the grammarinduction process.

    ..................................................................................................................................... i00w..........................

    O r 20 40 60 80 100 1oIterationsF igure 2: Growth of response grammar units for the grammarinduction seeded 30 seed categories.

    4. RESPONSE TEM PLA TES GENERATIONThe induced grammars are coupled with a parser and operatedon our response data for retrieving key semantic concepts orstructural phrases. The key semantic concepts and structuralphrases are tagged automatically with their correspondingcategories. Al l sentences in the response data are parsed withour induced grammars to produce tagged sentences. Weobtain642 distinct tagged sentences. Examples of tagged sentencesobtainedfrom our parser are shownin Table3.I Before parsing: welcome to hilton restaurantAfler parsing: WELCOME RESTGrammars: WELCOME3 welcome toREST-NAME 3 abc 1 hilton ...REST3REST-NAME restaurantBefore parsing: you have reserved a table for our at 7pmM er parsing: you have reservedTABLE-FOR atTIMEGrammars: TABLE-FOR 3a table for NUMNUM 1 I 2 ._ _TI ME3NUM oclock I NUMpm ...Table 3: Examples of tagged sentences parsed using inducedgra-arsAmong those distinct tagged sentences, we have selected 278tagged sentences with categories coverage greater than 30%.Wedefine the categories coverage as the percentage of wordsthat are covered by our induced grammars. Table 4 presentsthe computationof categories coverage. The first one will notbe selected since its categories coverageis less than30%. Theselected tagged sentences are used as response template

    After parsing:Before parsing: you have resenred a table or f our at 7pmM er parsing: you have reservedTABLE-FOR atTIME

    ILWILL look into the matter at once

    . - .I Categories coverage: 6/10=60%Table 4: Examples illustrate the computation of categoriescoverage.We found that some tagged sentences actually belong to similarresponse structure. For example, the following tagged sentencesare all referring to a response that offering further services.They are grouped into a single response template labeledANYTHI NG- ELSE (See Table5). Each template is associatedwith one or more tagged sentences that constitute a variety ofrealization options. Our approach generated 64 responsetemplates in total. Some examples are shown in Table6. The

    194

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:38 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Bilingual Response Generation Using Semi-Automatically

    3/4

    categories prefixed with # can be obtained either from

    would there beANY-ELSEdo you needANY-ELSEMODALUbring youANY-ELSEMODALU serve youANY-ELSE

    grammar termnals or customer requests.rTcmpiate Isbcl: ANYTHI NG ELSE I, IWould.there be anything else?

    Doyou need anything else? Can I bringyou anything else?M ay serve you anythins else?

    lhsociated tamed sentences: IRealization ontions: .- 1

    ordered garden salad, orangeuice, gri lledfishf il let with tomatoberb sauce. They will be ready inI5 minutes. I

    IINY-ELSE IAnything else? WOULD~~UIKE ANY-ELSE Wouldvou like anvthinp else?

    garden salad, orangejuice, &8@35Wf8, &d&@&~#?+5##BH

    Template label: WELCOME RESTTagged sentence: WELCOME RESTGrammars:WELCOME 3 welcome toI REST NAME+abc I hilton ... IREST-+ REST-NAME restaurantContent welcome tu#REST_NA ME restaurant.Template label: SUGGESTTagged sentence:HOW-ABT FOODI . ~ - - . i RECOMMENDtheFOOD IWOULD-U-LIKE someFOOD

    Grammars:HOW ABT 3 how abouti RECOMMEND3 would recommendWOULD-U_LIKE 3 would you likeFOOD3 seafood platter I steak 1 ...Content: How about#FOOD?I would recommend#FOOD.Would you like some#FOOD?Table6: Example templatesWELCOME- REST (for welcomngcustomers) andSUGGEST (for suggesting food).

    4.1. BILINGUAL RESPONSE TEM PLATESWe translated the64 response templates from English toChinesein order to achieve Chinese response generationaswell. Table7 depicts the translated response templatesSUGGEST.

    Table 7 The English and Chinese responses for the templateSUGGEST.

    5. EVALUATIONWe compared the resultant grammar-induced responsetemplates with the band-designed one, as described in [I].Among the64 sem-automatically-induced response templates,57 of them carry simlar semantic meaning and serve the samefunction as those hand-designed ( I O1 templates). Our sem-automatically-induced response templates cover over 50% (57out of 101) of the hand-designed templates. An extra 7templates are discovered using the induced grammars and theydo not appear in the hand-designed templates. One of the extratemplatesSHOW LOCis shownin Table8Template label: SHOW LOC . -Tagged sentence: SHOW you to theLOCGrammars:SHOW 3 will showLOC3bar 1 main restaurant_..Table8 The extra response templateSHOWLOC is used forContent: I will show you to the#LOG.showing location to customer.

    Although the templates coverage of the sem-automatically-derived response templates is not as good as those hand-designed one, we observed that ow sem- automati cal l y-i ndncedresponse templates can increase variability of response. It isbecause each response template offers more realization optionsthan those hand-designed one. The number of realizationoptions for each template has increased50% in average. Takethe template ANYTHI NG- ELSE as an example, the hand-designed template only gives4 options for realizing a response,while the sem-automatically-induced response templates offers6response realizations (See Table9).1 Hand-desimed I Anvthinzelse. sir? ITemplate

    designed response template and sem-automatically-inducedresponse template with template label ANYTHI NG- ELSE.We have previously incorporated the cooperative responsegeneration mechanism in an initial prototype of the interactiveCUHK Restaurants system [I]. The system accepts typednatural language queries in English as input. Wereplaced theprevious hand-designed templates with the sem-automatically-induced response templates for text generation. We asked tensubjects to interact with the system. Each subject isgiven threetasks: (i) reserve a table; (i i) order a meal; and (iii) ask for thebill. All interactions are automatically logged by the system.An example evaluation dialog is shown in Table IO . We cansee that the food name garden salad and orange juice areparsed directly from the customer request into the categoty#FOOD, without English-toChinese translation, while thosesuggested food items are obtained from existing Chinesegrammar termnals. The average number of dialog tums.foreach task is shown inTable 11 .

    195

    Authorized licensed use limited to: Tec de Monterrey. Downloaded on July 23, 2009 at 19:38 from IEEE Xplore. Restrictions apply.

  • 7/29/2019 Bilingual Response Generation Using Semi-Automatically

    4/4

    Tasks I Reservation I Order I BillAverage #dialog tums I 6.5 I 5.5 I 3.2Table 11 :evaluation dialogs for each of the three tasks.

    We evaluate the dialogs in terms of the task completion rate,Grices maxims [7] as well as overall user satisfaction.

    Average number of dialog tums across the 10

    5.1 Task Completion RateAll the evaluation dialogs logged by the system have beenchecked for task completion. A task is considered complete ifthe appropriate confirmation message is present in the dialog.For the reservation task, we search for the system confirmation.A task is considered complete as long as the appropriateconfirmationmessage exists, even if there are incoherent dialogturns involved. The simplicity of our evaluation tasks have ledto high task completion rates across the evaluation dialogs (SeeTable 12).Tasks I Reservation I Order I Bill

    3 I. .

    ask Competonate 90% 9/10) I 100% ( l O / lO ) l 100% (10/10)Table 12: Task completion rates across the I O evaluationdialogs for each of the tasks- eserving a table, ordering fwdand requesting the bill.5.2 Grices MaximsandPerceivedUser SatisfactionWe also evaluate response generation in terms of GricesMaxims as well asoverall user satisfaction. Each subjectwas asked to fill out a questionnaire that contains threesets of questions, one for each task (i.e. reservation,ordering food and requesting the bill). The set ofquestions is identical across the tasks and relate toGrices Maxims as well asoverall user satisfaction. Thesubjects were asked to respond to these questions on afive-point Likert scaleand the resultsare shown in Table13. A t-test shows that most of the maxims aresignificantly better than average (Likert score 3) ata=O.OS, except the maxim of Quantity for reservationtask and the maxim of Manner for order task.

    Table13: Average scores and standard deviations (in brackets)in a five-point Likert scale (]--very poor, 5-very good)obtained from evaluation in terms of Grices Maxims andoverall user satisfaction.The score for Maxim of Quantity for reservation task isrelatively low. This reflects that some of our semi-automatically-induced response templates do not give sufficientinformation to the customer. Table 14 presents an illustration-in the second and third dialog turns, the customer does not givea specific location, however, the system is expecting an answerwith finite location such as nenr the w indod . This situationdoes not happen when we use the hand-designed template. It isbecause the hand-designed one includes the location options forcustomer to choose (i.e., Wherewould you like to sit? By thewindow, in the main resrnurnntor in the bnr? 7.

    Toble 14:dialogs to illustrate incoherent dialog turns.

    An example extracted from the evaluation

    6. CONCLUSIONSThis paper reports on .our approach towards sem-automaticnatural language response templates generation in the CUHKRestaurants domain. Wehave tried to develop a set of responsetemplates semi-automatically romcorpus. We have adopted asem-automatic approach for grammar induction to capture thelanguage structures of responses. Agglomerative clustering isused to group words spatially and temporally. Severalexperiments are conducted to determine the two freeparametersM = 3 (minimum count threshold) and N = 5(number of merges). The resultant grammar is post-processedby labeling tags with meaningful labels, completing theterminals for some categories, pruning irrelevant clusters andconsolidating clusters that belonging to same categories. Wehave injected some prior knowledge that was learned from theunsupervised grammar induction. Seed categories obtainedfrom unsupervised process are used to catalyze grammarinductionsoas to produce longer phrases using less iteration. Aset of sem-automatically-induced response templates wasderived by parsing our response data with the inducedgrammar. Those templates are compared with the hand-designed templates in terms of templates coverage and numberof realization options, Although the sem-automatically-inducedresponse templates cannot outperform the handdesigned one intemplates coverage, they still have a competitive performancewith coverage greater than 50% (57 out of 101). Our approachalso increases the variability of response by providing morerealization options. Performance evaluation based on the 30interactive dialogs from 10 subjects showed at least 90% taskcompletion rate. Most of the Grices maxims as well as theoverall user satisfaction scored at 3.5 points or above.

    7. ACKNOWLEDGEMENTSThe work described in this paper was partially supported bywants from the Research Grants Council of the Hong KongSARGovernment, China (Project Nos. CUHK4326/02E).8. REFERENCESH. Meng, W. L. Yip, M. Y. Mok, S . F. Chan, NaturalLanguage Response Generation in Mixed-InitiativeDialogs using Task Goals and Dialog Acts. InProceedings of Europenn Conference on SpeechCommunication and Technologv, 2003H. Men& K. C. Siu, Semiautomatic Acquisition of

    Semantic Structures for Understanding Domain-SpecificNatural language Queries. I n IEEE Transactions onKnowledge nnd Datn Engineering. Vol.14, Issue I , p.172-181. J adF eb2002.Carmen Tang, @H@&k%%.@fi&%6&.hEHWW+.&BSEXfi3,wAa~m=BBs.@Ei?kWWk,BatWBlssiR.&@X&&.R. Frederking, Grices maxims: do the right thing. InF rederkng, R. E ., 1996.

    196

    A h i d li d li i d T d M D l d d J l 23 2009 19 38 f IEEE X l R i i l