
Answer Generation for Chinese Cuisine QA System

Ling XIA

Faculty of Engineering,

The University of Tokushima

Tokushima, 770-8506, Japan

[email protected]

Zhi TENG

Faculty of Engineering,

The University of Tokushima

Tokushima, 770-8506, Japan

[email protected]

Fuji REN

Faculty of Engineering,

The University of Tokushima

Tokushima, 770-8506, Japan

[email protected]

Abstract: In this paper, we propose an approach to answer generation for a cooking question answering (QA) system. We first review our previous work on question analysis. Then, we give the annotation scheme for the knowledge database. Finally, we present an answer planning based approach for generating an exact answer in natural language. An evaluation has been conducted on natural language questions, and the results show that the system can satisfy users' demands.

Keywords: QA system; answer generation (AG); domain knowledge; annotation; answer planning

1. Introduction

Cuisine is a specific set of cooking traditions and practices, often associated with a specific culture. For many people, cooking in the kitchen is part of daily life. Several studies have addressed computer-aided cooking, such as constructing cooking content automatically [1] and improving web search performance in the cooking domain [2]. Our goal is a QA system in the cooking domain that serves as an intelligent agent for guiding meals (e.g., dish cooking, cooking techniques) in daily life. In this work we address the problem of answer generation (AG) for a Chinese cuisine QA system.

General QA systems adhere to a pipeline architecture that mainly includes three parts: question analysis, information retrieval, and answer extraction (AE). If the QA system is based on a non-structural or semi-structural database, answer extraction from the retrieved documents must be done before the exact answer can be generated. For example, recent research in restricted domains, such as a sightseeing QA system and an Analects of Confucius QA system, works in this way: these systems return answers by various similarity calculations and answer ranking [3-4]. Our QA system aims at real application in the cooking domain. We can obtain rich structural cooking documents from the web and build an offline knowledge source. After annotating the corpus with cooking attribute blocks, the answer information can be extracted directly from the knowledge database. Then, an exact answer in natural language can be generated.

Several systems have specifically addressed the task of answer generation or real applications of text generation, for example, generating intelligent numerical answers in a QA system, generating intensional answers in intelligent QA systems, generating weather forecast text, and generating approximate geographic descriptions [5-8]. Our work involves three types of questions: factoid questions, complex questions, and list questions. From the viewpoint of real application, we established our question taxonomy. Building on the question classification done in [9], we capture various aspects of the question-answer relationship by answer planning, match the question subject with answer components extracted from the related database, and generate a succinct answer in natural language. Our main challenges are: 1) to put effective annotation on the domain documents, and 2) to find suitable answer generation strategies for each question type. The rest of this paper is organized as follows: Section 2 gives a review of our question taxonomy and question classification. Section 3 presents the annotation of the cooking knowledge database. In Section 4, we propose the strategies for AG based on answer planning. Section 5 presents the evaluation and discussion. Finally, we give conclusions and future work in Section 6.

2. Related Work

The answer generation described here is built on the foundation of question classification. In our previous work, we established a question taxonomy and conducted question classification, so we review the question analysis only briefly.

2.1. Question Taxonomy and Domain Word

The target users of our cooking QA system are people who are interested in cooking and want to cook dishes by themselves. After a survey of frequently asked cooking questions from the Internet (http://www.chu6.com/, http://www.mabuyu.com/, http://www.scysw.com/, http://zhidao.baidu.com/), we categorized cooking domain questions into four classes. Our question types and examples are shown in Table 1.

Table 1. Question categories and examples

We divided the questions according to their expected answer type. To carry out question classification effectively, the questions must first be preprocessed, namely, lexical analysis, domain word creation, and so on. We perform segmentation and POS tagging using the Chinese lexical analysis tool ICTCLAS. To use the domain knowledge effectively and express the question subject distinctively, we add five kinds of domain words. Table 2 shows some examples of our domain words.

Table 2. Types and examples of domain words
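As an illustration of the preprocessing step above, the sketch below segments and tags a question. The paper uses ICTCLAS, which has no simple Python binding, so jieba stands in here; the domain word entries and type names are invented for illustration, since Table 2 is not reproduced in this text.

```python
# Minimal preprocessing sketch: segmentation + POS/domain tagging.
# jieba stands in for ICTCLAS; domain words and type names are guesses.
import jieba
import jieba.posseg as pseg

# Hypothetical domain words, one per assumed domain-word type.
DOMAIN_WORDS = {
    "蚂蚁上树": "dish_name",       # dish name
    "主料":     "material_attr",   # raw-material attribute
    "勾芡":     "technique",       # cooking technique
    "汤菜":     "cooking_form",    # cooking form
    "豆腐":     "ingredient",      # ingredient
}
for word, tag in DOMAIN_WORDS.items():
    jieba.add_word(word, tag=tag)  # register word with custom tag

def preprocess(question: str):
    """Segment a question and attach POS / domain tags."""
    return [(w.word, w.flag) for w in pseg.cut(question)]

print(preprocess("请问蚂蚁上树的主料是什么?"))
```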

2.2. Question Classification

Question classification has two major functions: one is to reduce the candidate answer set and increase the accuracy of the returned answer; the other is to decide the strategy for answer extraction. In a restricted domain, the service and the question types are relatively fixed; therefore, with the help of domain knowledge, the question type can be recognized effectively based on rules. We first extract classification features by virtue of the domain attributes. According to a matching and filtering strategy, we classify the questions using production rules. The rule-based method achieves good performance when a rule matches the question well, but it is helpless for questions that cannot be matched. In that case, we deliver them to an SVM model for secondary classification instead of labeling them with types by hard rules. Experiments showed the effectiveness of this method: it achieved an overall accuracy of 96.22% [9].
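A minimal sketch of this rule-first, SVM-fallback cascade follows. The actual production rules, features, and training data are not given in this text, so the patterns and toy examples below are invented placeholders.

```python
# Rule-first, SVM-fallback question classification cascade (sketch).
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

RULES = [  # (pattern over the raw question, question type) - invented rules
    (re.compile(r"主料|辅料|调料"), "type1_raw_material"),
    (re.compile(r"怎么做|做法"),   "type2_recipe"),
    (re.compile(r"做什么菜|推荐"), "type3_dish_name"),
    (re.compile(r"怎样|如何"),     "type4_technique"),
]

# Toy secondary classifier; character n-grams sidestep segmentation.
train_q = ["鱼香肉丝的主料是什么", "宫保鸡丁怎么做", "用土豆能做什么菜", "如何焯水"]
train_y = ["type1_raw_material", "type2_recipe", "type3_dish_name", "type4_technique"]
svm = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(1, 2)), LinearSVC())
svm.fit(train_q, train_y)

def classify(question: str) -> str:
    for pattern, qtype in RULES:        # stage 1: hard rules
        if pattern.search(question):
            return qtype
    return svm.predict([question])[0]   # stage 2: SVM fallback

print(classify("请问蚂蚁上树的主料是什么?"))
```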

3. Knowledge Representation of Domain Document

Documents in the cooking area, such as those on fundamental cooking techniques, raw materials, and condiments, are relatively stable. Although new dishes may appear from time to time, the main menu rarely changes completely. Therefore, this work focuses on extracting answers from a local database.

In the cooking domain, users frequently ask the same types of questions, so a few well-chosen knowledge sources are sufficient to provide good knowledge coverage. Our corpus is collected from websites. We use two different annotations, resulting in two different resources: the first source is composed of dish cooking documents, and the second is for cooking techniques.

3.1. Annotation Scheme for Dish Cooking Document

Our dish cooking corpus consists of Sichuan cuisine menus, mainly collected from Cookgod (http://www.cookgod.com). To provide an effective mechanism for answering questions, we annotate the dish cooking documents with domain attributes in several blocks, namely, a dish name block, a cooking form block, a raw material block, and a directions block. Within the raw material block, there are three sub-blocks: main ingredient, ingredient, and condiments. Figure 1 shows an example of annotation on a dish cooking document [10].
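Since Figure 1 is not reproduced in this text, the record below is a hypothetical reconstruction of the block structure just described; the field names and the example dish content are illustrative assumptions.

```python
# Hypothetical annotated dish document following the described block scheme.
ANNOTATED_DISH = {
    "dish_name":    "蚂蚁上树",              # ants climbing a tree
    "cooking_form": "炒菜",                  # stir-fried dish
    "raw_material": {                        # three sub-blocks
        "main_ingredient": ["粉丝"],          # cellophane noodles
        "ingredient":      ["猪肉末"],        # minced pork
        "condiment":       ["豆瓣酱", "酱油", "葱", "姜"],
    },
    "directions":   "粉丝泡软；炒香肉末与豆瓣酱；加粉丝翻炒收汁。",
}
```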

3.2. Annotation Scheme for Cooking Technique Document

The documents on cooking techniques are collected from websites such as Wikipedia (http://zh.wikipedia.org), Baidu (http://zhidao.baidu.com), and Meishij (http://www.meishij.net). The characteristics of this source differ from those of the dish cooking documents. Generally, the explanation of a cooking technique is a paragraph consisting of several sentences, and in some cases the sentences may not contain a domain attribute word. Figure 2 shows an example of annotation on a cooking technique document.

Figure 1. Example of annotation for dish cooking document

Figure 2. Example of annotation for cooking technique document
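Since Figure 2 is likewise not reproduced, the sketch below shows one plausible structure for this second source: whole explanation paragraphs retrievable by subject indexing. The index layout and the (abridged, invented) paragraph contents are assumptions.

```python
# Second knowledge source (sketch): explanation paragraphs indexed by subject.
from typing import Optional

TECHNIQUE_DOCS = {
    # subject -> explanation paragraph (contents abridged and invented)
    "勾芡": "勾芡是在菜肴接近成熟时，将调好的淀粉汁淋入锅内，使汤汁稠浓……",
    "焯水": "焯水是将原料放入沸水中快速加热后捞出，以去除异味、保持色泽……",
}

def lookup_technique(subject: str) -> Optional[str]:
    """Subject indexing: return the explanation paragraph, if present."""
    return TECHNIQUE_DOCS.get(subject)

print(lookup_technique("勾芡"))
```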

4. Answer Planning Based Approach for Answer Generation

Discourse generation is not possible without text planning. The main purpose of text planning is to determine the content to be generated and the relationships between the content elements. The planned content, however, depends on the user model.

In this work, we use an answer planning based approach to generate answers for a QA system in the cooking domain. We use two-stage planning for answer generation, i.e., answer content planning and answer sentence planning. Once the answer components have been determined by content planning, the answer is generated in natural language by sentence planning. The architecture of answer generation is shown in Figure 3.

Figure 3. The architecture of answer generation

4.1 Answer Content Planning

Content planning decides what information must be communicated, namely, content selection and ordering. As described in Section 2, we categorized the cooking questions into four classes. The answers corresponding to the first, second, and third types of questions can be generated from the dish cooking documents, where the answer components are attribute blocks. However, owing to differences in domain characteristics, the answers corresponding to the fourth type are generated from the cooking technique documents, where the answer component is an explanation paragraph.

4.1.1 Answer content for the first type

The first type asks for raw materials, for example: "请问蚂蚁上树的主料是什么?(What is the main ingredient of ants climbing a tree?)". The answer content is delimited from the dish cooking database. According to the focus word in the question, there are four possible extraction forms: the raw material block, the main ingredient sub-block, the ingredient sub-block, and the condiment sub-block (see Figure 1).
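The mapping from focus word to extraction form can be written down directly; the focus-word vocabulary below is an assumption, since the paper does not list it.

```python
# Four extraction forms for type-1 questions, keyed by (assumed) focus words.
FOCUS_TO_BLOCK = {
    "原料": ("raw_material", None),               # whole raw material block
    "主料": ("raw_material", "main_ingredient"),  # main ingredient sub-block
    "辅料": ("raw_material", "ingredient"),       # ingredient sub-block
    "调料": ("raw_material", "condiment"),        # condiment sub-block
}

def extract_type1(doc: dict, focus_word: str):
    """Pick the answer block from an annotated dish document."""
    block, sub = FOCUS_TO_BLOCK[focus_word]
    value = doc[block]
    return value if sub is None else value[sub]

# e.g. extract_type1(ANNOTATED_DISH, "主料") -> ["粉丝"]
# (using the hypothetical record sketched in Section 3.1)
```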

4.1.2 Answer content for the second type

The second type asks for the cooking recipe, for example: "请问蚂蚁上树怎么做?(How to cook ants climbing a tree?)". The answer should describe how to cook the dish. There are two answer components for this kind of question: the raw material block and the directions block.

4.1.3 Answer content for the third type

The third type asks for a dish name, for example: "以豆腐为主料可以做什么菜?(What dishes can be made with bean curd as the main ingredient?)" or "请推荐几款汤菜。(Please recommend several kinds of soup.)". The answer should be dish recommendations according to the raw material or the cooking form, and the answer component in the database is the dish name block. It is noteworthy that there may be multiple possibilities in our database. An example is shown in Figure 4: there are six dish names for the question "以豆腐为主料可以做什么菜?(What kind of dishes can be made using bean curd?)".

Figure 4. Answer set for "What kind of dishes can be made using bean curd?"

In a real application scenario, we do not need to list all of them. In this work, we pre-process the candidate set. Assume that there are N candidate answers coming from N different documents, and let i be the number of recommended answer candidates:

If N ≥ 3 then i = 3; otherwise i = N.    (1)

Thus we recommend at most three candidates to the user. Therefore, according to the answer set shown in Figure 4, we give the answer: "以豆腐为主料可以做麻婆豆腐、家常豆腐或熊掌豆腐。(You can cook mapo bean curd, home-style bean curd, or bear's paw bean curd using bean curd.)".
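A short sketch of formula (1) and the resulting recommendation sentence follows; the "、…或…" joining pattern is inferred from the example answer above.

```python
# Candidate pre-processing (formula (1)) plus type-3 answer assembly.
def recommend(subject: str, candidates: list) -> str:
    """Assumes at least one candidate; keeps at most three (i = min(N, 3))."""
    picked = candidates[:3]                    # i = 3 if N >= 3 else i = N
    if len(picked) == 1:
        joined = picked[0]
    else:
        joined = "、".join(picked[:-1]) + "或" + picked[-1]
    return f"以{subject}为主料可以做{joined}。"

dishes = ["麻婆豆腐", "家常豆腐", "熊掌豆腐", "锅塌豆腐", "砂锅豆腐", "三鲜豆腐"]
print(recommend("豆腐", dishes))
# -> 以豆腐为主料可以做麻婆豆腐、家常豆腐或熊掌豆腐。
```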

4.1.4 Answer content for the fourth type

Type 4 asks about basic cooking techniques, for example: "炒菜时怎样勾芡?(How to thicken a stir-fried dish with starch?)". The answer component of this type is an explanation paragraph, which can be obtained from the second knowledge source (see Figure 2) by subject indexing.

To sum up, the answer content planning for each kind of question is shown in Table 3. In this section, we proposed a method for content determination based on question classification. In the following section, we use the obtained content information to generate an answer in natural language.

Table 3. Answer content planning corresponding to question type
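Since Table 3 is not reproduced in this text, the dispatch table below restates the content planning of Sections 4.1.1-4.1.4 in code form; the key names are invented.

```python
# Content planning per question type (reconstruction of Table 3, sketch).
ANSWER_CONTENT_PLAN = {
    "type1_raw_material": {"source": "dish_db",
                           "components": ["answer block (by focus word)"]},
    "type2_recipe":       {"source": "dish_db",
                           "components": ["raw_material", "directions"]},
    "type3_dish_name":    {"source": "dish_db",
                           "components": ["dish_name"],
                           "max_candidates": 3},      # formula (1)
    "type4_technique":    {"source": "technique_db",
                           "components": ["explanation_paragraph"]},
}
```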

4.2 Answer Sentence Planning

Sentence planning makes explicit the lexical elements and relations that have to be realized in the output answer, namely, which words and syntactic constructions will be used to describe the content in well-formed, human-like sentences. Our goal is to define generic syntactic templates for answer sentences according to the four question categories.

To use generic templates to represent answers, we denote a user's question as q; then we can use dish name(q), cooking form(q), answer block(q), and so on to express the attribute blocks and answer components of the question. For example:

In this paper, to express the generic templates succinctly, we use the following abbreviations:

Type 1: The answer sentence for question type 1 includes three parts: the question subject, the focus word, and the answer component. The characteristic of question type 1 is that the answer granularity may differ. According to focus word(q), the answer component(q) may be raw material(q), main ingredient(q), ingredient(q), or condiment(q). We refer to these under the unified name answer block(q) (Ans Blk(q)). The generic answer template of type 1 is:
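The template figure itself is not reproduced in this text, so the surface pattern below ("<subject>的<focus>是<Ans Blk>。") is a plausible guess rather than the paper's actual template.

```python
# Type-1 sentence realization under an assumed template.
def realize_type1(subject: str, focus: str, ans_blk: list) -> str:
    """Assumed pattern: <subject>的<focus>是<Ans Blk>。"""
    return f"{subject}的{focus}是{'、'.join(ans_blk)}。"

print(realize_type1("蚂蚁上树", "主料", ["粉丝"]))
# -> 蚂蚁上树的主料是粉丝。
```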

Type 2: The answer sentences of types 2, 3, and 4 all consist of two parts, namely, the question subject and the answer component.

The answer component of type 2 is the combination of two blocks: the raw material block and the directions block. The generic answer template of type 2 is:

Type 3: As discussed in Section 4.1.3, the characteristic of question type 3 is that there are usually multiple corresponding answers in our database, and we adopt at most three candidates. Moreover, the attribute of the question subject has two possibilities. The generation of this kind of answer can be described with a sentence plan tree, as shown in Figure 5.

Figure 5. Sentence plan tree of type 3

According to the different question subjects, two kinds of answer templates are included in this type.

Type 4: The answer of this type is a paragraph description, and the attribute of the question subject also has two possibilities, so there are two kinds of answer templates for type 4.

No answer: When an answer component can be extracted from the knowledge database (regardless of whether it is correct), we can generate a definite answer. However, in some cases there is no answer, and we should still give the user a reply. The reply template is:
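A sketch tying the four answer templates and the no-answer reply together follows; since none of the template figures are reproduced in this text, every surface pattern below, including the no-answer wording, is an assumption.

```python
# Sentence planner dispatching on question type (all templates assumed).
def plan_sentence(qtype: str, subject: str, components: dict) -> str:
    if not components:                       # no-answer reply
        return f"对不起，没有找到关于{subject}的答案。"
    if qtype == "type1_raw_material":
        return f"{subject}的{components['focus']}是{'、'.join(components['blk'])}。"
    if qtype == "type2_recipe":              # raw material + directions blocks
        return (f"{subject}的原料：{components['raw_material']}；"
                f"做法：{components['directions']}")
    if qtype == "type3_dish_name":           # at most three candidates
        return f"以{subject}为主料可以做{'、'.join(components['dishes'])}。"
    if qtype == "type4_technique":           # explanation paragraph as-is
        return components["paragraph"]
    raise ValueError(f"unknown question type: {qtype}")
```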

5. Evaluation

We evaluate our approach by applying our answer generation method to 53 questions that have already been classified, and we distinguish three cases: correct answer, incorrect answer, and no answer. The performance of the AG module is evaluated by means of coverage and accuracy. Here, coverage is defined as the ratio of the number of answered questions to the total number of questions; accuracy is the ratio of the number of correct answers to the total number of answers [11].
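In code form, the two metrics just defined are simply:

```python
# Evaluation metrics as defined above.
def coverage(num_answered: int, num_questions: int) -> float:
    """Answered questions / total questions."""
    return num_answered / num_questions

def accuracy(num_correct: int, num_answered: int) -> float:
    """Correct answers / returned answers."""
    return num_correct / num_answered
```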

The results of the evaluation are shown in Table 4, where QT is the question type; #Q is the number of questions; #CA is the number of correct answers; #ICA is the number of incorrect answers; #NA is the number of no-answer cases; Cov signifies coverage; and Acc signifies accuracy.

Table 4. Evaluation of the answer generation task

Our answer generation is based on question classification. When a question has been wrongly classified, there are two possible situations: 1) we cannot extract the answer component according to the content planning, or 2) we extract a wrong answer component. The system can evaluate the coverage automatically; the accuracy, however, relies on human judgment. As seen in Table 4, we achieve high accuracy, but the coverage is somewhat low. From the application's point of view, it is better to return no answer than a wrong one.

6. Conclusion and Future Work

In this work, we carried out answer generation for a cooking QA system. Our answer generation makes use of a well-structured knowledge database and is based on question classification. First, we extracted the question subject according to the question category. Then, corresponding to the question subject, we extracted the answer component(s) from the structured cooking database. Finally, we generated a succinct and explicit answer in natural language to return to the user. The evaluation shows that NLG techniques can be used successfully in question answering in the cooking domain.

Future work will consider how to generate answers in situations where the needed knowledge is absent from the domain database or where there are wrongly written characters in the user's input.

Acknowledgements

This research has been partially supported by Challenging Exploratory Research Grant 21650030. Many thanks to all our colleagues participating in this project. We also thank Dr. Suzuki and Dr. Matsumoto for useful discussions.

References

[1] Y. Yamakata, K. Kakusho, "A Method of Recipe to Cooking Video Mapping for Automated Cooking Content Construction," IEICE Transactions on Information and Systems (Japanese Edition), Vol. J90-D, pp. 2817-2829, 2007. (in Japanese)

[2] T. Kokubo, S. Oyama, "Keyword Spice Method for Building Domain-Specific Web Search Engines," IEEJ Transactions on Electronics, Information and Systems, Vol. 43, No. 6, 2002. (in Japanese)

[3] Haiqing Hu, Peilin Jiang, Fuji Ren, Shingo Kuroiwa, "A New Question Answering System for Chinese Restricted Domain," IEICE Transactions on Information and Systems, Vol. E89-D, No. 6, pp. 1848-1859, 2006.

[4] Ye Yang, Peilin Jiang, Seiji Tsuchiya, Fuji Ren, "Effect of Using Pragmatics Information on Question Answering System of Analects of Confucius," International Journal of Innovative Computing, Information and Control (IJICIC), Vol. 5, No. 5, 2009.

[5] V. Moriceau, "Generating Intelligent Numerical Answers in a Question-Answering System," Proceedings of the Fourth International Natural Language Generation Conference, Sydney, Australia, pp. 103-110, July 2006.

[6] Farah Benamara, "Generating Intensional Answers in Intelligent Question Answering Systems," Lecture Notes in Computer Science, Vol. 3123, Springer, Berlin, Germany, pp. 11-20, 2004.

[7] Anja Belz, Eric Kow, "System Building Cost vs. Output Quality in Data-to-Text Generation," Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009), Athens, Greece, pp. 16-24, March 2009.

[8] Ross Turner, Yaji Sripada, Ehud Reiter, "Generating Approximate Geographic Descriptions," Proceedings of ENLG 2009, Athens, Greece, pp. 42-49, March 2009.

[9] Ling Xia, Zhi Teng, Fuji Ren, "Question Classification for Chinese Cuisine Question Answering System," IEEJ Transactions on Electrical and Electronic Engineering, Vol. 4, No. 6, 2009. (in press)

[10] Jimmy Lin, Aaron Fernandes, Boris Katz, Gregory Marton, Stefanie Tellex, "Extracting Answers from the Web Using Knowledge Annotation and Knowledge Mining Techniques," Proceedings of the Eleventh Text REtrieval Conference (TREC 2002), Gaithersburg, Maryland, November 2002.

[11] Atsushi Fujii, Tetsuya Ishikawa, "Question Answering Using Encyclopedic Knowledge Generated from the Web," Proceedings of the Workshop on Arabic Language Processing: Status and Prospects, Toulouse, France, pp. 1-8, July 2001.

[12] F. van Harmelen, D. Fensel, "Practical Knowledge Representation for the Web," Proceedings of the IJCAI'99 Workshop on Intelligent Information Integration, 1999.

[13] A. Stent, M. Marge, M. Singhai, "Evaluating Evaluation Methods for Generation in the Presence of Variation," Proceedings of CICLing 2005, Mexico City, Mexico, pp. 341-351, February 2005.

[14] Haiqing Hu, Fuji Ren, Shingo Kuroiwa, "Chinese Automatic Question Answering System of Specific-Domain Based on Vector Space Model," IEEJ Transactions on Electronics, Information and Systems, Vol. 125, No. 5, pp. 698-706, 2005.