
Applied Mathematical Sciences, Vol. 9, 2015, no. 130, 6491 - 6505

HIKARI Ltd, www.m-hikari.com

http://dx.doi.org/10.12988/ams.2015.54361

Semantics Question Analysis Model for

Question Answering System

Syarilla I. Ahmad Saany1*, Ali Mamat2, Aida Mustapha2,

Lilly S. Affendey2 and M. Nordin A. Rahman1

1 Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin

(UniSZA), Tembila Campus, Besut 22200, Terengganu, Malaysia

Corresponding author*

2Faculty of Computer Science and Information Technology, Universiti Putra

Malaysia (UPM), 43400 UPM Serdang, Selangor Malaysia

Copyright © 2015 Syarilla I. Ahmad Saany et al. This article is distributed under the Creative

Commons Attribution License, which permits unrestricted use, distribution, and reproduction in

any medium, provided the original work is properly cited.

Abstract

A Question Answering (QA) system takes Natural Language (NL) questions and outputs the exact answer extracted from a Knowledge Base (KB). Handling an NL question requires semantic analysis and interpretation to obtain potentially relevant answers from the KB. Providing users with the most relevant answers to their questions remains an issue: many of the answers returned are not relevant to the questions. The purpose of this paper is to present a semantic question analysis model for a question answering system that correctly interprets the user's question in order to yield correct answers. The Semantic Question Analysis Model exploits the approaches of User Modelling and Relevance Feedback. To measure the ability of the Semantic Question Analysis Model to analyse and translate user NL questions, the model is implemented in an ontology-based question answering system (known as QAUF). The results are compared with those of the existing QA systems AquaLog and FREyA to determine the success of the proposed model. The Semantic Question Analysis Model in QAUF shows a relative increase in F-measure. The findings of this study demonstrate that QAUF has a good precision percentage in returning relevant answers for each NL question.

Keywords: Semantic Question Analysis Model, Question Answering System, Ontology


1. Introduction

Question answering (QA) resides within the Information Retrieval (IR) and Natural Language Processing (NLP) research areas. Information retrieval (IR) was coined in [3] as the process or method of pulling out from storage a list of document citations containing the needed information. At present, many changes and improvements have been made to IR, so that the retrieval list may consist of images, multimedia, web pages and exact answers rather than a list of document citations. Examples of IR systems are the Web's most popular search engines, such as Google, Bing or Yahoo. Any IR system aims to seek texts, images, videos or any other type of medium that matches the user's query [14]. The user's query, also known as the user's specified need, is conveyed as keywords. However, this specification of the user's need may suffer a loss of semantic information. An IR system only yields the most related (matched) links in a list of documents. Unlike a QA system, this method cannot retrieve the exact answer to the user's query; the user has to search, locate and extract the necessary and adequate information from the returned documents.

QA is IR that involves NLP mechanisms. NLP was first formed with the aim of studying problems in the automatic generation and understanding of natural language in artificial intelligence and linguistics [2]. NLP enables an easy and friendly interface known as a natural language interface (NLI). An NLI eliminates the need to understand a particular query language or query syntax. Research in NLI, such as in [1] [8] [20] [24], is strongly fostered by the availability of many NLP tools for text processing.

A QA system takes NL questions and outputs either small text fragments containing the answer or the exact answer extracted from texts/documents or a Knowledge Base (KB) [17]. It captures the piece of information from the repository (KB or corpus) that actually matches the user's information need. The goal is to return an accurate answer for any user's NL question. Essentially, a QA system needs the ability to linguistically analyse, interpret and process an NL question before an accurate answer can be returned. It is not an easy task to automatically capture the semantics that lie in a complex question structure. In the context of this research, a question is considered simple if the answer is a piece of information that can be located and retrieved directly as it appears in the information source. On the other hand, a question is considered complex if its answer needs more elaboration [7]. The semantic information contained in the user's Natural Language (NL) question may be missed or lost during the question analysis process.

Handling an NL question requires semantic analysis, interpretation and transformation into an executable query so that potential answers can be obtained from the corpus or knowledge base. The purpose of this paper is to present a semantic question analysis model for a question answering system.


The proposed Question Analysis Model (QAM) employs the approaches of User Modelling (UM) and Relevance Feedback (RF). The remainder of this paper is organized as follows: Section 2 reviews related work on QA systems. The proposed model is presented in Section 3, the experimental results in Section 4, and Section 5 sums up the paper with conclusions and future work.

2. Related Works

Generally, a QA system consists of three main modules: question analysis, document/information processing and answer processing. The user's NL question is analysed and processed; the QA system has to understand exactly what the question is in order to extract the answer from the text corpus or knowledge base (KB). The keyword (focus) and the answer type of the question are taken from the analysed NL question. The keyword (focus) of the question is used as input to the other modules of the QA system, and the answer type is a specification of the type of entity that would constitute a potential answer to the question. Figure 1 illustrates the general components of a QA system.

Figure 1: General Components for QA system

A question-answering (QA) system takes in user questions in natural language (NL). The NL question is handled through three stages: question analysis, document processing, and answer processing. Other supporting components of a QA system include the knowledge base (KB), an ontology and WordNet. The document processing component obtains potential answers and passes them to the answer processing component.
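To make the data flow concrete, the following minimal Python sketch wires the three stages together. The class and function names (AnalysedQuestion, question_analysis, document_processing, answer_processing) are illustrative only and not taken from any of the systems cited here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AnalysedQuestion:
    focus: str            # keyword (focus) of the question
    answer_type: str      # expected type of the answer entity
    terms: List[str]      # remaining content terms

def question_analysis(nl_question: str) -> AnalysedQuestion:
    # Placeholder: a real system would tokenize, POS-tag and classify here.
    tokens = [t.strip("?").lower() for t in nl_question.split()]
    return AnalysedQuestion(focus=tokens[-1], answer_type="entity", terms=tokens)

def document_processing(q: AnalysedQuestion, kb: List[str]) -> List[str]:
    # Placeholder: retrieve candidate facts/passages that mention the focus term.
    return [fact for fact in kb if q.focus in fact.lower()]

def answer_processing(q: AnalysedQuestion, candidates: List[str]) -> str:
    # Placeholder: pick the best candidate (here, simply the first one).
    return candidates[0] if candidates else "no answer found"

if __name__ == "__main__":
    kb = ["The highest point in Oregon is Mount Hood."]
    q = question_analysis("What is the highest point in Oregon?")
    print(answer_processing(q, document_processing(q, kb)))
```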



Traditionally, QA systems run on text document collections to find answers that match the user's query. Recent QA systems, which are ontology-based, use Knowledge Bases (KBs) as their sources of answers [6] [12] [21]. This can be seen in the works of [5] [13] [14] [16] [21] and many more. Early on, many QA systems used an ontology as a mechanism to support query expansion [21]. In recent studies of QA systems, the KB is exploited not only for answer searching but also for mapping and transforming the user's NL question into a query representation.

In general, an ontology provides a shared understanding of the domain knowledge, the common vocabulary used and specific definitions of the relations between these vocabulary items. Knowledge of the domain is described using concepts [10]. Recently, improving the semantic capability of QA has become one of the research focuses. Much research has shown that an ontology promotes the semantic capability of a QA system, which may eliminate the need to build heuristics for named entity recognition and question classification [5] [14] [21].

AquaLog is the first ontology-based QA system that relies on the knowledge encoded in the underlying ontology and its explicit semantics for question analysis and answer retrieval [21]. AquaLog has two components: the Linguistic Component (LC) and the Relation Similarity Service (RSS) component. The LC transforms the NL question into an NL query-triple format. The RSS component maps the query triples onto ontology elements, generating ontology-compliant queries known as Onto-Triples. AquaLog obtained only 48.68% correct answers out of 76 questions. This precision score has opened up many research issues. One of the drawbacks is that AquaLog fails to analyse and process user questions containing modifier terms, which contributes to the low precision score obtained.

The stochastic syntax-parse model named Lexical Semantic Frame (LSF) is adopted in [14]. The basis of this model is analysing semantic probability through the semantic relation function P(LSF | wi … wj), where wi is among the words of a character string s. Both the words in the corpus and hyponym part-of-speech (POS) information are used in estimating the semantic probability P(LSF | wi … wj). The authors apply a pattern-matching technique between the user's question and the ontology. They also introduce a novel question classification hierarchy based on answer type and question pattern. When no answer is found, the system expands the ontology in an attempt to provide one. Nevertheless, the consistency and plausibility of knowledge self-produced from ontology semantic relations is one of the drawbacks.

FREyA is a Feedback Refinement and Extended Vocabulary Aggregation QA system which combines syntactic parsing with knowledge in the ontology to reduce the customization effort [6]. To understand the user's question, this system relies on the knowledge encoded in the ontology. The user's NL question is translated into a set of Ontology Concepts (OCs). The user question is mapped to the Ontology Concepts automatically or semi-automatically. However, different users may interact differently with the suggestions given by FREyA for ambiguous question phrases or terms, so FREyA's method requires heavy supervision to be utilized. FREyA always returns an answer to the user's question even when the answers returned are partially correct or incorrect. FREyA obtains 92.4% for both recall and precision [5].

The intention of this research is to semantically analyse, interpret and transform user questions into executable queries for a better-performing QA system. Here, a better-performing QA system means one that is able to return a correct answer based on the user's intended question. The first essential task is understanding the user's needs and information-seeking behaviour. The second essential task is analysing and processing the user's needs as expressed in a question (request). Lastly, the third task is providing a strategy for matching the user's question to data or information in the document collections or knowledge base.

The investigations of this paper are driven by the following research questions:

i. How shall users' needs and information-seeking behaviour be understood from the user's NL question?

ii. How shall users' needs that are expressed in a question be analysed, interpreted and processed?

iii. How shall the user's question be matched against the answers in the knowledge base?

3. The Semantic Question Analysis Model

The general context of a question-answering (QA) system formalizes U as the user's natural language (NL) query. U carries the specific requirement of retrieving a non-empty set of answers from a knowledge base; here, a knowledge base comprises a set of information items labelled I = {i1, i2, …, in}, and a set Si ⊆ I consists of the information that is totally (or partially) relevant to the user's requirement [18] [19]. To yield Si, the QA system depends on the user requirement and all information in the knowledge base, which is denoted in the form of the function f: (U, I) → Si.

In this paper, the Semantic Question Analysis Model is implemented in an ontology-based QA system called QAUF (Question Answering system with User Modeling and Relevance Feedback), which models the user requirement U as a combination of three elements: the NL query, user modeling, and relevance feedback. This is denoted as U = (Unl-query, Uuser-modeling, Urelevance-feedback). Within this context, [18] [19] define each element as:

Unl-query is the goal/answer to be retrieved;

Uuser-modeling is a set of user interests, defined as Uuser-modeling = {knowledge base concept, question context, language theory};


Urelevance-feedback is a set of interactions with the system, defined as Urelevance-feedback = {modification of term weight, query expansion, query simplification}.

A complete function of the proposed QA system model is defined in [18] [19] as

follows:

f: (Unl-query, Uuser-modeling, Urelevance-feedback, I) → Si    (1)
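As an illustration of this formalization, the following Python sketch models U and the retrieval function f of Eq. (1) as simple data structures; the matching logic is only a placeholder and all names are ours, not part of QAUF.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class UserRequirement:
    """U = (U_nl-query, U_user-modeling, U_relevance-feedback)."""
    nl_query: str                                                # the goal/answer to be retrieved
    user_modeling: Set[str] = field(default_factory=set)         # e.g. KB concepts, question context
    relevance_feedback: Set[str] = field(default_factory=set)    # e.g. re-weighted or inserted terms

def f(u: UserRequirement, information: List[str]) -> List[str]:
    """f: (U_nl-query, U_user-modeling, U_relevance-feedback, I) -> S_i as in Eq. (1).
    Placeholder matching: keep KB items mentioning any query, interest or feedback term."""
    terms = set(u.nl_query.lower().split())
    terms |= {t.lower() for t in u.user_modeling} | {t.lower() for t in u.relevance_feedback}
    return [item for item in information if any(t in item.lower() for t in terms)]

if __name__ == "__main__":
    u = UserRequirement("highest point in Oregon", user_modeling={"mountain"})
    kb = ["Mount Hood is the highest mountain in Oregon.", "Texas borders Mexico."]
    print(f(u, kb))   # the S_i subset relevant to the requirement
```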

In the following subsection, a detailed explanation of the question analysis process is presented.

3.1 The Semantic Question Analysis Model and Process Flow

Generally, the combination of User Modelling (UM) and Relevance Feedback (RF) in the question analysis model allows additional information to be obtained from the user and the user's NL question [18] [19]. Additional information is needed to enrich the information description of the question and to avoid any ambiguities in the user's question. As a result, this contributes to the correct and accurate answer yielded. Figure 2 depicts the Semantic Question Analysis Model.

Figure 2: The Semantic Question Analysis Model [18][19].

The model uses user modelling to acquire user interest and profile for categorizing the answers. Relevance feedback is utilized in the second phase of the question analysis process in order to refine the user's question specification.

The user's NL question consists of words, which are also referred to as terms in this research. The terms of the user's question are extracted in order to proceed with the subsequent processes of finding the correct answer. Based on Figure 2, the terms focus, head focus, modifier term, modified term, focus complement and the user profile become input to the Question Analysis Model. This model considers four (4) lexical elements: focus, head focus, modifier term and modified term. Examples and detailed explanations of the lexical element terms are given in [18] [19].

The triple of the user's NL question, in the form of <subject, verb, object>, is also extracted. A number of prerequisite components are also needed, such as the knowledge base, ontology, WordNet and Modifier Lookup [18] [19].

3.2 Identifying and Extracting Terms

In this model, a user's NL question is received and analysed. The submitted NL question is denoted using the vector space model as follows:

Q = {(T1, W1), (T2, W2), …, (Tn, Wn)}    (2)

where T represents the terms that exist in the question, which include {head_focus, modifier_head_focus, focus_complement}, and W is the weight of T. The weight of each term is calculated using the formula tf × idf, where tf is the term frequency in the knowledge base and idf is the inverse document frequency over the whole knowledge base collection.
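A minimal sketch of the term weighting in Eq. (2), under the assumption that the knowledge base is available as a collection of text entries; the helper name tf_idf_weights is illustrative:

```python
import math

def tf_idf_weights(question_terms, kb_documents):
    """Weight each extracted question term with tf x idf over the KB collection (Eq. 2)."""
    n_docs = len(kb_documents)
    tokenized = [doc.lower().split() for doc in kb_documents]
    weights = {}
    for term in question_terms:
        t = term.lower()
        tf = sum(doc.count(t) for doc in tokenized)       # term frequency in the knowledge base
        df = sum(1 for doc in tokenized if t in doc)      # number of entries containing the term
        idf = math.log(n_docs / df) if df else 0.0        # inverse document frequency
        weights[term] = tf * idf
    return weights                                        # Q = {(T1, W1), ..., (Tn, Wn)}

kb = ["mount hood is the highest point in oregon",
      "the capital of oregon is salem",
      "texas borders mexico"]
print(tf_idf_weights(["highest", "point", "oregon"], kb))
```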

First, the NL question is tokenized and parsed for its Part-Of-Speech (POS) using a tokenizer and a POS tagger, respectively. An NL question such as "Could you tell me what is the highest point in the state of Oregon?" is tokenized and parsed. In this research, the NL questions are drawn from the Raymond Mooney gold standard dataset, and the questions selected for the experiment must contain a modifier term. Language rules are exploited to identify and extract the substantial terms of the NL question. Later, an initial user profile is auto-generated using a language model.
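For instance, using the NLTK toolkit (one possible choice; the paper does not prescribe a particular tokenizer or tagger), the example question can be tokenized and POS-tagged, and superlative adjectives such as "highest" surface as modifier-term candidates:

```python
import nltk

# Fetch tokenizer and tagger models (resource names vary across NLTK versions,
# so both old and new names are requested; missing ones are simply skipped).
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

question = "Could you tell me what is the highest point in the state of Oregon?"
tokens = nltk.word_tokenize(question)
pos_tags = nltk.pos_tag(tokens)
print(pos_tags)

# Superlative adjectives (tag JJS), e.g. "highest", are candidates for modifier terms.
modifier_candidates = [tok for tok, tag in pos_tags if tag == "JJS"]
print(modifier_candidates)
```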

3.3 Applying User Modelling (UM) in User’s Question

UM is applied during question analysis in order to enrich the user's question with additional terms. This ensures that all terms that carry the semantic meaning of the original user's question are taken into account. In constructing the user profile, a set of knowledge sources needs to be identified. Three (3) methods are used in constructing user profile attributes: default assumptions, assumptions, and dialog contributions from the system [23]. Based on these methods, the model derives user interest using the assumption method from three (3) aspects: the knowledge base concept, the question context and language theory.

As mentioned previously, an initial user profile is auto-generated from the question context using a language model. The user's NL question and the initial user profile are used to produce a new query. In the UM process, the modifier terms are searched for semantic similarity using WordNet, the knowledge base, the ontology and the Modifier Lookup. These terms are added to the initial profile, which is then used to generate a new query. The new query is represented in VSM as follows:

as follows:

Q’ = {(C1, W1), (C2, W2), …(Cn, Wn)} (3)

where C denotes the KB class-concept instances of the query terms, and W is the weight of C. Figure 3 summarizes the steps of the UM phase.

Algorithm : User Modelling phase
Input : user profile, knowledge base, assumptions from question context and language theory, question (Q)
Output : a query in vector space form (Q')

1. Start
2. Q' ← ∅
3. If (user profile is available) then
   i. Construct the possible Q' related to Q: Q' ← Q' + (ci, wi)
4. Initialize knowledge base for ontology analysis
   i. Generate Q': Q' ← Q' + (ci, wi)
5. Apply the assumption method based on query context and language theory
   i. Generate Q': Q' ← Q' + (ci, wi)
6. End

Figure 3: Algorithm of Applying UM on User’s Question
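A Python sketch of the UM phase of Figure 3, under the simplifying assumption that the user profile, knowledge base concepts and context assumptions are plain term-to-weight or term-to-concept mappings; all names are illustrative:

```python
def user_modelling_phase(question_vector, user_profile, kb_concepts, context_assumptions):
    """Build the enriched query Q' = {(C1, W1), ..., (Cn, Wn)} from Q (Figure 3)."""
    q_prime = {}

    # Step 3: if a user profile is available, add its concept/weight pairs.
    for concept, weight in user_profile.items():
        q_prime[concept] = q_prime.get(concept, 0.0) + weight

    # Step 4: map question terms to knowledge base concepts for ontology analysis.
    for term, weight in question_vector.items():
        concept = kb_concepts.get(term, term)          # fall back to the term itself
        q_prime[concept] = q_prime.get(concept, 0.0) + weight

    # Step 5: apply assumptions from question context and language theory.
    for concept, weight in context_assumptions.items():
        q_prime[concept] = q_prime.get(concept, 0.0) + weight

    return q_prime

q = {"city": 1.0, "state": 1.0, "highest": 0.8, "point": 0.8}
profile = {"geography": 0.5}
kb = {"city": "City", "state": "State", "point": "HighestPoint"}
context = {"HighestPoint": 0.3}        # e.g. "highest point" resolves to one KB concept
print(user_modelling_phase(q, profile, kb, context))
```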

As an example, the user's NL question "What are the cities of the state with the highest point?" gives Unl-question = {city, state, highest, point}. The items of Unl-question are the terms of the lexical elements. The user question also has the triple format <city, are, state>. This triple is mapped to concepts and relationships from the knowledge base. As for the subject, city is mapped to a corresponding class. The verb are is mapped to a corresponding property in the knowledge base which has a relationship with the class city.
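The triple mapping can be sketched as follows, using a toy ontology index rather than the actual Geobase schema; the identifiers are hypothetical:

```python
# Toy ontology index: classes and the properties that relate them (illustrative only,
# not the actual Geobase schema).
ontology = {
    "classes": {"city": "geo:City", "state": "geo:State"},
    "properties": {("geo:City", "geo:State"): "geo:isCityOf"},
}

def map_triple(subject, verb, obj):
    """Map a question triple <subject, verb, object> onto ontology classes and a property."""
    subj_class = ontology["classes"].get(subject)
    obj_class = ontology["classes"].get(obj)
    # The verb ("are") is resolved to whichever property links the two classes.
    prop = ontology["properties"].get((subj_class, obj_class))
    return subj_class, prop, obj_class

print(map_triple("city", "are", "state"))   # ('geo:City', 'geo:isCityOf', 'geo:State')
```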

3.4 Applying Relevance Feedback (RF) in User’s Query

In the RF process (Phase 2), the model considers three (3) aspects of modifying the query representation generated in Phase 1: modification of term weights, query expansion and query simplification. In Phase 2, if necessary, the user is allowed to modify the weights associated with the question terms; this contributes to the formation of the new query vector. To avoid ambiguities in the user's question, additional information about the entered question is also acquired by inserting new terms. New terms are suggested by consulting WordNet, the knowledge base, the ontology and the Modifier Lookup. At the end of Phase 2, a new set of solutions (new query vectors) is produced, represented in VSM as:

Q″ = {q1″, q2″, q3″, …, qn″}    (4)

Thus, the set of solutions can be represented as follows:

qi″ = {{(T1,i, ∆W1,i), (T2,i, ∆W2,i), …, (Tn,i, ∆Wn,i)}, {(Tn+1,i, Wn+1,i), (Tn+2,i, Wn+2,i), …, (Tk,i, Wk,i)}}    (5)

where ∆W represents the change in weight value, Tn+1,i, Tn+2,i, …, Tk,i are the newly inserted terms associated with their weights Wn+1,i, Wn+2,i, …, Wk,i respectively, i = 1, 2, 3, …, n and k > n [19].

To determine the best solution to be used, the initial NL question Q is compared

with all solutions 𝑄′′ and the similarity scores are computed. The similarity score

formula used is as follows:

Sim(Q, Q″) = Σm=1..k Wm / (1 + ∆Wm,i)    (6)

The best possible qi″ (that is, the one nearest to Q) is selected from all possible Q″; the qi″ with the highest similarity score is recognized as the nearest to Q. If Qnew is the new query vector that is submitted for retrieving the answer, then Qnew can be denoted as follows:

Qnew = (rule, highest(qi″))    (7)

where rule is the rule obtained from the Modifier Lookup during the stage of applying the UM technique, to be applied to the query containing the modifier and modified terms, and highest(qi″) is the nearest similar query vector generated in the stage of applying the RF technique. Figure 4 summarizes the steps of the RF phase.


Algorithm : Relevance Feedback phase
Input : knowledge base, WordNet, ontology, Modifier Lookup, question (Q)
Output : a set of modified queries in vector space form (Q″)

1. Start
2. Flag1 ← 1
3. While (Flag1)    // generating the i possible Q″
   i. i ← 1
   ii. Q″ ← Q
   iii. Map with knowledge base for term weight modification
        a. Q″ ← ∆Q
   iv. Flag2 ← 1
   v. While (Flag2)    // inserting the k possible new terms
        a. k ← 1
        b. Map with WordNet, ontology and Modifier Lookup for new term insertion: qi″ ← qi″ + (Tk, Wk)
        c. k ← k + 1
        d. If (no more possible new terms can be inserted) then Flag2 ← 0
   vi. i ← i + 1
   vii. If (no more possible Q″ can be generated) then
        a. Flag1 ← 0
4. End

Figure 4: Algorithm of Applying RF Technique
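A Python sketch of the RF phase of Figure 4 combined with the selection step of Eqs. (6)-(7); the candidate generation is simplified and the weight-based similarity below is a stand-in for Eq. (6), not the exact formula from [19]:

```python
from itertools import product

def rf_phase(q_prime, weight_deltas, candidate_new_terms):
    """Generate candidate modified queries Q'' (Figure 4): each candidate combines one
    re-weighting of the existing terms with one set of newly inserted terms."""
    candidates = []
    for delta, new_terms in product(weight_deltas, candidate_new_terms):
        q = {t: w + delta.get(t, 0.0) for t, w in q_prime.items()}   # term weight modification
        q.update(new_terms)                                          # query expansion with new terms
        candidates.append(q)
    return candidates

def similarity(q, q_candidate):
    """Weight-based similarity used as a stand-in for Eq. (6): candidates whose
    weights stay close to the original query score higher."""
    return sum(w / (1.0 + abs(q_candidate.get(t, 0.0) - w)) for t, w in q.items())

q_prime = {"City": 1.0, "State": 1.0, "HighestPoint": 0.8}
deltas = [{}, {"HighestPoint": 0.4}]
new_terms = [{}, {"Elevation": 0.5}]
candidates = rf_phase(q_prime, deltas, new_terms)
best = max(candidates, key=lambda c: similarity(q_prime, c))   # highest(qi'') in Eq. (7)
print(best)
```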

4. Experimental Result

This research uses the Raymond Mooney dataset on the Geobase ontology of United States geographical information, which contains a total of 880 annotated user questions [11]. A review of the relevant literature shows that the Raymond Mooney dataset is extensively used for the evaluation of ontology-based QA systems and other natural language interface systems [4] [5] [16] [22]. Out of the 880 questions, the 607 questions containing modifier terms are used in this study.

To measure the ability of the Semantic Question Analysis Model to analyse and translate user NL questions, the model is implemented in an ontology-based question answering system (known as QAUF). The results are compared with those of the existing QA systems AquaLog [21] and FREyA [6] to determine the effectiveness of the proposed model, that is, the ability of the Semantic Question Analysis Model to analyse and interpret the user's NL question into an executable query that yields accurate returned answers. The evaluation metrics are similar to those used in [6] [21], namely the quantitative retrieval performance measures recall and precision [15].
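For reference, precision, recall and F-measure can be computed as follows; this is the standard formulation rather than code from any of the evaluated systems, and the example numbers are hypothetical:

```python
def precision_recall_f1(answered_correctly, answered_total, questions_total):
    """QA retrieval metrics: precision = correct / answered, recall = correct / total questions,
    F-measure = harmonic mean of precision and recall."""
    precision = answered_correctly / answered_total if answered_total else 0.0
    recall = answered_correctly / questions_total if questions_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: a hypothetical run over the 607 modifier-term questions.
print(precision_recall_f1(answered_correctly=575, answered_total=607, questions_total=607))
```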

The experimental settings for this research are as follows:

i. Experiment A: Experiment A tests the performance of AquaLog [21] on the gold standard dataset.

ii. Experiment B: Experiment B tests the performance of FREyA [6] on the gold standard dataset.

iii. Experiment C: Experiment C tests the effectiveness of the user model and user feedback in the question analysis process. The objective of this experiment is to quantify the accuracy of the returned answers when the user model and the user's relevance feedback are incorporated during question analysis. The user model and the user's relevance feedback are the experimental parameters, evaluated by determining the accuracy of the returned answers when additional information is provided to resolve ambiguities during the question analysis process.

These experiments use the same dataset, i.e. the Raymond Mooney question dataset over the US geography ontology and knowledge base (KB). Table 1 summarizes the overall experimental results on the 607 questions from the Raymond Mooney Geoquery dataset. The experimental results show a significant impact, achieving a mean average precision (MAP) of 90% in returning relevant and correct answers, as shown in Table 1.

Table 1: Overall Performance of QAUF

                         Average R    AP       F-Measure    MAP
Applying UM and RF       0.947        0.947    0.947        0.9

The question analysis model achieved 94.7% average recall and average precision. The shortfall is due to several user questions that cannot be analysed and processed by the proposed model, such as 'Which capitals are not major cities?', 'What is the lowest point in Nebraska in meters?', 'How many square kilometres in the US?', 'What is the average population per square km in Pennsylvania?' and 'What is the longest river that passes the states that border the state that borders the most states?'.

Experimental evidence shows that questions that fall under the negation type (e.g. 'Which capitals are not major cities?'), questions that contain an explicit measurement unit (e.g. 'What is the average population per square km in Pennsylvania?') and complex questions that require more than two levels of comparative or evaluative processing (e.g. 'What is the longest river that passes the states that border the state that borders the most states?') failed to obtain relevant and correct answers from QAUF.


Table 2: Performance Comparison on QAUF, AquaLog and FREyA

            Average R    AP       F-measure
QAUF        0.947        0.947    0.947
AquaLog     0.43         0.42     0.42
FREyA       0.924        0.924    0.924

From Table 2, AquaLog shows relatively poor performance in comparison with FREyA and QAUF. This is because AquaLog does not have the facility to process how much/how many questions or questions containing modifiers, whereas FREyA generates slightly lower precision than QAUF. The experimental evidence suggests that the proposed question analysis model outperforms AquaLog and FREyA.

In the Semantic Question Analysis Model, UM is utilized for filtering and classifying the answers to the user's NL question based on the KB concept, the question context and/or language theory. RF is then employed to modify the query representation through term re-weighting, query expansion or query simplification.

5. Conclusion and Future Works

The extracted lexical elements of the user's question may contain terms that influence the correctness of the returned answer; such a term is referred to as a modifier term. The main objective of this research is to propose a new question analysis model that correctly interprets all the modifier terms of the user's question in order to yield correct answers in the QAUF prototype (Question Answering system with User modelling and relevance Feedback). The Semantic Question Analysis Model in QAUF shows a relative increase in F-measure: QAUF achieves 94.7%, compared with 92.4% for FREyA and only 42% for AquaLog. The findings of this study demonstrate that QAUF has a good precision percentage in returning relevant answers for each NL question.

In the near future, QAUF aims to implement automatic modification, expansion and simplification of the user's NL question by exploiting machine learning mechanisms.

Acknowledgements. Special thanks to Universiti Sultan Zainal Abidin (UniSZA), the Centre for Research and Innovation Management (CRIM), UniSZA, Universiti Putra Malaysia (UPM) and the Management of the Faculty of Informatics and Computing (FiK), UniSZA for their support of this research.

References

[1] A.M. Popescu, O. Etzioni, & H. Kautz, Towards A Theory of Natural

Language Interfaces to Databases, International Conference on Intelligent User

Interfaces, (2003), 149-157. http://dx.doi.org/10.1145/604045.604070


[2] A. Turing, Computing Machinery and Intelligence, Mind, LIX (1950), no.

236, 433 – 460. http://dx.doi.org/10.1093/mind/lix.236.433

[3] C.N. Mooers, Zatocoding Applied to Mechanical Organization of Knowledge,

American Documentation, 2 (1951), no. 1, 20-32.

http://dx.doi.org/10.1002/asi.5090020107

[4] C. Wang, M. Xiong, Q. Zhou, & Y. Yu, PANTO: A Portable Natural

Language Interface to Ontologies, European Semantics Web Conference

(ESWC'07), 4519 (2007), 473-487.

http://dx.doi.org/10.1007/978-3-540-72667-8_34

[5] D. Damljanovic, M. Agatonovic, & H. Cunningham, Natural language

interface to ontologies: Combining syntactic analysis and ontology-based lookup

through the user interaction, Proceedings ESWC-2010, Part I, LNCS, 6088 (2010),

106-120. http://dx.doi.org/10.1007/978-3-642-13486-9_8

[6] D. Damljanovic, M. Agatonovic, and H. Cunningham, FREyA: an Interactive

Way of Querying Linked Data Using Natural Language, Proceedings of the

European Semantic Web Conference, (2012), 125-138.

http://dx.doi.org/10.1007/978-3-642-25953-1_11

[7] D. Mollá-Aliod, & J. Vicedo, Question answering, Indurkhya and Damerau

(eds) Handbook of Natural Language Processing, (2010), 485-510.

[8] D. Moldovan, S. Harabagiu, R. Girju, P. Morarescu, F. Lacatusu, A. Novischi,

et al., LCC Tools for Question Answering, Proceedings of TREC, (2002).

[9] E.M. Voorhees, The TREC question answering track, Natural Language

Engineering, 7 (2001), no. 4, 361-378.

http://dx.doi.org/10.1017/s1351324901002789

[10] J. Heflin, OWL Web Ontology Language Use Cases and Requirements,

(2004). http://www.w3.org/TR/2004/REC-webont-req-20040210/

[11] MooneyData, Retrieved from http://www.cs.utexas.edu/~ml/nldata/geoquery.html

[12] O. Fernandez, R. Izquierdo, S. Ferrandez, and J.L. Vicedo, Addressing

Ontology-based question answering with collections of user queries, Journal of

Information Processing and Management, 45 (2009), no. 2, 175-188.

http://dx.doi.org/10.1016/j.ipm.2008.09.001

[13] P. Atzeni, R. Basili, D. H. Hansen, P. Missie, P. Paggio, M.T. Pazienza, et

al., Ontology-based Question Answering in a Federation of University Sites: The


MOSES Case Study, 9th International Conference on Applications of Natural

Language to Information Systems (NLDB '04), Manchester (United Kingdom),

Lecture Notes in Computer Science, 3136 (2004), 413-420.

http://dx.doi.org/10.1007/978-3-540-27779-8_40

[14] Q. Guo, & M. Zhang, Question Answering System Based on Ontology and

Semantic Web, Third International Conference of Rough Set and Knowledge

Technology 2008, Chengdu, China, 5009 (2008), 652-659.

http://dx.doi.org/10.1007/978-3-540-79721-0_87

[15] R. Baeza-Yates, & B. Ribeiro, Modern Information Retrieval, Harlow:

Addison-Wesley (1999).

[16] R. Iqbal, M.A.A. Murad, M.H. Selamat, & A. Azman, Negation Query

Handling Engine For Natural Language Interfaces to Ontologies, International

Conference on Information Retrieval and Knowledge Management (CAMP '12),

(2012), 249-253. http://dx.doi.org/10.1109/infrkm.2012.6204983

[17] S. Harabagiu, D. Moldovan, C. Clark, M. Bowden, A. Hickl, P. Wang,

Employing Two Question Answering Systems in TREC-2005, Proceedings of the

Fourteenth Text REtrieval Conference, (2005).

[18] S. I. A Saany, A. Mamat, A. Mustapha, L. S. Affendey, A Strategy for

Question Interpretation in Question Answering System, International Journal of

Computer Science and Telecommunications, 4 (2013), no. 5, 38-43.

[19] S.I.A. Saany, A. Mamat, A. Mustapha, L.S. Affendey, Incorporating User

Modeling and Relevance Feedback in Question Analysis Model, Proc. in 2nd 2013

IEEE Conference on Control, Systems and Industrial Informatics (ICCSII),

Bandung, Indonesia, (2013), 113- 118.

[20] T. Strzalkowski, J.P. Carballo, J. Karlgren, A. Hulth, P. Tapanainen, & T.

Lahtinen, Natural Language Information Retrieval: TREC-8 Report, In TREC,

(1999).

[21] V. Lopez, V. Uren, E. Motta, & M. Pasin. AquaLog: An Ontology-driven

Question Answering System for Organizational Semantic Intranets, Journal of

Web Semantics, 5 (2007), no. 2, 72-105.

http://dx.doi.org/10.1016/j.websem.2007.03.003

[22] V. Tablan, D. Damljanovic, & K. Bontcheva, A Natural Language Query

Interface to Structured Information, European Semantic Web Conference (ESWC

2008), 5021 (2008), 361-375.

http://dx.doi.org/10.1007/978-3-540-68234-9_28


[23] W. Wahlster, & A. Kobsa, User Models in Dialog Systems, Heidelberg:

Springer, (1989). http://dx.doi.org/10.1007/978-3-642-83230-7

[24] X. Li, & D. Roth, Learning Question Classifiers, Proceedings of the 19th

International Conference on Computational Linguistics - Volume 1 (COLING

'02), Association for Computational Linguistics, Stroudsburg, PA, USA, 1 (2002),

1-7. http://dx.doi.org/10.3115/1072228.1072378

Received: April 15, 2015; Published: November 2, 2015