Arabic question answering


Imam University, College of Computer and Information Systems

Computer Science Department

Arabic Question Answering

By: Asma Ahmad Asma alharbi nadia AL-Mutiri

Supervised by: Dr. Amal Al seef

Second semester: 1434-1435 (2013)

Arabic Question Answering

Overview:

O The implementation of Arabic Question-Answering system components.

O QASAL & QARAB system components.

O Yes/No Arabic Question Answering.

ARABIQA GENERIC ARCHITECTURE

Named Entity Recognizer

O A NER system identifies proper names and temporal and numeric expressions.

O In this Arabic NER system:

O Proper name recognition is based on the maximum entropy (ME) approach.

O Temporal and numeric expression recognition is based entirely on patterns and a small dictionary containing the names of days and months in Arabic, and numbers written in letters.
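The pattern-and-dictionary idea for temporal and numeric expressions can be sketched as follows. This is only an illustration: the dictionary entries, the label names, and the function `tag_tokens` are assumptions, not the resources used by the system described in the slides.

```python
import re

# Hypothetical mini-dictionaries; the slides do not list the actual entries.
DAYS = {"السبت", "الأحد", "الاثنين", "الثلاثاء", "الأربعاء", "الخميس", "الجمعة"}
MONTHS = {"يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو"}
NUMBER_WORDS = {"واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "عشرة"}

# Pattern for numbers written with Western or Arabic-Indic digits.
DIGITS = re.compile(r"[0-9\u0660-\u0669]+")


def tag_tokens(tokens):
    """Label each token as TEMPORAL, NUMERIC, or O (outside any expression)."""
    tagged = []
    for tok in tokens:
        if tok in DAYS or tok in MONTHS:
            label = "TEMPORAL"
        elif tok in NUMBER_WORDS or DIGITS.fullmatch(tok):
            label = "NUMERIC"
        else:
            label = "O"
        tagged.append((tok, label))
    return tagged
```

A real system would also need multi-token patterns (for example a day number followed by a month name); this sketch only labels single tokens.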

The implementation of Arabic Question-Answering system

O NooJ is a linguistic environment that includes large-coverage dictionaries and grammars.

O A spell-checker that corrects the most frequent errors.

O A named entity recognition tool, which is a set of rules described in local grammars.

QASAL System components

Question analysis: this step applies the set of linguistic resources to the input question. For example, NooJ's text annotation structure gives the linguistic analysis of each word form in our sample question.

Passage retrieval: the first task of this step could be the selection of one or more …

Answer extraction: this last step uses the displayed concordance table to automatically extract the answer to the input question.

Example 1: Answer extraction for the factoid question: متى استقلّت تونس؟ (When did Tunisia gain independence?)

Example 2:

QARAB System

[Figure: QARAB architecture. Components shown: Al-Raya newspaper documents, full IR system, NLP tool, question, question analyzer, ranked documents, passage selection, answer generation, hypothesized answer.]

Information Retrieval system

O It searches the document collection to select documents containing information relevant to the user's query.

O Lundquist et al. [1999] showed that an IR system can be constructed using a relational database management system (RDBMS).

O In this paper, the IR system contains the following database relations:

1. ROOT_TABLE

2. STEM_TABLE

3. POSTING_TABLE

4. DOCUMENT_TABLE

5. PARAGRAPH_TABLE
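To make the RDBMS idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Only the five table names come from the slides. Every column, the join logic, and the helper names `build_index` and `paragraphs_for_root` are assumptions chosen to illustrate an inverted index stored in relations.

```python
import sqlite3

# Table names come from the slides; every column below is hypothetical.
SCHEMA = """
CREATE TABLE DOCUMENT_TABLE  (doc_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE PARAGRAPH_TABLE (para_id INTEGER PRIMARY KEY,
                              doc_id INTEGER REFERENCES DOCUMENT_TABLE(doc_id),
                              body TEXT);
CREATE TABLE ROOT_TABLE      (root_id INTEGER PRIMARY KEY, root TEXT UNIQUE);
CREATE TABLE STEM_TABLE      (stem_id INTEGER PRIMARY KEY, stem TEXT UNIQUE,
                              root_id INTEGER REFERENCES ROOT_TABLE(root_id));
CREATE TABLE POSTING_TABLE   (stem_id INTEGER REFERENCES STEM_TABLE(stem_id),
                              para_id INTEGER REFERENCES PARAGRAPH_TABLE(para_id),
                              term_freq INTEGER);
"""


def build_index():
    """Create an empty in-memory database with the five relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def paragraphs_for_root(conn, root):
    """Return ids of paragraphs that contain any stem derived from `root`."""
    rows = conn.execute(
        """SELECT DISTINCT p.para_id
           FROM ROOT_TABLE r
           JOIN STEM_TABLE s ON s.root_id = r.root_id
           JOIN POSTING_TABLE p ON p.stem_id = s.stem_id
           WHERE r.root = ?
           ORDER BY p.para_id""",
        (root,),
    ).fetchall()
    return [row[0] for row in rows]
```

Storing postings per paragraph rather than per document is a design guess that fits the later passage selection step; the slides do not say at which level postings are kept.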

The NLP system

The NLP model consists of:

1. Tokenizer

2. Type finder

3. Feature finder

4. Proper noun phrase parser

How to extract the Answer

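Assume the user posed the following question to QARAB:

من هو محافظ البنك المركزي الكويتي والذي قال بأن بلاده ليس لديها نيه لخفض قيمه الدينار للحد من عجز الميزانية؟

(Who is the governor of the Central Bank of Kuwait who said that his country has no intention of devaluing the dinar to reduce the budget deficit?)

The IR system returns this passage. How?

قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح أمس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الأسواق المالية الدولية.

(The governor of the Central Bank of Kuwait, Sheikh Salem Abdul Aziz Al-Sabah, said yesterday that his country has no intention of devaluing the Kuwaiti dinar to reduce the growing budget deficit. He said that devaluing the dinar would harm Kuwait's economy and its credibility in international financial markets.)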

Step 1:

O Tokenize the question and remove its stop words, then tag each word with its part of speech (POS).

Step 2:

O QARAB constructs the query as a "bag of words" and passes it to the IR system.

Example (the query words are matched against the passage):

قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح امس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الاسواق المالية الدولية.

Step 3: Determine the expected type of the answer: من (Who?) >>> person name.

Step 4: Generate the answer.

قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح امس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الاسواق المالية الدولية.
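Steps 1 to 3 can be sketched in a few lines. This is a simplified illustration, not QARAB's actual code: the stop-word list, the answer-type table, and the function `analyze_question` are assumptions, and POS tagging is left out because it requires a morphological analyzer.

```python
# Hypothetical resources: a tiny stop-word list and a question-word table.
STOP_WORDS = {"هو", "هي", "من", "في", "بأن", "ان", "والذي", "ليس", "لديها"}
ANSWER_TYPES = {
    "من": "PERSON",     # who
    "متى": "DATE",      # when
    "أين": "LOCATION",  # where
    "كم": "NUMBER",     # how many
}


def analyze_question(question):
    """Return (expected answer type, bag-of-words query) for a question."""
    # Step 1: tokenize on whitespace after dropping the Arabic question mark.
    tokens = question.replace("؟", " ").split()
    # Step 3: the first token (the question word) decides the answer type.
    expected_type = ANSWER_TYPES.get(tokens[0], "UNKNOWN") if tokens else "UNKNOWN"
    # Steps 1 and 2: drop stop words and keep the rest as an unordered bag.
    bag = {t for t in tokens if t not in STOP_WORDS}
    return expected_type, bag
```

For the sample question, the first token من maps to a person name, and the remaining content words form the query passed to the IR system.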

Yes/No Arabic Question Answering

SYSTEM ARCHITECTURE:

Question Analysis module

Text retrieval module

Answer Selection module

Question Analysis

O Removing the question mark.

O Removing the interrogative particle.

O Tokenizing: the tokenizer divides the user question into its separate words and normalizes the letter Alef.

O Removing the stop words.

O Removing the negation particles (if they exist) and setting the negation property of the question representation.

Question Analysis

O Tagging: to determine the type of a word (verb or noun) and obtain its root.

O Parsing: recall that the Arabic sentence after the interrogative particle is either nominal or verbal.
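The preprocessing steps listed above can be sketched as follows. The particle and stop-word lists and the function names are assumptions made for illustration. Tagging and root extraction are omitted because they need a morphological analyzer, and the interrogative particle is dropped after tokenizing for simplicity.

```python
import re

INTERROGATIVES = {"هل"}                               # particle named in the slides
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس", "ما"}  # hypothetical list
STOP_WORDS = {"في", "على", "من", "الى"}               # hypothetical list


def normalize_alef(word):
    """Map the hamza and madda forms of Alef to the bare Alef."""
    return re.sub("[أإآ]", "ا", word)


def analyze(question):
    """Return the negation flag and the remaining content tokens."""
    # Remove the question mark (Arabic or Latin).
    text = question.replace("؟", "").replace("?", "")
    # Tokenize on whitespace and normalize Alef.
    tokens = [normalize_alef(t) for t in text.split()]
    # Remove the interrogative particle.
    tokens = [t for t in tokens if t not in INTERROGATIVES]
    # Record negation, then remove negation particles and stop words.
    negated = any(t in NEGATION_PARTICLES for t in tokens)
    tokens = [t for t in tokens
              if t not in NEGATION_PARTICLES and t not in STOP_WORDS]
    return {"negated": negated, "tokens": tokens}
```

The `negated` flag plays the role of the negation property of the question representation.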

Question Analysis

In a nominal sentence, we are interested in the beginning noun, the "topic" (مبتدأ), which is the first noun after the interrogative particle (هل), and in the comment noun (خبر), which we can mark as the last noun without the article (ال).

In a verbal sentence, we are interested in the verb of the sentence, which occurs immediately after the interrogative particle (هل), and in the subject that follows the verb.

Question Analysis

Logical Representation (with Nominal Sentences)

Affirmative questions:

O N (Topic, root (Comment), root ({remaining words}))

O N (Topic, root (Comment Synonyms), root ({remaining words}))

O ~N (Topic, root (Comment Antonyms), root ({remaining words}))

Question Analysis

Logical Representation (with Nominal Sentences)

Negated questions:

O ~N (Topic, root (Comment), root ({remaining words}))

O ~N (Topic, root (Comment Synonyms), root ({remaining words}))

O N (Topic, root (Comment Antonyms), root ({remaining words}))

Question Analysis

Example: هل سميره كسرت النافذه؟ (Did Samira break the window?)

مبتدأ (topic): سميره

خبر (comment): كسرت -----> (synonym) حطمت

O N(سميره, root(كسرت), root(النافذه))

O N(سميره, root(حطمت), root(النافذه))

Question Analysis

Logical Representation (with Verbal Sentences)

Affirmative questions:

O V (Subject noun, root (verb), root ({remaining words}))

O V (Subject noun, root (verb Synonyms), root ({remaining words}))

O ~V (Subject noun, root (verb Antonyms), root ({remaining words}))

Question Analysis

Logical Representation (with Verbal Sentences)

Negated questions:

O ~V (Subject noun, root (verb), root ({remaining words}))

O ~V (Subject noun, root (verb Synonyms), root ({remaining words}))

O V (Subject noun, root (verb Antonyms), root ({remaining words}))

Question Analysis

Example: هل فتح محمد الباب؟ (Did Mohammed open the door?)

فعل (verb): فتح ---> (antonym) اغلق

فاعل (subject): محمد

O V(محمد, root(فتح), root(الباب))

O ~V(محمد, root(اغلق), root(الباب))
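The nominal and verbal patterns differ only in the predicate letter and in where the negation sign goes, so both can be generated by one small function. The sketch below is an illustration under assumptions: the function name `logical_forms` and its parameters are hypothetical, and `root(...)` is kept symbolic rather than computed by a stemmer.

```python
def logical_forms(kind, head, predicate, rest,
                  synonyms=(), antonyms=(), negated=False):
    """Build the logical forms for a yes/no question.

    kind      -- "N" for a nominal sentence, "V" for a verbal one
    head      -- the topic (nominal) or the subject noun (verbal)
    predicate -- the comment (nominal) or the verb (verbal)
    rest      -- the remaining words of the question
    """
    def form(is_negated, pred):
        # root(...) stays symbolic here; no stemming is performed.
        body = f"{kind}({head}, root({pred}), root({{{', '.join(rest)}}}))"
        return ("~" if is_negated else "") + body

    # The predicate and its synonyms keep the polarity of the question.
    forms = [form(negated, predicate)]
    forms += [form(negated, s) for s in synonyms]
    # Antonyms flip the polarity.
    forms += [form(not negated, a) for a in antonyms]
    return forms
```

Calling `logical_forms("V", "محمد", "فتح", ["الباب"], antonyms=["اغلق"])` yields the two forms of the door example, with the remaining words written inside braces.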

Text Processing & Retrieval

There are 20 documents in the corpus. This module uses two techniques to retrieve the top 5 candidate paragraphs (of variable length) that are most relevant to the user question:

O Paragraph technique: split the documents into their built-in paragraphs and retrieve the top 5 paragraphs, regardless of which document they come from, according to some indexing scheme.

O Document technique: retrieve the top 5 documents after they are ranked, then use the first indexing scheme to retrieve the top 5 paragraphs.
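The difference between the two techniques can be sketched as follows. The slides do not specify the indexing scheme, so the scoring function here (a count of query terms present in the text) is a stand-in assumption, as are all function names. Documents are represented as lists of paragraph strings.

```python
def score(text, query_terms):
    """Hypothetical relevance score: number of query terms found in the text."""
    words = set(text.split())
    return sum(1 for t in query_terms if t in words)


def top_k(items, query_terms, k=5):
    """Return the k highest-scoring texts (ties keep their original order)."""
    ranked = sorted(items, key=lambda text: score(text, query_terms), reverse=True)
    return ranked[:k]


def paragraph_technique(documents, query_terms, k=5):
    """Rank all paragraphs together, regardless of their source document."""
    paragraphs = [p for doc in documents for p in doc]
    return top_k(paragraphs, query_terms, k)


def document_technique(documents, query_terms, k=5):
    """Rank whole documents first, then rank paragraphs of the top k documents."""
    best_docs = sorted(
        documents,
        key=lambda doc: score(" ".join(doc), query_terms),
        reverse=True,
    )[:k]
    return paragraph_technique(best_docs, query_terms, k)
```

The document technique can miss a strong paragraph that sits in a weakly ranked document, which is the trade-off between the two approaches.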

Answer Selection & generation

After the 5 paragraphs are selected using the document technique or the paragraph technique, we need to select the best sentence to represent the answer and generate yes or no accordingly.

Answer Selection & generation

O Split the paragraphs into their sentences.

O In nominal sentences, we are interested in the exact topic (مبتدأ), not its root, so we omit each sentence that does not contain it in its original form.

O In verbal sentences, we are interested in the exact subject (فاعل), not its root, so we omit each sentence that does not contain it in its original form.

Answer Selection & generation

O In the resulting sentences, we look for the remaining terms (in root form) derived from the logical representation of the question (except the subject or the topic). If they exist, they are assigned indexes according to their position in the sentence, so each sentence has its own rank as follows:

Rank = last occurrence - first occurrence

O Look for negation particles (ادوات النفي) in the selected answer (if they exist).
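The filtering and ranking steps can be sketched as follows. The data layout (each sentence as a pair of surface tokens and root tokens) and the function names are assumptions. The slides do not say how ranks are compared, so the sketch only computes them.

```python
def sentence_rank(sentence_roots, question_roots):
    """Rank = position of last matching root - position of first matching root.

    Returns None when no question root occurs in the sentence.
    """
    positions = [i for i, r in enumerate(sentence_roots) if r in question_roots]
    if not positions:
        return None
    return positions[-1] - positions[0]


def candidate_sentences(sentences, head, question_roots):
    """Keep sentences containing the exact topic/subject and attach their rank.

    sentences -- list of (surface_tokens, root_tokens) pairs
    head      -- the topic or subject in its original (surface) form
    """
    results = []
    for surface, roots in sentences:
        if head not in surface:
            continue  # the exact topic or subject must appear
        rank = sentence_rank(roots, question_roots)
        if rank is not None:
            results.append((surface, rank))
    return results
```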

Answer Selection & generation

O The selected answer and the logical representation of the question are used to generate yes or no as follows:

1. Yes, if: the question and the answer are both affirmative, or the question and the answer are both negated.

2. No, if: the question is affirmative and the answer is negated, or the question is negated and the answer is affirmative.
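The decision rule reduces to comparing the polarity of the question with the polarity of the selected answer. A minimal sketch, assuming a hypothetical list of negation particles and hypothetical function names:

```python
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس", "ما"}  # hypothetical list


def is_negated(tokens):
    """True if the token list contains any negation particle."""
    return any(t in NEGATION_PARTICLES for t in tokens)


def yes_no(question_negated, answer_tokens):
    """Answer yes when question and answer share polarity, otherwise no."""
    return "yes" if question_negated == is_negated(answer_tokens) else "no"
```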

EXPERIMENTS AND RESULTS

O Arabic QA system: 69%

O Arabic QA using QARAB: 97.3%

O PR system: 83.3%

Conclusion

O We have described the generic architecture for Arabic question answering.

O We compared different systems.

O We showed how each system processes the question and gives the answers.
