Imam University, College of Computer and Information systems, Computer sciences Department. Arabic Question Answering, by Asma Ahmad, Asma Alharbi, Nadia Al-Mutiri. Supervised by: Dr. Amal Al seef. Second semester: 1434-1435, 2013

Arabic question answering


Page 1: Arabic question answering

Imam University College of Computer and Information systems

Computer sciences Department

Arabic Question Answering

by Asma Ahmad, Asma Alharbi, Nadia Al-Mutiri

Supervised by: Dr. Amal Al seef

Second semester: 1434-1435, 2013

Page 2: Arabic question answering

Arabic Question Answering

Overview:

O The implementation of Arabic Question-Answering system components.

O QASAL & QARAB system components.

O Yes/No Arabic Question Answering.

Page 3: Arabic question answering

ARABIQA GENERIC ARCHITECTURE

Page 4: Arabic question answering

Named Entity Recognizer

O An NER system identifies proper names and temporal and numeric expressions.

O This Arabic NER system is based on the maximum entropy (ME) approach.

O For the proper names recognition:

O For temporal and numeric expressions: recognition is based entirely on patterns and a small dictionary containing the names of days and months in Arabic, and numbers written out in words.
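The pattern-plus-dictionary idea for temporal and numeric expressions can be sketched as follows. This is only a minimal illustration, not the ArabiQA implementation: the dictionary entries, the label names, and the function `tag_temporal_numeric` are all assumptions made for the example.

```python
import re

# Hypothetical mini-dictionary; the slides do not list the actual entries.
DAYS = {"الأحد", "الاثنين", "الثلاثاء", "الأربعاء", "الخميس", "الجمعة", "السبت"}
MONTHS = {"يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو",
          "يوليو", "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"}
NUMBER_WORDS = {"واحد", "اثنان", "ثلاثة", "أربعة", "خمسة",
                "ستة", "سبعة", "ثمانية", "تسعة", "عشرة"}

# Pattern for numbers written in Western or Arabic-Indic digits.
DIGITS = re.compile(r"^[0-9\u0660-\u0669]+$")


def tag_temporal_numeric(text):
    """Return (token, label) pairs for tokens matched by dictionary or pattern."""
    tagged = []
    for token in text.split():
        word = token.strip("،.؟!:")  # drop surrounding punctuation
        if word in DAYS:
            tagged.append((word, "DAY"))
        elif word in MONTHS:
            tagged.append((word, "MONTH"))
        elif word in NUMBER_WORDS or DIGITS.match(word):
            tagged.append((word, "NUMBER"))
    return tagged
```

A real system would add multi-word patterns (for example full dates); this sketch only tags single tokens.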

Page 5: Arabic question answering

The implementation of Arabic Question-Answering system

O NooJ is a linguistic environment that includes large-coverage dictionaries and grammars.

O A spell-checker that corrects the most frequent errors.

O A named entity recognition tool, which is a set of rules described in local grammars.

Page 6: Arabic question answering

QASAL System components

Page 7: Arabic question answering

Question analysis: this step applies the set of linguistic resources to the input question. For example, NooJ’s text annotation structure gives the linguistic analysis of each word form in our sample question.

Page 8: Arabic question answering

Passage retrieval: The first task of this step could be the selection of one or more passages from which to automatically extract the answer to the input question.

Page 9: Arabic question answering

Answer Extraction: this last step uses the displayed concordance table to automatically extract the answer to the input question.

Example 1: Answer extraction for the factoid question: متى استقلت تونس؟ (When did Tunisia gain independence?)

Page 10: Arabic question answering

Example 2:

Page 11: Arabic question answering

QARAB System

[Figure: QARAB system architecture, with components labeled Question, Question analyzer, NLP tool, full IR system, Al-Raya newspaper documents, IR ranked documents, Passage selection, Answer generation, and Hypothesized answer.]

Page 12: Arabic question answering

Information Retrieval system.

O Searches the document collection to select documents containing information relevant to the user’s query.

O Follows Lundquist et al. [1999]: an IR system can be constructed using a relational database management system (RDBMS).

O In this paper, it contains the following database relations:

1. ROOT_TABLE.

2. STEM_TABLE.

3. POSTING_TABLE.

4. DOCUMENT_TABLE.

5. PARAGRAPH_TABLE.
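To make the RDBMS-based design concrete, the five relations could be declared roughly as below. Only the table names come from the slide; every column, the in-memory SQLite database, and the helper functions `build_index_db`, `table_names`, and `paragraphs_for_roots` are assumptions for illustration.

```python
import sqlite3

# Table names come from the slide; every column below is a hypothetical choice.
SCHEMA = """
CREATE TABLE ROOT_TABLE      (root_id INTEGER PRIMARY KEY, root TEXT UNIQUE);
CREATE TABLE STEM_TABLE      (stem_id INTEGER PRIMARY KEY, stem TEXT UNIQUE,
                              root_id INTEGER REFERENCES ROOT_TABLE(root_id));
CREATE TABLE DOCUMENT_TABLE  (doc_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE PARAGRAPH_TABLE (para_id INTEGER PRIMARY KEY,
                              doc_id INTEGER REFERENCES DOCUMENT_TABLE(doc_id),
                              body TEXT);
CREATE TABLE POSTING_TABLE   (root_id INTEGER REFERENCES ROOT_TABLE(root_id),
                              para_id INTEGER REFERENCES PARAGRAPH_TABLE(para_id),
                              frequency INTEGER);
"""


def build_index_db():
    """Create an empty in-memory index with the five relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    return [name for (name,) in rows]


def paragraphs_for_roots(conn, roots):
    """Rank paragraphs by the summed frequency of the given (non-empty) roots."""
    marks = ",".join("?" for _ in roots)
    sql = f"""SELECT p.para_id, SUM(p.frequency) AS score
              FROM POSTING_TABLE p JOIN ROOT_TABLE r ON r.root_id = p.root_id
              WHERE r.root IN ({marks})
              GROUP BY p.para_id ORDER BY score DESC, p.para_id"""
    return conn.execute(sql, list(roots)).fetchall()
```

Summed term frequency is a stand-in scoring rule here; the slides do not state how the IR system weights terms.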

Page 13: Arabic question answering

The NLP system

The NLP model consists of:

1. Tokenizer.

2. Type finder.

3. Feature finder.

4. Proper noun phrase parser.

Page 14: Arabic question answering

How to extract the Answer

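Assume the user posed the following question to QARAB:

من هو محافظ البنك المركزي الكويتي والذي قال بأن بلاده ليس لديها نيه لخفض قيمه الدينار للحد من عجز الميزانية؟

(Who is the governor of the Central Bank of Kuwait who said that his country has no intention of devaluing the dinar to reduce the budget deficit?)

The IR system returns the following passage. How?

قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح أمس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الأسواق المالية الدولية.

(The governor of the Central Bank of Kuwait, Sheikh Salem Abdul Aziz Al-Sabah, said yesterday that his country has no intention of devaluing the Kuwaiti dinar to reduce the growing budget deficit. He said that devaluing the dinar would harm Kuwait's economy and its credibility in the international financial markets.)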

Page 15: Arabic question answering

Step 1:

O Tokenize the question and remove its stop words, then tag each word with its part of speech (POS).

Page 16: Arabic question answering

Step 2:

O QARAB constructs the query as a “bag of words” and passes it to the IR system.
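Steps 1 and 2 can be sketched as a small function that turns a question into a bag of words. This is an illustrative sketch only: the stop-word list and the name `build_query` are assumptions, and POS tagging is left out because the slides do not describe the tagger.

```python
# Hypothetical stop-word list; QARAB's actual list is not given in the slides.
STOP_WORDS = {"من", "هو", "هي", "في", "على", "الى", "عن",
              "ان", "بأن", "والذي", "الذي", "التي"}


def build_query(question):
    """Tokenize the question, drop stop words, and return a bag of words.

    The bag maps each remaining word to the number of times it occurs.
    """
    tokens = question.replace("؟", " ").replace("?", " ").split()
    bag = {}
    for token in tokens:
        if token not in STOP_WORDS:
            bag[token] = bag.get(token, 0) + 1
    return bag
```

For the question about the central bank governor, the bag would keep content words such as محافظ and البنك and discard particles such as من and هو.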

Page 17: Arabic question answering

Example:

قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح امس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الاسواق المالية الدولية.

Step 3: Determine the expected type of the answer: من (Who?) >>> personal name.

Step 4: Generating the answer.

قال محافظ البنك المركزي الكويتي الشيخ سالم عبد العزيز الصباح امس ان بلاده ليس لديها النية لخفض قيمة الدينار الكويتي للحد من العجز المتزايد في الميزانية. وقال بأن خفض قيمة الدينار سيضر باقتصاد الكويت ومصداقيتها في الاسواق المالية الدولية.
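Step 3 amounts to a lookup from the interrogative word to an expected answer type. A minimal sketch follows; only the mapping from من (who) to a personal name is stated in the slides, while the other entries, the type labels, and the function name `expected_answer_type` are assumptions.

```python
# Only the "من" -> personal name mapping is from the slide; the rest are assumed.
ANSWER_TYPES = {
    "من": "PERSON",     # who
    "متى": "DATE",      # when (assumed)
    "أين": "LOCATION",  # where (assumed)
    "كم": "NUMBER",     # how many (assumed)
}


def expected_answer_type(question):
    """Return the expected answer type based on the first word of the question."""
    tokens = question.split()
    first = tokens[0] if tokens else ""
    return ANSWER_TYPES.get(first, "UNKNOWN")
```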

Page 18: Arabic question answering

Yes/No Arabic Question Answering

Page 19: Arabic question answering

SYSTEM ARCHITECTURE:

Question Analysis module

Text retrieval module

Answer Selection module

Page 20: Arabic question answering

Question Analysis

O Removing the question mark.

O Removing the interrogative particle.

O Tokenizing: the tokenizer divides the user question into its separate words and normalizes the (Alef) letter.

O Removing the stop words.

O Removing the negation particles (if they exist) and setting the negation property of the question representation.
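The preprocessing steps listed above can be sketched as one function. This is an illustration under stated assumptions, not the system's code: the particle, stop-word, and negation lists are hypothetical, as are the names `normalize_alef` and `analyze_question`.

```python
import re

# Hypothetical word lists; the slides do not enumerate them.
INTERROGATIVE_PARTICLES = ("هل",)
STOP_WORDS = {"في", "من", "على", "الى", "عن"}
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس", "ما"}


def normalize_alef(word):
    """Map the hamza and madda forms of Alef (أ إ آ) to the bare Alef (ا)."""
    return re.sub("[أإآ]", "ا", word)


def analyze_question(question):
    """Apply the preprocessing steps in the order given on the slide."""
    # 1. Remove the question mark.
    text = question.replace("؟", "").replace("?", "").strip()
    # 2. Remove the interrogative particle at the start of the question.
    for particle in INTERROGATIVE_PARTICLES:
        if text.startswith(particle + " "):
            text = text[len(particle):]
            break
    # 3. Tokenize and normalize the Alef letter.
    tokens = [normalize_alef(t) for t in text.split()]
    # 4. Remove the stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 5. Remove negation particles and record the negation property.
    negated = any(t in NEGATION_PARTICLES for t in tokens)
    tokens = [t for t in tokens if t not in NEGATION_PARTICLES]
    return {"tokens": tokens, "negated": negated}
```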

Page 21: Arabic question answering

Question Analysis

O Tagging: determines the type of each word (verb or noun) and obtains its root.

O Parsing: recall that the Arabic sentence after the interrogative particle is either nominal or verbal.

Page 22: Arabic question answering

Question Analysis

In a nominal sentence, we are interested in the beginning noun, the “topic” (مبتدأ), which is the first noun after the interrogative particle (هل), and in the comment noun (خبر), which we can mark as the last noun without the definite article (ال).

In a verbal sentence, we are interested in the verb of the sentence, which occurs immediately after the interrogative particle (هل), and in the subject that follows the verb.
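The parsing rule can be sketched as a function over already-tagged words. The tagger itself, the "N"/"V" tag names, and the function `parse_question` are assumptions. One deliberate relaxation: the comment is taken as the last remaining word without the article (ال), whatever its tag, so that a verb acting as the comment is also handled.

```python
def parse_question(tagged):
    """Extract the key parts of a yes/no question.

    tagged: list of (word, tag) pairs for the words after the interrogative
    particle, where tag is "V" (verb) or "N" (noun). The tagger is assumed.
    """
    if not tagged:
        raise ValueError("empty question")
    words = [word for word, _ in tagged]
    first_word, first_tag = tagged[0]

    if first_tag == "V":
        # Verbal sentence: the verb comes first, the subject is the next noun.
        subject = next((w for w, t in tagged[1:] if t == "N"), None)
        rest = [w for w in words[1:] if w != subject]
        return {"kind": "verbal", "verb": first_word,
                "subject": subject, "rest": rest}

    # Nominal sentence: the topic is the first noun; the comment is the last
    # remaining word that does not carry the definite article.
    comment = next((w for w, _ in reversed(tagged[1:])
                    if not w.startswith("ال")), None)
    rest = [w for w in words[1:] if w != comment]
    return {"kind": "nominal", "topic": first_word,
            "comment": comment, "rest": rest}
```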

Page 23: Arabic question answering

Question Analysis

Logical Representation (With Nominal Sentences)

Affirmative questions:

O N (Topic, root (Comment), root ({remaining words}))

O N (Topic, root (Comment Synonyms), root ({remaining words}))

O ~N (Topic, root (Comment Antonyms), root ({remaining words}))

Page 24: Arabic question answering

Question Analysis

Logical Representation (With Nominal Sentences)

Negated questions:

O ~N (Topic, root (Comment), root ({remaining words}))

O ~N (Topic, root (Comment Synonyms), root ({remaining words}))

O N (Topic, root (Comment Antonyms), root ({remaining words}))

Page 25: Arabic question answering

Question Analysis

O Example

هل سميره كسرت النافذه؟ (Did Samira break the window?)

مبتدأ (topic): سميره

خبر (comment): كسرت -----> حطمت (synonym)

O N(سميره, root(كسرت), root(النافذه))

O N(سميره, root(حطمت), root(النافذه))

Page 26: Arabic question answering

Question Analysis

Logical Representation (With Verbal Sentences)

Affirmative questions:

O V (Subject noun, root (verb), root ({remaining words}))

O V (Subject noun, root (verb Synonyms), root ({remaining words}))

O ~V (Subject noun, root (verb Antonyms), root ({remaining words}))

Page 27: Arabic question answering

Question Analysis

Logical Representation (With Verbal Sentences)

Negated questions:

O ~V (Subject noun, root (verb), root ({remaining words}))

O ~V (Subject noun, root (verb Synonyms), root ({remaining words}))

O V (Subject noun, root (verb Antonyms), root ({remaining words}))

Page 28: Arabic question answering

Question Analysis

Example

هل فتح محمد الباب؟ (Did Mohammed open the door?)

فعل (verb): فتح -----> اغلق (antonym)

فاعل (subject): محمد

O V(محمد, root(فتح), root(الباب))

O ~V(محمد, root(اغلق), root(الباب))
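The nominal and verbal representations follow one pattern: the original predicate and its synonyms keep the polarity of the question, while antonyms flip it. A minimal sketch of that pattern is below. The tuple layout and the names `logical_forms` and `render` are assumptions, and root extraction is taken as already done, so `root(...)` appears only in the rendered string.

```python
def logical_forms(kind, head, predicate, rest,
                  synonyms=(), antonyms=(), negated=False):
    """Build the logical representations of a yes/no question.

    kind: "N" (nominal) or "V" (verbal); head: topic or subject;
    predicate: comment or verb. Each form is a tuple
    (is_negated, kind, head, predicate, remaining_words).
    """
    rest = tuple(rest)
    forms = [(negated, kind, head, predicate, rest)]
    # Synonyms keep the polarity of the question.
    forms += [(negated, kind, head, s, rest) for s in synonyms]
    # Antonyms flip the polarity of the question.
    forms += [(not negated, kind, head, a, rest) for a in antonyms]
    return forms


def render(form):
    """Format a form in the notation used on the slides."""
    is_negated, kind, head, predicate, rest = form
    sign = "~" if is_negated else ""
    return f"{sign}{kind}({head}, root({predicate}), root({', '.join(rest)}))"
```

For the door example, calling `logical_forms("V", "محمد", "فتح", ["الباب"], antonyms=["اغلق"])` yields one affirmative form for فتح and one negated form for اغلق.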

Page 29: Arabic question answering

Text Processing & Retrieval

There are 20 documents in the corpus. This module uses two techniques to retrieve the top 5 candidate paragraphs (of variable length) that are most relevant to the user question:

O Paragraphs technique: split the documents into their built-in paragraphs and retrieve the top 5 paragraphs, regardless of which document they come from, according to some indexing scheme.

O Document technique: retrieve the top 5 documents after they are ranked, then use the first indexing scheme to retrieve the top 5 paragraphs.
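The two retrieval techniques differ only in whether documents are filtered first. The sketch below illustrates that difference. The slides say only "some indexing scheme", so the term-count `score` function is a stand-in, and all function names are hypothetical.

```python
def score(text, query_terms):
    """Stand-in relevance score: number of query-term occurrences in the text."""
    words = text.split()
    return sum(words.count(term) for term in query_terms)


def top_k(items, query_terms, k, text_of):
    """Return the k highest-scoring items; ties keep their original order."""
    ranked = sorted(items,
                    key=lambda item: score(text_of(item), query_terms),
                    reverse=True)
    return ranked[:k]


def paragraphs_technique(documents, query_terms, k=5):
    """documents: list of documents, each given as a list of paragraph strings."""
    paragraphs = [p for doc in documents for p in doc]
    return top_k(paragraphs, query_terms, k, text_of=lambda p: p)


def document_technique(documents, query_terms, k=5):
    """Rank whole documents first, then rank paragraphs within the top k."""
    best_docs = top_k(documents, query_terms, k,
                      text_of=lambda doc: " ".join(doc))
    return paragraphs_technique(best_docs, query_terms, k)
```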

Page 30: Arabic question answering

Answer Selection & generation

After the 5 paragraphs are selected using the documents technique or the paragraphs technique, we need to select the best sentence to represent the answer and generate yes or no accordingly.

Page 31: Arabic question answering

Answer Selection & generation

O Split the paragraphs into their sentences.

O In nominal sentences we are interested in the exact topic (مبتدأ), not its root, so we omit each sentence that does not contain it in its original form.

O In verbal sentences we are interested in the exact subject (فاعل), not its root, so we omit each sentence that does not contain it in its original form.

Page 32: Arabic question answering

Answer Selection & generation

O In each resulting sentence, we look for the remaining terms (in root form) derived from the logical representation of the question (except the subject or the topic). If they exist, they are assigned indexes according to their position in the sentence, so each sentence has its own rank as follows:

Rank = last occurrence - first occurrence

O Look for negation particles (ادوات النفي) in the selected answer (if any exist).
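The rank formula can be computed directly from word positions. This sketch assumes each sentence is already reduced to a list of roots, and the name `sentence_rank` is hypothetical. How the rank is then used to pick the best sentence is not stated in the slides, so the function only computes it.

```python
def sentence_rank(sentence_roots, question_roots):
    """Compute Rank = last occurrence - first occurrence.

    sentence_roots: the sentence as a list of roots (root extraction assumed).
    question_roots: remaining question terms in root form, excluding the
    topic or subject. Returns None when no question term occurs.
    """
    positions = [index for index, root in enumerate(sentence_roots)
                 if root in question_roots]
    if not positions:
        return None
    return positions[-1] - positions[0]
```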

Page 33: Arabic question answering

Answer Selection & generation

O The selected answer and the logical representation of the question are used to generate yes or no as follows:

1. Yes, if: the question and the answer are both affirmative, or the question and the answer are both negated.

2. No, if: the question is affirmative and the answer is negated, or the question is negated and the answer is affirmative.
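The decision rule reduces to comparing the polarity of the question with the polarity of the selected answer. A minimal sketch, with a hypothetical negation list and hypothetical function names:

```python
# Hypothetical list of negation particles; the slides do not enumerate them.
NEGATION_PARTICLES = {"لا", "لم", "لن", "ليس", "ما"}


def has_negation(tokens):
    """Return True if any token is a negation particle."""
    return any(token in NEGATION_PARTICLES for token in tokens)


def yes_no(question_negated, answer_negated):
    """Yes when question and answer share the same polarity, otherwise No."""
    return "yes" if question_negated == answer_negated else "no"
```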

Page 34: Arabic question answering

EXPERIMENTS AND RESULTS

69% Arabic QA system

97.3% Arabic Q-A uses QARAB

83.3% PR system

Page 35: Arabic question answering

Conclusion

O We have described the generic architecture for Arabic question answering.

O We compared different systems.

O We showed how each system processes the question and gives the answers.